---
language:
- en
license: apache-2.0
tags:
- mlx
- vision
- multimodal
base_model: janhq/Jan-v2-VL-low
---
# Jan-v2-VL-low 4-bit MLX
This is a 4-bit quantized MLX conversion of [janhq/Jan-v2-VL-low](https://huggingface.co/janhq/Jan-v2-VL-low).
## Model Description
Jan-v2-VL is an 8-billion-parameter vision-language model designed for long-horizon, multi-step tasks in real software environments. This "low" variant is optimized for faster inference while maintaining strong performance on agentic automation and UI-control tasks.
**Key Features:**
- Vision-language understanding for browser and desktop applications
- Screenshot grounding and tool call capabilities
- Stable multi-step execution with minimal performance drift
- Error recovery and intermediate state maintenance
## Quantization
This model was converted to MLX format with 4-bit quantization using [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) by Prince Canuma.
**Conversion command:**
```bash
mlx_vlm.convert --hf-path janhq/Jan-v2-VL-low --quantize --q-bits 4 --mlx-path Jan-v2-VL-low-4bit-mlx
```
## Usage
### Installation
```bash
pip install mlx-vlm
```
### Python
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model, processor, and config
model_path = "mlx-community/Jan-v2-VL-low-4bit-mlx"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare input
image = ["path/to/image.jpg"]
prompt = "Describe this image."

# Apply the chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)

# Generate output
output = generate(model, processor, formatted_prompt, image, verbose=False)
print(output)
```
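
Because `generate` accepts a list of images and `apply_chat_template` takes `num_images`, the same pattern extends to multiple images per prompt. The sketch below compares two screenshots; the file paths and prompt wording are placeholders for illustration.

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Jan-v2-VL-low-4bit-mlx"
model, processor = load(model_path)
config = load_config(model_path)

# Two screenshots of the same UI, e.g. before and after an action (placeholder paths)
images = ["screenshots/before.png", "screenshots/after.png"]
prompt = "Compare these two screenshots and describe what changed in the UI."

# num_images must match the number of images passed to generate
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(images)
)

output = generate(model, processor, formatted_prompt, images, verbose=False)
print(output)
```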
### Command Line
```bash
mlx_vlm.generate --model mlx-community/Jan-v2-VL-low-4bit-mlx --max-tokens 100 --prompt "Describe this image" --image path/to/image.jpg
```
## Intended Use
This model is designed for:
- Agentic automation and UI control
- Stepwise operation in browsers and desktop applications
- Screenshot grounding and tool calls (illustrated in the sketch below)
- Long-horizon multi-step task execution
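
As an illustration of the screenshot-grounding use case, the following minimal sketch asks the model to propose the next UI action for a given screenshot. It reuses the API from the Usage section; the screenshot path and prompt wording are placeholders, and the exact tool-call format expected by agent harnesses is documented with the original model.

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Jan-v2-VL-low-4bit-mlx"
model, processor = load(model_path)
config = load_config(model_path)

# Placeholder screenshot of the application under control
screenshot = ["screenshots/current_state.png"]

# Illustrative grounding prompt; a real agent harness would supply its own
# instructions and tool-call schema per the original model's documentation
prompt = (
    "You are controlling a desktop application. "
    "Given this screenshot, describe the single next action to take "
    "to open the Settings dialog, including where to click."
)

formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(screenshot)
)

output = generate(model, processor, formatted_prompt, screenshot, verbose=False)
print(output)
```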
## License
This model is released under the Apache 2.0 license.
## Original Model
For more information, please refer to the original model: [janhq/Jan-v2-VL-low](https://huggingface.co/janhq/Jan-v2-VL-low)
## Acknowledgments
- Original model by [Jan](https://huggingface.co/janhq)
- [MLX](https://github.com/ml-explore/mlx) framework by Apple
- MLX conversion framework by [Prince Canuma](https://github.com/Blaizzy/mlx-vlm)
- Model conversion by [Incept5](https://incept5.ai)