Qwen3-VL-2B-Instruct is a 2-billion-parameter vision-language model from Alibaba Cloud that jointly processes images and text for visual question answering, captioning, and document understanding. Its 2B scale positions it as one of the smaller instruction-tuned VLMs capable of zero-shot visual reasoning. Apache 2.0 licensed.
186,904,434 ↓ · 386 ♡
Qwen2.5-VL-7B-Instruct is Alibaba Cloud's 7-billion-parameter vision-language model from the Qwen2.5-VL series, accepting image and video inputs alongside text for visual question answering, document understanding, and grounding tasks. It processes images at dynamic resolutions and shows improved OCR and document reasoning compared to the earlier Qwen-VL series. Apache 2.0 licensed.
8,919,144 ↓ · 1,518 ♡
Gemma 4-31B-IT is Google DeepMind's 31-billion-parameter instruction-tuned vision-language model from the Gemma 4 family, supporting both image and text inputs. It offers multimodal reasoning with open weights, and its Apache 2.0 license permits commercial deployment. It uses the gemma4 architecture, which brings improvements over Gemma 2.
8,206,643 ↓ · 2,526 ♡
Qwen3.5-9B is a 9-billion-parameter instruction-tuned vision-language model from Alibaba Cloud's Qwen3.5 series, fine-tuned from Qwen3.5-9B-Base for multimodal conversational tasks. It accepts image and text inputs for visual reasoning, document understanding, and grounded question answering. Apache 2.0 licensed.
7,745,704 ↓ · 1,388 ♡
Gemma 4-26B-A4B-IT is Google DeepMind's 26-billion-total-parameter MoE (Mixture-of-Experts) vision-language model, with approximately 4 billion active parameters per token. Because each token is routed through only a subset of experts, the model draws on a 26B parameter pool while activating only ~4B per forward pass, reducing per-token compute relative to a dense 26B model. Apache 2.0 licensed.
6,532,915 ↓ · 880 ♡
Qwen3.5-4B is Alibaba Cloud's 4-billion-parameter instruction-tuned vision-language model from the Qwen3.5 series, fine-tuned from Qwen3.5-4B-Base for multimodal conversational tasks. It handles image and text inputs at a scale deployable on consumer GPUs with 8–12GB VRAM. Apache 2.0 licensed.
5,066,785 ↓ · 518 ♡
Qwen3-VL-8B-Instruct is Alibaba Cloud's 8-billion-parameter vision-language model from the Qwen3-VL series, extending the VL line with improved visual reasoning and document understanding. It targets mid-tier server GPU deployment where 2B VLMs are insufficient and 30B+ models are impractical. Apache 2.0 licensed.
4,887,168 ↓ · 891 ♡
Qwen2-VL-2B-Instruct is a 2B-parameter vision-language model from Alibaba's Qwen team, supporting image and video understanding alongside text instruction-following. At 2B parameters, it runs on consumer GPUs while retaining competitive OCR, chart reading, and visual QA accuracy. It is the instruction-tuned version of the Qwen2-VL-2B base model.
4,002,670 ↓ · 500 ♡
Qwen2.5-VL-3B-Instruct is Alibaba's 3B parameter vision-language model from the Qwen2.5-VL series, supporting image and video frame understanding alongside text instruction-following. It targets edge and mobile deployment where 7B+ VL models are too memory-intensive, while maintaining reasonable accuracy on OCR, chart reading, and visual QA. Instruction-tuned for conversational use.
3,573,815 ↓ · 641 ♡
Qwen3.5-35B-A3B is a 35B total parameter mixture-of-experts multimodal model from Alibaba, with approximately 3B active parameters per token during inference. It combines vision and language understanding for image captioning, visual QA, and document analysis tasks at lower compute cost than a dense 35B model. Apache 2.0 licensed.
3,547,396 ↓ · 1,418 ♡
gemma-4-26B-A4B-it-GGUF is Unsloth's GGUF quantization of Google's Gemma 4 26B mixture-of-experts instruction-tuned multimodal model. Only about 4B parameters are active per token, which lowers per-token compute, but all 26B weights must still be held in memory; quantization is what brings the footprint down to roughly 16–24GB of VRAM, depending on the quantization level, while retaining vision and text understanding capabilities. The GGUF format provides llama.cpp and Ollama compatibility for local self-hosted deployment.
3,328,573 ↓ · 670 ♡
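The 16–24GB VRAM range quoted for the GGUF build above can be sanity-checked with a rough weight-memory estimate. The sketch below is a back-of-the-envelope calculation, not a measurement: the function name `weight_memory_gb` and the bits-per-weight values are illustrative assumptions rather than specific GGUF quantization types, and the estimate covers weights only, ignoring the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


# Total (not active) parameter count: every expert must be resident in memory,
# even though only ~4B parameters are used for any given token.
TOTAL_PARAMS = 26e9

# Illustrative average bits per weight (assumed values, not named quant types).
for bits in (4, 5, 6, 8, 16):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{bits:>2} bits/weight -> {gb:.1f} GB of weights")
```

Under these assumptions, the quoted range corresponds to averages of roughly 5 to 7 bits per weight, while an unquantized 16-bit copy of the same weights would need well over 24GB.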
Qwen3.5-27B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
3,279,581 ↓ · 967 ♡
Qwen3.5-0.8B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
3,038,119 ↓ · 523 ♡
Qwen2-VL-7B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
3,036,634 ↓ · 1,274 ♡
llava-1.5-7b-hf is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
3,011,106 ↓ · 359 ♡
Qwen3.6-35B-A3B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
2,884,820 ↓ · 1,631 ♡
gemma-3-12b-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
2,755,246 ↓ · 713 ♡
moondream2 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
2,711,246 ↓ · 1,408 ♡
Qwen3.6-35B-A3B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
2,621,217 ↓ · 203 ♡
DeepSeek-OCR is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
2,435,460 ↓ · 3,222 ♡
Qwen3-VL-4B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
2,395,913 ↓ · 379 ♡
gemma-3-4b-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
2,253,151 ↓ · 1,319 ♡
Qwen3.6-35B-A3B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
2,243,715 ↓ · 931 ♡
Kimi-K2.5 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
2,156,569 ↓ · 2,775 ♡
Qwen3.6-27B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
2,095,092 ↓ · 188 ♡
Qwen3.5-2B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,740,920 ↓ · 267 ♡
gemma-4-31B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,739,981 ↓ · 395 ♡
gemma-4-26B-A4B-it-AWQ-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,739,145 ↓ · 57 ♡
Qwen2-VL-7B-Instruct-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,711,084 ↓ · 49 ♡
Phi-3.5-vision-instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,653,822 ↓ · 733 ♡
gemma-4-E4B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,636,730 ↓ · 361 ♡
Qwen3.5-35B-A3B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,598,686 ↓ · 147 ♡
DeepSeek-OCR-2 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,524,016 ↓ · 943 ♡
Qwen3.5-27B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,499,486 ↓ · 133 ♡
MinerU2.5-2509-1.2B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,488,014 ↓ · 356 ♡
Qwen3.6-27B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,458,973 ↓ · 1,132 ♡
Qwen3-VL-235B-A22B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,427,033 ↓ · 383 ♡
gemma-4-31B-it-AWQ-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,360,875 ↓ · 38 ♡
Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,325,045 ↓ · 1,375 ♡
Qwen3.5-397B-A17B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,286,956 ↓ · 165 ♡
Florence-2-base is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,237,583 ↓ · 367 ♡
Qwen3-VL-32B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,214,787 ↓ · 198 ♡
Qwen3-VL-4B-Thinking is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,201,775 ↓ · 109 ♡
InternVL2-2B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,156,895 ↓ · 80 ♡
Qwen3.6-27B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,147,196 ↓ · 583 ♡
Gemma-4-E4B-Uncensored-HauhauCS-Aggressive is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,125,249 ↓ · 526 ♡
Qwen3.5-9B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,070,272 ↓ · 581 ♡
Llama-3.1-Nemotron-Nano-VL-8B-V1 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,046,701 ↓ · 177 ♡
gemma-4-E2B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,011,424 ↓ · 174 ♡
Qwen3.5-35B-A3B-GPTQ-Int4 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
1,005,648 ↓ · 81 ♡
Qwen3.5-122B-A10B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
940,306 ↓ · 534 ♡
Qwen3.5-35B-A3B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
934,614 ↓ · 833 ♡
Qwen3-VL-32B-Instruct-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
898,509 ↓ · 45 ♡
Kimi-K2.6 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
892,962 ↓ · 1,205 ♡
Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
858,528 ↓ · 559 ♡
LightOnOCR-2-1B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
767,272 ↓ · 678 ♡
Qwen3-VL-30B-A3B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
761,927 ↓ · 567 ♡
Qwen3-VL-8B-Thinking is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
758,428 ↓ · 206 ♡
Qwen3.6-27B-AWQ-INT4 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
709,394 ↓ · 46 ♡
SmolVLM-256M-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
707,967 ↓ · 357 ♡
Florence-2-large is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
705,766 ↓ · 1,804 ♡
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
675,299 ↓ · 649 ♡
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
670,698 ↓ · 120 ♡
deepseek-vl2-tiny is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
654,869 ↓ · 248 ♡
InternVL2-1B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
630,988 ↓ · 81 ♡
Qwen3.5-4B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
625,080 ↓ · 234 ♡
llava-v1.6-mistral-7b-hf is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
609,728 ↓ · 308 ♡
Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
602,135 ↓ · 10 ♡
Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GPTQ-int4 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
599,840 ↓ · 9 ♡
Qwen3.5-122B-A10B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
592,697 ↓ · 94 ♡
gemma-3-27b-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
592,685 ↓ · 1,964 ♡
gemma-3-27b-it-AWQ-INT4 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
575,630 ↓ · 7 ♡
Molmo2-8B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
574,029 ↓ · 177 ♡
Qwen3.5-397B-A17B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
565,005 ↓ · 1,474 ♡
chandra-ocr-2 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
559,497 ↓ · 323 ♡
Qwen3.5-35B-A3B-AWQ-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
554,115 ↓ · 39 ♡
Qwen3-VL-8B-Instruct-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
554,080 ↓ · 68 ♡
NVIDIA-Nemotron-Parse-v1.1 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
550,541 ↓ · 168 ♡
gemma-4-31B-it-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
545,396 ↓ · 10 ♡
blip2-opt-2.7b is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
543,055 ↓ · 441 ♡
llava-onevision-qwen2-0.5b-ov-hf is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
541,791 ↓ · 55 ♡
tiny-Qwen2_5_VLForConditionalGeneration is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
540,000 ↓ · 0 ♡
NVIDIA-Nemotron-Nano-12B-v2-VL-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
511,318 ↓ · 50 ♡
SmolVLM2-500M-Video-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
492,715 ↓ · 136 ♡
gemma-4-31B-it-unsloth-bnb-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
486,662 ↓ · 13 ♡
gemma-4-31B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
482,351 ↓ · 354 ♡
InternVL2-8B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
482,077 ↓ · 187 ♡
gemma-3-4b-it-qat-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
472,322 ↓ · 8 ♡
Qwen3.5-27B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
446,854 ↓ · 489 ♡
Nanonets-OCR2-3B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
445,828 ↓ · 502 ♡
gemma-3-27b-it-abliterated is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
427,949 ↓ · 317 ♡
Qwen3.5-27B-GPTQ-Int4 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
419,519 ↓ · 52 ♡
Qianfan-OCR is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
414,371 ↓ · 1,168 ♡
Llama-4-Scout-17B-16E-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
389,744 ↓ · 1,280 ♡
Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
380,428 ↓ · 60 ♡
GLM-4.1V-9B-Thinking is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
378,407 ↓ · 777 ♡
Qwen3.6-27B-Uncensored-HauhauCS-Aggressive is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
375,175 ↓ · 284 ♡
Qwen3.5-35B-A3B-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
372,624 ↓ · 18 ♡
gemma-3n-E2B-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
365,697 ↓ · 297 ♡
Qwen3-VL-30B-A3B-Instruct-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
362,668 ↓ · 107 ♡
Qwopus3.5-9B-v3 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
361,102 ↓ · 87 ♡
Gemma-4-E2B-Uncensored-HauhauCS-Aggressive is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
350,219 ↓ · 157 ♡
gemma-4-31B-it-MLX-8bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
335,369 ↓ · 2 ♡
Idefics3-8B-Llama3 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
334,519 ↓ · 304 ♡
Qwen3.5-0.8B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
333,511 ↓ · 155 ♡
Step3-VL-10B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
332,673 ↓ · 406 ♡
Qwen3.5-27B-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
330,496 ↓ · 43 ♡
Qwen3.6-35B-A3B-AWQ-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
326,901 ↓ · 45 ♡
Qwen3.5-9B-AWQ-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
326,587 ↓ · 27 ♡
Qwen2.5-VL-7B-Instruct-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
320,142 ↓ · 104 ♡
EXAONE-4.5-33B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
319,798 ↓ · 152 ♡
RolmOCR is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
313,230 ↓ · 586 ♡
InternVL2_5-8B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
313,042 ↓ · 104 ♡
gemma-3n-E4B-it-MLX-bf16 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
308,673 ↓ · 3 ♡
gemma-3n-E4B-it-MLX-8bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
307,139 ↓ · 0 ♡
google_gemma-4-26B-A4B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
304,592 ↓ · 113 ♡
gemma-3n-E4B-it-MLX-6bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
303,662 ↓ · 0 ♡
medgemma-4b-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
297,064 ↓ · 953 ♡
Qwen3.5-2B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
296,345 ↓ · 100 ♡
Qwen3.5-122B-A10B-GPTQ-Int4 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
295,370 ↓ · 37 ♡
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
292,155 ↓ · 601 ♡
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
290,793 ↓ · 2,814 ♡
gemma-4-31B-it-MLX-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
289,756 ↓ · 1 ♡
gemma-3n-E4B-it-MLX-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
289,004 ↓ · 2 ♡
Qwen3.5-9B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
287,785 ↓ · 10 ♡
gemma-3-27b-it-GPTQ-4b-128g is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
286,669 ↓ · 44 ♡
google_gemma-4-31B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
285,205 ↓ · 62 ♡
Qwen3.6-35B-A3B-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
284,802 ↓ · 17 ♡
olmOCR-2-7B-1025-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
284,331 ↓ · 230 ♡