Image-text-to-text models

129 models · ranked by HuggingFace downloads

Qwen3-VL-2B-Instruct

Qwen3-VL-2B-Instruct is a 2-billion-parameter vision-language model from Alibaba Cloud that jointly processes images and text for visual question answering, captioning, and document understanding. Its 2B scale positions it as one of the smaller instruction-tuned VLMs capable of zero-shot visual reasoning. Apache 2.0 licensed.

186,904,434 ↓ · 386 ♡

Qwen2.5-VL-7B-Instruct

Qwen2.5-VL-7B-Instruct is Alibaba Cloud's 7-billion-parameter vision-language model from the Qwen2.5-VL series, accepting image and video inputs alongside text for visual question answering, document understanding, and grounding tasks. It handles input images at dynamic resolution and shows improved OCR and document reasoning compared to the earlier Qwen-VL series. Apache 2.0 licensed.

8,919,144 ↓ · 1,518 ♡

gemma-4-31B-it

Gemma 4-31B-IT is Google DeepMind's 31-billion-parameter instruction-tuned vision-language model from the Gemma 4 family, supporting both image and text inputs. It is an open-weight model aimed at multimodal reasoning, and its Apache 2.0 license permits commercial deployment. It uses the gemma4 architecture, which builds on earlier Gemma generations.

8,206,643 ↓ · 2,526 ♡

Qwen3.5-9B

Qwen3.5-9B is a 9-billion-parameter instruction-tuned vision-language model from Alibaba Cloud's Qwen3.5 series, fine-tuned from Qwen3.5-9B-Base for multimodal conversational tasks. It accepts image and text inputs for visual reasoning, document understanding, and grounded question answering. Apache 2.0 licensed.

7,745,704 ↓ · 1,388 ♡

gemma-4-26B-A4B-it

Gemma 4-26B-A4B-IT is Google DeepMind's Mixture-of-Experts (MoE) vision-language model with 26 billion total parameters, of which approximately 4 billion are active per token. Because only ~4B parameters participate in each forward pass, per-token compute is lower than for a dense 26B model, although all 26B parameters must still be held in memory. Apache 2.0 licensed.

6,532,915 ↓ · 880 ♡
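The total-versus-active distinction in the entry above can be put into rough numbers. The following is a minimal sketch, not a benchmark: the helper names are hypothetical, and the "about 2 FLOPs per active parameter per token" figure is a common rule of thumb that ignores attention cost, routing overhead, and the vision encoder.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of an MoE model's weights that participate in one token's forward pass."""
    if total_params_b <= 0 or not 0 < active_params_b <= total_params_b:
        raise ValueError("expected 0 < active <= total")
    return active_params_b / total_params_b


def approx_flops_per_token(active_params_b: float) -> float:
    """Rule-of-thumb estimate (an assumption): ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params_b * 1e9


if __name__ == "__main__":
    total_b, active_b = 26.0, 4.0  # figures quoted in the listing, in billions
    frac = active_fraction(total_b, active_b)
    print(f"active share of weights: {frac:.1%}")
    print(f"approx. FLOPs per token (MoE):        {approx_flops_per_token(active_b):.2e}")
    print(f"approx. FLOPs per token (dense 26B):  {approx_flops_per_token(total_b):.2e}")
```

Under these assumptions the MoE model touches roughly 15% of its weights per token, which is where the compute saving comes from; the memory needed to store the weights is unchanged.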

Qwen3.5-4B

Qwen3.5-4B is Alibaba Cloud's 4-billion-parameter instruction-tuned vision-language model from the Qwen3.5 series, fine-tuned from Qwen3.5-4B-Base for multimodal conversational tasks. It handles image and text inputs at a scale deployable on consumer GPUs with 8–12GB VRAM. Apache 2.0 licensed.

5,066,785 ↓ · 518 ♡

Qwen3-VL-8B-Instruct

Qwen3-VL-8B-Instruct is Alibaba Cloud's 8-billion-parameter vision-language model from the Qwen3-VL series, extending the VL line with improved visual reasoning and document understanding. It targets mid-tier server GPU deployment where 2B VLMs are insufficient and 30B+ models are impractical. Apache 2.0 licensed.

4,887,168 ↓ · 891 ♡

Qwen2-VL-2B-Instruct

Qwen2-VL-2B-Instruct is a 2B-parameter vision-language model from Alibaba's Qwen team, supporting image and video understanding alongside text instruction-following. At 2B parameters it runs on consumer GPUs while retaining competitive OCR, chart reading, and visual QA accuracy. It is the instruction-tuned version of the Qwen2-VL-2B base model.

4,002,670 ↓ · 500 ♡

Qwen2.5-VL-3B-Instruct

Qwen2.5-VL-3B-Instruct is Alibaba's 3B-parameter vision-language model from the Qwen2.5-VL series, supporting image and video frame understanding alongside text instruction-following. It targets edge and mobile deployment where 7B+ VL models are too memory-intensive, while maintaining reasonable accuracy on OCR, chart reading, and visual QA. Instruction-tuned for conversational use.

3,573,815 ↓ · 641 ♡

Qwen3.5-35B-A3B

Qwen3.5-35B-A3B is a mixture-of-experts multimodal model from Alibaba with 35B total parameters, of which approximately 3B are active per token during inference. It combines vision and language understanding for image captioning, visual QA, and document analysis tasks at lower per-token compute cost than a dense 35B model. Apache 2.0 licensed.

3,547,396 ↓ · 1,418 ♡

gemma-4-26B-A4B-it-GGUF

gemma-4-26B-A4B-it-GGUF is Unsloth's GGUF quantization of Google's Gemma 4 26B mixture-of-experts instruction-tuned multimodal model. Quantized GGUF builds run on roughly 16–24GB of VRAM depending on the quantization level, while retaining vision and text understanding capabilities; the ~4B active parameters per token reduce compute, not the memory needed to hold the weights. The GGUF format provides llama.cpp and Ollama compatibility for local self-hosted deployment.

3,328,573 ↓ · 670 ♡
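The 16–24GB VRAM figure quoted above can be sanity-checked with back-of-the-envelope arithmetic: weight memory is approximately total parameters times bits per weight divided by 8. The sketch below is only an estimate under stated assumptions. The helper name is hypothetical, the bit widths are illustrative round numbers rather than the sizes of any specific GGUF quantization type, and the result excludes KV cache, activations, and any separate vision projector.

```python
def weight_memory_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes).

    total_params_b  -- total parameter count in billions (all experts, not just active)
    bits_per_weight -- average bits stored per weight after quantization
    """
    if total_params_b <= 0 or bits_per_weight <= 0:
        raise ValueError("both arguments must be positive")
    total_bytes = total_params_b * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    total_b = 26.0  # total parameters quoted in the listing, in billions
    for bits in (4, 5, 8, 16):  # illustrative widths, not exact GGUF types
        print(f"{bits:>2} bits/weight -> ~{weight_memory_gb(total_b, bits):.1f} GB of weights")
```

With these assumptions, a 4-bit build needs about 13 GB for weights alone and a 5-bit build a little over 16 GB, which is consistent with the quoted range once runtime overhead is added.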

Qwen3.5-27B

Qwen3.5-27B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

3,279,581 ↓ · 967 ♡

Qwen3.5-0.8B

Qwen3.5-0.8B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

3,038,119 ↓ · 523 ♡

Qwen2-VL-7B-Instruct

Qwen2-VL-7B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

3,036,634 ↓ · 1,274 ♡

llava-1.5-7b-hf

llava-1.5-7b-hf is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

3,011,106 ↓ · 359 ♡

Qwen3.6-35B-A3B

Qwen3.6-35B-A3B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

2,884,820 ↓ · 1,631 ♡

gemma-3-12b-it

gemma-3-12b-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

2,755,246 ↓ · 713 ♡

moondream2

moondream2 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

2,711,246 ↓ · 1,408 ♡

Qwen3.6-35B-A3B-FP8

Qwen3.6-35B-A3B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

2,621,217 ↓ · 203 ♡

DeepSeek-OCR

DeepSeek-OCR is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

2,435,460 ↓ · 3,222 ♡

Qwen3-VL-4B-Instruct

Qwen3-VL-4B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

2,395,913 ↓ · 379 ♡

gemma-3-4b-it

gemma-3-4b-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

2,253,151 ↓ · 1,319 ♡

Qwen3.6-35B-A3B-GGUF

Qwen3.6-35B-A3B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

2,243,715 ↓ · 931 ♡

Kimi-K2.5

Kimi-K2.5 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

2,156,569 ↓ · 2,775 ♡

Qwen3.6-27B-FP8

Qwen3.6-27B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

2,095,092 ↓ · 188 ♡

Qwen3.5-2B

Qwen3.5-2B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,740,920 ↓ · 267 ♡

gemma-4-31B-it-GGUF

gemma-4-31B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,739,981 ↓ · 395 ♡

gemma-4-26B-A4B-it-AWQ-4bit

gemma-4-26B-A4B-it-AWQ-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,739,145 ↓ · 57 ♡

Qwen2-VL-7B-Instruct-AWQ

Qwen2-VL-7B-Instruct-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,711,084 ↓ · 49 ♡

Phi-3.5-vision-instruct

Phi-3.5-vision-instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,653,822 ↓ · 733 ♡

gemma-4-E4B-it-GGUF

gemma-4-E4B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,636,730 ↓ · 361 ♡

Qwen3.5-35B-A3B-FP8

Qwen3.5-35B-A3B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,598,686 ↓ · 147 ♡

DeepSeek-OCR-2

DeepSeek-OCR-2 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,524,016 ↓ · 943 ♡

Qwen3.5-27B-FP8

Qwen3.5-27B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,499,486 ↓ · 133 ♡

MinerU2.5-2509-1.2B

MinerU2.5-2509-1.2B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,488,014 ↓ · 356 ♡

Qwen3.6-27B

Qwen3.6-27B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,458,973 ↓ · 1,132 ♡

Qwen3-VL-235B-A22B-Instruct

Qwen3-VL-235B-A22B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,427,033 ↓ · 383 ♡

gemma-4-31B-it-AWQ-4bit

gemma-4-31B-it-AWQ-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,360,875 ↓ · 38 ♡

Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive

Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,325,045 ↓ · 1,375 ♡

Qwen3.5-397B-A17B-FP8

Qwen3.5-397B-A17B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,286,956 ↓ · 165 ♡

Florence-2-base

Florence-2-base is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,237,583 ↓ · 367 ♡

Qwen3-VL-32B-Instruct

Qwen3-VL-32B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,214,787 ↓ · 198 ♡

Qwen3-VL-4B-Thinking

Qwen3-VL-4B-Thinking is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,201,775 ↓ · 109 ♡

InternVL2-2B

InternVL2-2B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,156,895 ↓ · 80 ♡

Qwen3.6-27B-GGUF

Qwen3.6-27B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,147,196 ↓ · 583 ♡

Gemma-4-E4B-Uncensored-HauhauCS-Aggressive

Gemma-4-E4B-Uncensored-HauhauCS-Aggressive is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,125,249 ↓ · 526 ♡

Qwen3.5-9B-GGUF

Qwen3.5-9B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,070,272 ↓ · 581 ♡

Llama-3.1-Nemotron-Nano-VL-8B-V1

Llama-3.1-Nemotron-Nano-VL-8B-V1 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,046,701 ↓ · 177 ♡

gemma-4-E2B-it-GGUF

gemma-4-E2B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,011,424 ↓ · 174 ♡

Qwen3.5-35B-A3B-GPTQ-Int4

Qwen3.5-35B-A3B-GPTQ-Int4 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

1,005,648 ↓ · 81 ♡

Qwen3.5-122B-A10B

Qwen3.5-122B-A10B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

940,306 ↓ · 534 ♡

Qwen3.5-35B-A3B-GGUF

Qwen3.5-35B-A3B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

934,614 ↓ · 833 ♡

Qwen3-VL-32B-Instruct-FP8

Qwen3-VL-32B-Instruct-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

898,509 ↓ · 45 ♡

Kimi-K2.6

Kimi-K2.6 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

892,962 ↓ · 1,205 ♡

Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive

Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

858,528 ↓ · 559 ♡

LightOnOCR-2-1B

LightOnOCR-2-1B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

767,272 ↓ · 678 ♡

Qwen3-VL-30B-A3B-Instruct

Qwen3-VL-30B-A3B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

761,927 ↓ · 567 ♡

Qwen3-VL-8B-Thinking

Qwen3-VL-8B-Thinking is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

758,428 ↓ · 206 ♡

Qwen3.6-27B-AWQ-INT4

Qwen3.6-27B-AWQ-INT4 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

709,394 ↓ · 46 ♡

SmolVLM-256M-Instruct

SmolVLM-256M-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

707,967 ↓ · 357 ♡

Florence-2-large

Florence-2-large is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

705,766 ↓ · 1,804 ♡

deepseek-vl2-tiny

deepseek-vl2-tiny is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

654,869 ↓ · 248 ♡

InternVL2-1B

InternVL2-1B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

630,988 ↓ · 81 ♡

Qwen3.5-4B-GGUF

Qwen3.5-4B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

625,080 ↓ · 234 ♡

llava-v1.6-mistral-7b-hf

llava-v1.6-mistral-7b-hf is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

609,728 ↓ · 308 ♡

Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit

Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

602,135 ↓ · 10 ♡

Qwen3.5-122B-A10B-FP8

Qwen3.5-122B-A10B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

592,697 ↓ · 94 ♡

gemma-3-27b-it

gemma-3-27b-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

592,685 ↓ · 1,964 ♡

gemma-3-27b-it-AWQ-INT4

gemma-3-27b-it-AWQ-INT4 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

575,630 ↓ · 7 ♡

Molmo2-8B

Molmo2-8B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

574,029 ↓ · 177 ♡

Qwen3.5-397B-A17B

Qwen3.5-397B-A17B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

565,005 ↓ · 1,474 ♡

chandra-ocr-2

chandra-ocr-2 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

559,497 ↓ · 323 ♡

Qwen3.5-35B-A3B-AWQ-4bit

Qwen3.5-35B-A3B-AWQ-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

554,115 ↓ · 39 ♡

Qwen3-VL-8B-Instruct-FP8

Qwen3-VL-8B-Instruct-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

554,080 ↓ · 68 ♡

NVIDIA-Nemotron-Parse-v1.1

NVIDIA-Nemotron-Parse-v1.1 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

550,541 ↓ · 168 ♡

gemma-4-31B-it-AWQ

gemma-4-31B-it-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

545,396 ↓ · 10 ♡

blip2-opt-2.7b

blip2-opt-2.7b is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

543,055 ↓ · 441 ♡

llava-onevision-qwen2-0.5b-ov-hf

llava-onevision-qwen2-0.5b-ov-hf is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

541,791 ↓ · 55 ♡

tiny-Qwen2_5_VLForConditionalGeneration

tiny-Qwen2_5_VLForConditionalGeneration is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

540,000 ↓ · 0 ♡

NVIDIA-Nemotron-Nano-12B-v2-VL-FP8

NVIDIA-Nemotron-Nano-12B-v2-VL-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

511,318 ↓ · 50 ♡

SmolVLM2-500M-Video-Instruct

SmolVLM2-500M-Video-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

492,715 ↓ · 136 ♡

gemma-4-31B-it-unsloth-bnb-4bit

gemma-4-31B-it-unsloth-bnb-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

486,662 ↓ · 13 ♡

gemma-4-31B

gemma-4-31B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

482,351 ↓ · 354 ♡

InternVL2-8B

InternVL2-8B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

482,077 ↓ · 187 ♡

gemma-3-4b-it-qat-4bit

gemma-3-4b-it-qat-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

472,322 ↓ · 8 ♡

Qwen3.5-27B-GGUF

Qwen3.5-27B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

446,854 ↓ · 489 ♡

Nanonets-OCR2-3B

Nanonets-OCR2-3B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

445,828 ↓ · 502 ♡

gemma-3-27b-it-abliterated

gemma-3-27b-it-abliterated is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

427,949 ↓ · 317 ♡

Qwen3.5-27B-GPTQ-Int4

Qwen3.5-27B-GPTQ-Int4 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

419,519 ↓ · 52 ♡

Qianfan-OCR

Qianfan-OCR is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

414,371 ↓ · 1,168 ♡

Llama-4-Scout-17B-16E-Instruct

Llama-4-Scout-17B-16E-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

389,744 ↓ · 1,280 ♡

Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled

Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

380,428 ↓ · 60 ♡

GLM-4.1V-9B-Thinking

GLM-4.1V-9B-Thinking is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

378,407 ↓ · 777 ♡

Qwen3.6-27B-Uncensored-HauhauCS-Aggressive

Qwen3.6-27B-Uncensored-HauhauCS-Aggressive is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

375,175 ↓ · 284 ♡

Qwen3.5-35B-A3B-AWQ

Qwen3.5-35B-A3B-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

372,624 ↓ · 18 ♡

gemma-3n-E2B-it

gemma-3n-E2B-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

365,697 ↓ · 297 ♡

Qwen3-VL-30B-A3B-Instruct-FP8

Qwen3-VL-30B-A3B-Instruct-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

362,668 ↓ · 107 ♡

Qwopus3.5-9B-v3

Qwopus3.5-9B-v3 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

361,102 ↓ · 87 ♡

Gemma-4-E2B-Uncensored-HauhauCS-Aggressive

Gemma-4-E2B-Uncensored-HauhauCS-Aggressive is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

350,219 ↓ · 157 ♡

gemma-4-31B-it-MLX-8bit

gemma-4-31B-it-MLX-8bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

335,369 ↓ · 2 ♡

Idefics3-8B-Llama3

Idefics3-8B-Llama3 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

334,519 ↓ · 304 ♡

Qwen3.5-0.8B-GGUF

Qwen3.5-0.8B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

333,511 ↓ · 155 ♡

Step3-VL-10B

Step3-VL-10B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

332,673 ↓ · 406 ♡

Qwen3.5-27B-AWQ

Qwen3.5-27B-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

330,496 ↓ · 43 ♡

Qwen3.6-35B-A3B-AWQ-4bit

Qwen3.6-35B-A3B-AWQ-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

326,901 ↓ · 45 ♡

Qwen3.5-9B-AWQ-4bit

Qwen3.5-9B-AWQ-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

326,587 ↓ · 27 ♡

Qwen2.5-VL-7B-Instruct-AWQ

Qwen2.5-VL-7B-Instruct-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

320,142 ↓ · 104 ♡

EXAONE-4.5-33B

EXAONE-4.5-33B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

319,798 ↓ · 152 ♡

RolmOCR

RolmOCR is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

313,230 ↓ · 586 ♡

InternVL2_5-8B

InternVL2_5-8B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

313,042 ↓ · 104 ♡

gemma-3n-E4B-it-MLX-bf16

gemma-3n-E4B-it-MLX-bf16 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

308,673 ↓ · 3 ♡

gemma-3n-E4B-it-MLX-8bit

gemma-3n-E4B-it-MLX-8bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

307,139 ↓ · 0 ♡

google_gemma-4-26B-A4B-it-GGUF

google_gemma-4-26B-A4B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

304,592 ↓ · 113 ♡

gemma-3n-E4B-it-MLX-6bit

gemma-3n-E4B-it-MLX-6bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

303,662 ↓ · 0 ♡

medgemma-4b-it

medgemma-4b-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

297,064 ↓ · 953 ♡

Qwen3.5-2B-GGUF

Qwen3.5-2B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

296,345 ↓ · 100 ♡

Qwen3.5-122B-A10B-GPTQ-Int4

Qwen3.5-122B-A10B-GPTQ-Int4 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

295,370 ↓ · 37 ♡

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

290,793 ↓ · 2,814 ♡

gemma-4-31B-it-MLX-4bit

gemma-4-31B-it-MLX-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

289,756 ↓ · 1 ♡

gemma-3n-E4B-it-MLX-4bit

gemma-3n-E4B-it-MLX-4bit is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

289,004 ↓ · 2 ♡

Qwen3.5-9B-FP8

Qwen3.5-9B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

287,785 ↓ · 10 ♡

gemma-3-27b-it-GPTQ-4b-128g

gemma-3-27b-it-GPTQ-4b-128g is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

286,669 ↓ · 44 ♡

google_gemma-4-31B-it-GGUF

google_gemma-4-31B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

285,205 ↓ · 62 ♡

Qwen3.6-35B-A3B-AWQ

Qwen3.6-35B-A3B-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

284,802 ↓ · 17 ♡

olmOCR-2-7B-1025-FP8

olmOCR-2-7B-1025-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.

284,331 ↓ · 230 ♡