Model Vision

Overview

FlowDown supports visual understanding through local MLX or cloud vision models. For text-only models, it can auto-generate image descriptions, OCR text, and QR code results. When Settings → Editor → Compress Image is on (the default), images are resized to ~1024 px and stripped of EXIF data.
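To make the ~1024 px resize concrete, here is a minimal sketch of the dimension arithmetic. It assumes the limit applies to the longer side, that the aspect ratio is preserved, and that small images are never upscaled; none of this is confirmed by FlowDown, and `fit_within` is a hypothetical helper, not part of the app. EXIF stripping is not shown.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    Assumption: the ~1024 px limit applies to the longer side.
    """
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough; this sketch never upscales.
        return width, height
    scale = max_side / longest
    # Keep the aspect ratio and never let a side collapse to 0 px.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4032, 3024))  # (1024, 768)
print(fit_within(800, 600))    # (800, 600)
```

Under these assumptions, a 12-megapixel phone photo shrinks to 1024 × 768 before upload, while an image that already fits is left at its original size.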

Image compression setting

Quick Start

Click the ＋ beside the composer, or drag/paste files into the input box.
On iOS/iPadOS, choose Take Photo, Photo Library, or Files; on macOS, pick any system-supported file.
Send your question or instruction and wait for the analysis result.

Thumbnails appear above the composer; tap or hover to preview, rename, or remove attachments.

Configuration Checklist

Enable Vision capability: Turn on Vision under Settings → Model (local) or on the cloud model edit page. Turning Vision on for a model that does not actually support it will fail.
Auxiliary Visual Model: Settings → Inference → Visual Assessment → Auxiliary Visual Model. Used when the active chat model lacks Vision or when you need text fallbacks (descriptions/OCR/QR).
Skip Recognition If Possible: Settings → Inference → Visual Assessment, on by default. When the chat model has Vision, this option skips preprocessing and sends the raw image. Turn it off if you need OCR or text fallbacks, or if you may switch to a text-only model later.
Compress Image: Settings → Editor → Compress Image, on by default to shorten uploads and remove EXIF data.
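The checklist above can be modeled as a small settings object with a sanity check. This is an illustrative sketch only: `VisionSettings` and `check_settings` are hypothetical names, not FlowDown APIs. The defaults for the last two fields follow the checklist; the field names and the wording of the notes are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisionSettings:
    """Hypothetical mirror of the toggles in the checklist."""
    chat_model_has_vision: bool
    auxiliary_visual_model: Optional[str] = None  # None means not configured
    skip_recognition_if_possible: bool = True     # on by default
    compress_image: bool = True                   # on by default


def check_settings(s: VisionSettings) -> list[str]:
    """Return human-readable notes about combinations worth reviewing."""
    notes = []
    if not s.chat_model_has_vision and s.auxiliary_visual_model is None:
        notes.append("Text-only chat model with no Auxiliary Visual Model: images will be skipped.")
    if s.chat_model_has_vision and s.skip_recognition_if_possible:
        notes.append("Raw images are sent without OCR/QR text; no text fallback if you switch to a text-only model.")
    return notes


for note in check_settings(VisionSettings(chat_model_has_vision=False)):
    print(note)
```

With a text-only chat model and no auxiliary model configured, the sketch prints the first note, matching the behavior described under Processing Steps.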

Model capability configuration
Visual inference configuration

How It Works

Dual Paths

Vision models: receive the image plus any generated description.
Text-only models: FlowDown calls the Auxiliary Visual Model to turn images into text before sending.

When Preprocessing Runs

Triggered when an image lacks a manual note and either:
- the chat model does not support Vision, or
- Skip Recognition If Possible is turned off.
Skipped when the chat model supports Vision and the skip toggle is on.
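The trigger rule above reduces to a single boolean expression. The sketch below is one way to write it; `should_preprocess` and its parameter names are hypothetical, chosen only to mirror the wording of the rule.

```python
def should_preprocess(has_manual_note: bool,
                      model_supports_vision: bool,
                      skip_if_possible: bool) -> bool:
    """Decide whether description/OCR/QR preprocessing runs for an image."""
    if has_manual_note:
        # An image with a manual note does not trigger preprocessing.
        return False
    # Runs if the chat model is text-only OR the skip toggle is off.
    return (not model_supports_vision) or (not skip_if_possible)


# Vision model with the skip toggle on: the raw image is sent as-is.
print(should_preprocess(False, True, True))    # False
# Vision model with the skip toggle off: preprocessing still runs.
print(should_preprocess(False, True, False))   # True
# Text-only model: preprocessing runs regardless of the toggle.
print(should_preprocess(False, False, True))   # True
```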

Processing Steps

Generate a scene description
Extract text with multilingual OCR
Detect and decode QR codes

Results are written to the attachment’s text representation and persisted (non-ephemeral). Images are always compressed and EXIF-stripped. If no Vision-capable Auxiliary Visual Model is configured and the chat model is text-only, the image will be skipped.
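As a rough illustration of how the three steps could be combined into one text representation, consider the sketch below. The section labels are the ones visible under Raw Data; the layout, the omission of empty sections, and the `describe`, `ocr`, and `decode_qr` callables are assumptions standing in for whatever the Auxiliary Visual Model and on-device recognizers actually do.

```python
from typing import Callable, Optional

# Each step takes image bytes and returns text, or None if nothing was found.
Step = Callable[[bytes], Optional[str]]


def build_text_representation(image: bytes,
                              describe: Step,
                              ocr: Step,
                              decode_qr: Step) -> str:
    """Run the three steps in order and join the non-empty results."""
    sections = [
        ("[Image Description]", describe(image)),
        ("[Image Optical Character Recognition Result]", ocr(image)),
        ("[QRCode Recognition]", decode_qr(image)),
    ]
    # Assumption: sections with no result are left out entirely.
    return "\n\n".join(f"{label}\n{text}" for label, text in sections if text)


# Stub steps for demonstration; a real setup would call actual models.
text = build_text_representation(
    b"fake-image-bytes",
    describe=lambda img: "A whiteboard with a hand-drawn flowchart.",
    ocr=lambda img: "Start -> Review -> Ship",
    decode_qr=lambda img: None,  # no QR code in this image
)
print(text)
```

In this example the output contains the description and OCR sections only, because the stubbed QR step returns nothing.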

Delivery to the Conversation

Vision chat model: sends the raw image plus any generated description.
Text-only chat model: sends text representation only; if empty, the attachment is omitted.
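The two delivery cases can be sketched as a single function. The payload shape (a dict with `image` and `text` keys) and the name `build_attachment_payload` are hypothetical; only the branching follows the rules above.

```python
from typing import Optional


def build_attachment_payload(model_supports_vision: bool,
                             image: bytes,
                             text_representation: str) -> Optional[dict]:
    """Return what is delivered for one attachment, or None if it is omitted."""
    if model_supports_vision:
        payload = {"image": image}
        if text_representation:
            # The description accompanies the image only when one exists.
            payload["text"] = text_representation
        return payload
    if text_representation:
        # Text-only models never receive the image itself.
        return {"text": text_representation}
    # Text-only model and nothing to say about the image: omit the attachment.
    return None


print(build_attachment_payload(False, b"img", ""))  # None
```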

Prompting & Verification

Be explicit: “Summarize this whiteboard,” “Compare these two screenshots,” or “Convert the table to CSV.”
Reference filenames, e.g., “In invoice.png, what is the payment due date?”
Use the message menu → Raw Data to confirm that the [Image Description], [Image Optical Character Recognition Result], and [QRCode Recognition] sections contain what you expect.
If results look off, ask the model to re-check attachments or add more context and resend.
Delete attachments from the message menu if you don’t want them kept.

Rendering with attachments