Vision
OCR
Extract text and structured fields from images and PDFs.
POST /v1/vision/ocr
Run fine-tuned visual OCR against any image or multi-page PDF. Returns the raw text plus per-word confidence and bounding boxes.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | conditional | URL of the image or PDF to process. One of url or file_store_key is required. |
| file_store_key | string | conditional | Key of a file already uploaded to Marob AI's file store. One of url or file_store_key is required. |
| prompt | string \| string[] \| object | no | Instruction for what to extract. Defaults to "Describe the image in detail." |
| fine_grained | boolean | no | Enable higher-accuracy word bounding boxes. |
| page_range | [number, number] | no | For PDFs: [startPage, endPage]. Max 10 pages per request. |
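
The parameter rules above can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of any official SDK: `build_ocr_payload` and `page_batches` are hypothetical helper names, and the code assumes pages are 1-indexed and that `page_range` is inclusive on both ends. Whether the API accepts both `url` and `file_store_key` together is not stated here, so the sketch only enforces that at least one is present.

```python
from typing import Iterator, Optional, Sequence, Tuple, Union

MAX_PAGES_PER_REQUEST = 10  # limit stated in the page_range description


def build_ocr_payload(
    url: Optional[str] = None,
    file_store_key: Optional[str] = None,
    prompt: Union[str, Sequence[str], dict, None] = None,
    fine_grained: Optional[bool] = None,
    page_range: Optional[Tuple[int, int]] = None,
) -> dict:
    """Assemble a JSON body for POST /v1/vision/ocr (hypothetical helper)."""
    if not url and not file_store_key:
        raise ValueError("one of url or file_store_key is required")
    if page_range is not None:
        start, end = page_range
        # Assumption: pages are 1-indexed and the range is inclusive.
        if start < 1 or end < start:
            raise ValueError("page_range must satisfy 1 <= start <= end")
        if end - start + 1 > MAX_PAGES_PER_REQUEST:
            raise ValueError("page_range may span at most 10 pages")
    candidates = {
        "url": url,
        "file_store_key": file_store_key,
        "prompt": prompt,
        "fine_grained": fine_grained,
        "page_range": list(page_range) if page_range is not None else None,
    }
    # Omit unset optional fields so the server applies its own defaults.
    return {key: value for key, value in candidates.items() if value is not None}


def page_batches(
    total_pages: int, batch_size: int = MAX_PAGES_PER_REQUEST
) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) ranges that cover pages 1..total_pages."""
    for start in range(1, total_pages + 1, batch_size):
        yield (start, min(start + batch_size - 1, total_pages))
```

For a PDF longer than 10 pages, one request per range from `page_batches` would cover the whole document, for example `build_ocr_payload(url=pdf_url, page_range=batch)` for each `batch`.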
Request
curl https://api.marob.ai/v1/vision/ocr \
  -H "Authorization: Bearer $MAROB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://marob.ai/samples/receipt.jpg",
    "prompt": ["total_price", "tax", "merchant_name"]
  }'
Response
{
  "success": true,
  "_usage": {
    "input_tokens": 42,
    "output_tokens": 318,
    "inference_time_tokens": 1421,
    "total_tokens": 1781
  },
  "log_id": "log_01JABCXYZ...",
  "context": "A receipt from Blue Bottle Coffee dated April 18, 2026.",
  "width": 720,
  "height": 960,
  "tags": ["receipt", "coffee", "invoice"],
  "has_text": true,
  "sections": [
    {
      "text": "Blue Bottle Coffee\n123 Valencia St\nSF, CA 94103",
      "lines": [
        {
          "text": "Blue Bottle Coffee",
          "average_confidence": 0.98,
          "bounds": {
            "top_left": { "x": 48, "y": 32 },
            "top_right": { "x": 672, "y": 32 },
            "bottom_left": { "x": 48, "y": 80 },
            "bottom_right": { "x": 672, "y": 80 },
            "width": 624,
            "height": 48
          },
          "words": [
            {
              "text": "Blue",
              "bounds": { "top_left": { "x": 48, "y": 32 } },
              "confidence": 0.99
            }
          ]
        }
      ]
    }
  ],
  "total_pages": 1,
  "page_range": [1, 1]
}
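
The response nests words inside lines inside sections, so a consumer usually has to walk three levels to reach per-word confidence. The Python sketch below shows one way to do that, assuming the field names in the sample response above are stable. The helper names (`full_text`, `iter_words`, `low_confidence_lines`) and the 0.9 default threshold are illustrative choices, not part of the API.

```python
from typing import Iterator, List, Tuple


def full_text(response: dict) -> str:
    """Join the text of every section, one section per block."""
    return "\n".join(section["text"] for section in response.get("sections", []))


def iter_words(response: dict) -> Iterator[Tuple[str, float]]:
    """Yield (text, confidence) for every word in the response."""
    for section in response.get("sections", []):
        for line in section.get("lines", []):
            for word in line.get("words", []):
                yield word["text"], word["confidence"]


def low_confidence_lines(response: dict, threshold: float = 0.9) -> List[str]:
    """Return the text of lines whose average_confidence is below threshold."""
    flagged = []
    for section in response.get("sections", []):
        for line in section.get("lines", []):
            if line["average_confidence"] < threshold:
                flagged.append(line["text"])
    return flagged


# Trimmed copy of the sample response, kept only to exercise the helpers.
sample = {
    "sections": [
        {
            "text": "Blue Bottle Coffee\n123 Valencia St\nSF, CA 94103",
            "lines": [
                {
                    "text": "Blue Bottle Coffee",
                    "average_confidence": 0.98,
                    "words": [{"text": "Blue", "confidence": 0.99}],
                }
            ],
        }
    ]
}

print(list(iter_words(sample)))
print(low_confidence_lines(sample, threshold=0.99))
```

Lines flagged this way are candidates for manual review or for a retry with fine_grained enabled.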