Vision OCR API, REST: TextRecognition.Recognize

Updated at November 26, 2024

HTTP request
Body parameters
Response
TextAnnotation
Block
Polygon
Vertex
Line
Word
TextSegments
DetectedLanguage
Entity
Table
TableCell

Sends an image for text recognition.

HTTP request

POST https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText

Body parameters

{
  // Includes only one of the fields `content`
  "content": "string",
  // end of the list of possible fields
  "mimeType": "string",
  "languageCodes": [
    "string"
  ],
  "model": "string"
}

Field	Description
content	string (bytes) Bytes with the data to recognize. Includes only one of the fields `content`.
mimeType	string MIME type of the file to analyze. Supported file formats: `JPEG`, `PNG`, `PDF`. Restrictions: Maximum file size: see documentation. Image size must not exceed 20M pixels (length x width). A PDF file must not contain more than 1 page.
languageCodes[]	string List of languages to recognize text in. Specified in ISO 639-1 format (for example, `ru`).
model	string Model to use for text detection.
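As a rough illustration of the body parameters above, the following Python sketch assembles the JSON body and a `POST` request object without sending it. Several things here are assumptions rather than facts from this page: that `content` is passed as base64 text (the usual JSON encoding for bytes fields), that `mimeType` takes the format names listed above, and the helper names `build_recognize_body` and `build_request`. Authentication headers are not described in this reference, so the caller has to supply them through `extra_headers`.

```python
import base64
import json
import urllib.request
from typing import Dict, List, Optional

OCR_URL = "https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText"
# Assumption: mimeType accepts the format names listed in the table above.
SUPPORTED_MIME_TYPES = {"JPEG", "PNG", "PDF"}


def build_recognize_body(data: bytes, mime_type: str,
                         language_codes: List[str], model: str) -> dict:
    """Build the JSON body for recognizeText (hypothetical helper)."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported mimeType: {mime_type!r}")
    return {
        # Assumption: bytes are carried as base64 text in JSON.
        "content": base64.b64encode(data).decode("ascii"),
        "mimeType": mime_type,
        "languageCodes": list(language_codes),
        "model": model,
    }


def build_request(body: dict,
                  extra_headers: Optional[Dict[str, str]] = None
                  ) -> urllib.request.Request:
    """Wrap the body in a POST request; nothing is sent here."""
    headers = {"Content-Type": "application/json"}
    # Authentication headers are not covered by this page; pass them in.
    headers.update(extra_headers or {})
    return urllib.request.Request(
        OCR_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The request object can then be passed to `urllib.request.urlopen` once the required credentials have been added.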

Response

HTTP Code: 200 - OK

{
  "textAnnotation": {
    "width": "string",
    "height": "string",
    "blocks": [
      {
        "boundingBox": {
          "vertices": [
            {
              "x": "string",
              "y": "string"
            }
          ]
        },
        "lines": [
          {
            "boundingBox": {
              "vertices": [
                {
                  "x": "string",
                  "y": "string"
                }
              ]
            },
            "text": "string",
            "words": [
              {
                "boundingBox": {
                  "vertices": [
                    {
                      "x": "string",
                      "y": "string"
                    }
                  ]
                },
                "text": "string",
                "entityIndex": "string",
                "textSegments": [
                  {
                    "startIndex": "string",
                    "length": "string"
                  }
                ]
              }
            ],
            "textSegments": [
              {
                "startIndex": "string",
                "length": "string"
              }
            ],
            "orientation": "string"
          }
        ],
        "languages": [
          {
            "languageCode": "string"
          }
        ],
        "textSegments": [
          {
            "startIndex": "string",
            "length": "string"
          }
        ]
      }
    ],
    "entities": [
      {
        "name": "string",
        "text": "string"
      }
    ],
    "tables": [
      {
        "boundingBox": {
          "vertices": [
            {
              "x": "string",
              "y": "string"
            }
          ]
        },
        "rowCount": "string",
        "columnCount": "string",
        "cells": [
          {
            "boundingBox": {
              "vertices": [
                {
                  "x": "string",
                  "y": "string"
                }
              ]
            },
            "rowIndex": "string",
            "columnIndex": "string",
            "columnSpan": "string",
            "rowSpan": "string",
            "text": "string",
            "textSegments": [
              {
                "startIndex": "string",
                "length": "string"
              }
            ]
          }
        ]
      }
    ],
    "fullText": "string",
    "rotate": "string"
  },
  "page": "string"
}

Field	Description
textAnnotation	TextAnnotation Recognized text blocks in page or text from entities.
page	string (int64) Page number in PDF file.
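Fields typed `string (int64)` arrive as JSON strings, so a client typically converts them before doing arithmetic. The sketch below reads the top-level response fields this way. The function name `read_page_info` is hypothetical, and treating missing numeric fields as `0` is an assumption, not documented behavior.

```python
import json


def read_page_info(response_text: str) -> dict:
    """Extract page number, page size, and full text from a response."""
    response = json.loads(response_text)
    annotation = response.get("textAnnotation", {})
    return {
        # int() accepts the string form used for int64 values.
        # Assumption: a missing numeric field is treated as 0.
        "page": int(response.get("page", 0)),
        "width": int(annotation.get("width", 0)),
        "height": int(annotation.get("height", 0)),
        "full_text": annotation.get("fullText", ""),
    }
```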

TextAnnotation

Field	Description
width	string (int64) Page width in pixels.
height	string (int64) Page height in pixels.
blocks[]	Block Recognized text blocks in this page.
entities[]	Entity Recognized entities.
tables[]	Table Recognized tables.
fullText	string Full text recognized from image.
rotate	enum (Angle) Angle of image rotation. `ANGLE_UNSPECIFIED` `ANGLE_0` `ANGLE_90` `ANGLE_180` `ANGLE_270`
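The `Angle` enum values can be mapped to degrees with a small lookup, sketched below. The helper name `angle_to_degrees` is hypothetical, and this page does not say in which direction the rotation is measured, so the sketch only returns the magnitude.

```python
from typing import Optional

# Enum names are taken from the table above; ANGLE_UNSPECIFIED has no value.
ANGLE_DEGREES = {
    "ANGLE_0": 0,
    "ANGLE_90": 90,
    "ANGLE_180": 180,
    "ANGLE_270": 270,
}


def angle_to_degrees(angle: Optional[str]) -> Optional[int]:
    """Return the angle in degrees, or None if unspecified or unknown."""
    if angle is None:
        return None
    return ANGLE_DEGREES.get(angle)
```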

Block

Field	Description
boundingBox	Polygon Area on the page where the text block is located.
lines[]	Line Recognized lines in this block.
languages[]	DetectedLanguage A list of detected languages.
textSegments[]	TextSegments Block position in the `fullText` string.

Polygon

Field	Description
vertices[]	Vertex The bounding polygon vertices.

Vertex

Field	Description
x	string (int64) X coordinate in pixels.
y	string (int64) Y coordinate in pixels.
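A common use of a `Polygon` is to reduce it to an axis-aligned rectangle, for example to crop the region out of the source image. The sketch below does that from the `vertices` array. The name `bounding_rect` is hypothetical, and treating a missing coordinate as `0` is an assumption.

```python
from typing import Tuple


def bounding_rect(polygon: dict) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels for a Polygon object."""
    vertices = polygon.get("vertices", [])
    if not vertices:
        raise ValueError("polygon has no vertices")
    # Coordinates are `string (int64)`, so convert before comparing.
    # Assumption: a missing coordinate is treated as 0.
    xs = [int(v.get("x", 0)) for v in vertices]
    ys = [int(v.get("y", 0)) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)
```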

Line

Field	Description
boundingBox	Polygon Area on the page where the line is located.
text	string Recognized text.
words[]	Word Recognized words.
textSegments[]	TextSegments Line position in the `fullText` string.
orientation	enum (Angle) Angle of line rotation. `ANGLE_UNSPECIFIED` `ANGLE_0` `ANGLE_90` `ANGLE_180` `ANGLE_270`

Word

Field	Description
boundingBox	Polygon Area on the page where the word is located.
text	string Recognized word value.
entityIndex	string (int64) ID of the recognized word in entities array.
textSegments[]	TextSegments Word position in the `fullText` string.
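The response nests words inside lines and lines inside blocks. A minimal sketch of walking that hierarchy is shown below. The helper names are hypothetical, and the items are returned in the order they appear in the response, which this page does not promise to be reading order.

```python
from typing import Iterator, List


def iter_lines(text_annotation: dict) -> Iterator[dict]:
    """Yield every Line object from every Block, in response order."""
    for block in text_annotation.get("blocks", []):
        for line in block.get("lines", []):
            yield line


def word_texts(text_annotation: dict) -> List[str]:
    """Collect the `text` of every Word, in response order."""
    return [
        word.get("text", "")
        for line in iter_lines(text_annotation)
        for word in line.get("words", [])
    ]
```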

TextSegments

Field	Description
startIndex	string (int64) Start character position in the `fullText` string.
length	string (int64) Text segment length.
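Text segments point back into the full recognized text, so the text of a block, line, word, or cell can be recovered by slicing. The sketch below assumes that `startIndex` is zero-based, that both values count characters the same way Python indexes `str`, and that a missing `startIndex` means `0`. None of these details are stated on this page, and `segment_text` is a hypothetical name.

```python
from typing import List


def segment_text(full_text: str, segments: List[dict]) -> str:
    """Concatenate the slices of full_text described by TextSegments."""
    parts = []
    for segment in segments:
        # Assumption: zero-based start, lengths counted in characters.
        start = int(segment.get("startIndex", 0))
        length = int(segment.get("length", 0))
        parts.append(full_text[start:start + length])
    return "".join(parts)
```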

DetectedLanguage

Field	Description
languageCode	string Detected language code.
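To get a quick picture of which languages were detected on a page, the per-block `languages` arrays can be tallied. The sketch below is one way to do that, and `count_block_languages` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict


def count_block_languages(text_annotation: dict) -> Dict[str, int]:
    """Count how many blocks report each detected language code."""
    counts: Counter = Counter()
    for block in text_annotation.get("blocks", []):
        for language in block.get("languages", []):
            code = language.get("languageCode")
            if code:  # skip entries without a code
                counts[code] += 1
    return dict(counts)
```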

Entity

Field	Description
name	string Entity name.
text	string Recognized entity text.
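Entities are returned as a flat list of name and text pairs, so grouping them by name is often convenient. The sketch below keeps a list per name because this page does not say whether names are unique. The function name `entities_by_name` is hypothetical.

```python
from typing import Dict, List


def entities_by_name(text_annotation: dict) -> Dict[str, List[str]]:
    """Group recognized entity texts by entity name."""
    grouped: Dict[str, List[str]] = {}
    for entity in text_annotation.get("entities", []):
        # Assumption: names may repeat, so collect every text per name.
        grouped.setdefault(entity.get("name", ""), []).append(
            entity.get("text", "")
        )
    return grouped
```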

Table

Field	Description
boundingBox	Polygon Area on the page where the table is located.
rowCount	string (int64) Number of rows in table.
columnCount	string (int64) Number of columns in table.
cells[]	TableCell Table cells.

TableCell

Field	Description
boundingBox	Polygon Area on the page where the table cell is located.
rowIndex	string (int64) Row index.
columnIndex	string (int64) Column index.
columnSpan	string (int64) Column span.
rowSpan	string (int64) Row span.
text	string Text in cell.
textSegments[]	TextSegments Table cell position in the `fullText` string.
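A `Table` can be turned into a row-by-column grid from its cells. The sketch below makes several assumptions that this page does not confirm: row and column indexes are zero-based, a missing or zero span counts as 1, and a merged cell repeats its text in every position it covers. The name `table_to_grid` is hypothetical.

```python
from typing import List


def table_to_grid(table: dict) -> List[List[str]]:
    """Build a rowCount x columnCount grid of cell texts."""
    rows = int(table.get("rowCount", 0))
    cols = int(table.get("columnCount", 0))
    grid = [["" for _ in range(cols)] for _ in range(rows)]
    for cell in table.get("cells", []):
        # Assumption: indexes are zero-based.
        row = int(cell.get("rowIndex", 0))
        col = int(cell.get("columnIndex", 0))
        # Assumption: a missing or zero span means the cell covers 1 slot.
        row_span = max(int(cell.get("rowSpan", 1)), 1)
        col_span = max(int(cell.get("columnSpan", 1)), 1)
        # Repeat the text across the spanned area, clipped to the grid.
        for r in range(row, min(row + row_span, rows)):
            for c in range(col, min(col + col_span, cols)):
                grid[r][c] = cell.get("text", "")
    return grid
```

Repeating the text across merged positions keeps every row the same length, which makes the result easy to write out as CSV.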
