Vision OCR API, gRPC: TextRecognitionAsyncService.GetRecognition

Written by Yandex Cloud

Updated on August 8, 2025

gRPC request
GetRecognitionRequest
RecognizeTextResponse
TextAnnotation
Block
Polygon
Vertex
Line
Word
TextSegments
DetectedLanguage
Entity
Table
TableCell
Picture

Returns the recognition results of an asynchronous recognition request.

gRPC request

rpc GetRecognition (GetRecognitionRequest) returns (stream RecognizeTextResponse)

GetRecognitionRequest

{
  "operation_id": "string"
}

Field	Description
operation_id	string Required field. Operation ID of async recognition request.
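Because the method is declared with a `stream` return type, a gRPC-generated Python stub returns an iterator of `RecognizeTextResponse` messages rather than a single message. The sketch below drains that iterator and indexes the annotations by page. It assumes `stub` is an already-constructed client stub for `TextRecognitionAsyncService` and `request` is a `GetRecognitionRequest` with `operation_id` set. Channel creation and authentication are not shown, and the helper name `collect_pages` is hypothetical.

```python
from typing import Any, Dict, Sequence, Tuple


def collect_pages(
    stub: Any,
    request: Any,
    metadata: Sequence[Tuple[str, str]] = (),
) -> Dict[int, Any]:
    """Drain the GetRecognition stream and index annotations by page number.

    Assumption: `stub.GetRecognition` is a server-streaming call, so the
    returned object can be iterated to receive RecognizeTextResponse messages.
    """
    pages: Dict[int, Any] = {}
    for response in stub.GetRecognition(request, metadata=metadata):
        # Key each annotation by the `page` field of its response message.
        pages[int(response.page)] = response.text_annotation
    return pages
```

Indexing by `page` rather than relying on arrival order avoids assuming anything about the order in which the service streams the messages.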

RecognizeTextResponse

{
  "text_annotation": {
    "width": "int64",
    "height": "int64",
    "blocks": [
      {
        "bounding_box": {
          "vertices": [
            {
              "x": "int64",
              "y": "int64"
            }
          ]
        },
        "lines": [
          {
            "bounding_box": {
              "vertices": [
                {
                  "x": "int64",
                  "y": "int64"
                }
              ]
            },
            "text": "string",
            "words": [
              {
                "bounding_box": {
                  "vertices": [
                    {
                      "x": "int64",
                      "y": "int64"
                    }
                  ]
                },
                "text": "string",
                "entity_index": "int64",
                "text_segments": [
                  {
                    "start_index": "int64",
                    "length": "int64"
                  }
                ]
              }
            ],
            "text_segments": [
              {
                "start_index": "int64",
                "length": "int64"
              }
            ],
            "orientation": "Angle"
          }
        ],
        "languages": [
          {
            "language_code": "string"
          }
        ],
        "text_segments": [
          {
            "start_index": "int64",
            "length": "int64"
          }
        ],
        "layout_type": "LayoutType"
      }
    ],
    "entities": [
      {
        "name": "string",
        "text": "string"
      }
    ],
    "tables": [
      {
        "bounding_box": {
          "vertices": [
            {
              "x": "int64",
              "y": "int64"
            }
          ]
        },
        "row_count": "int64",
        "column_count": "int64",
        "cells": [
          {
            "bounding_box": {
              "vertices": [
                {
                  "x": "int64",
                  "y": "int64"
                }
              ]
            },
            "row_index": "int64",
            "column_index": "int64",
            "column_span": "int64",
            "row_span": "int64",
            "text": "string",
            "text_segments": [
              {
                "start_index": "int64",
                "length": "int64"
              }
            ]
          }
        ]
      }
    ],
    "full_text": "string",
    "rotate": "Angle",
    "markdown": "string",
    "pictures": [
      {
        "bounding_box": {
          "vertices": [
            {
              "x": "int64",
              "y": "int64"
            }
          ]
        },
        "score": "double"
      }
    ]
  },
  "page": "int64"
}

Field	Description
text_annotation	TextAnnotation Recognized text blocks in page or text from entities.
page	int64 Page number in PDF file.
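The nesting of the response (blocks contain lines, lines contain words) can be walked with three loops. This is a minimal sketch that assumes the response has already been converted to a plain `dict` with the same field names as the JSON structure above, for example with `google.protobuf.json_format.MessageToDict(..., preserving_proto_field_name=True)`. The function name `iter_words` is hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_words(response: Dict[str, Any]) -> Iterator[Tuple[int, int, str]]:
    """Yield (block_index, line_index, word_text) for every recognized word."""
    annotation = response.get("text_annotation", {})
    for block_idx, block in enumerate(annotation.get("blocks", [])):
        for line_idx, line in enumerate(block.get("lines", [])):
            for word in line.get("words", []):
                yield block_idx, line_idx, word.get("text", "")
```

Using `.get(..., [])` throughout keeps the walk safe when a repeated field is empty and therefore omitted from the dict.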

TextAnnotation

Field	Description
width	int64 Page width in pixels.
height	int64 Page height in pixels.
blocks[]	Block Recognized text blocks in this page.
entities[]	Entity Recognized entities.
tables[]	Table Recognized tables.
full_text	string Full text recognized from image.
rotate	enum Angle Angle of image rotation. `ANGLE_UNSPECIFIED` `ANGLE_0` `ANGLE_90` `ANGLE_180` `ANGLE_270`
markdown	string Full markdown (without pictures inside) from image. Available only in markdown and math-markdown models.
pictures[]	Picture List of pictures locations from image.
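Since `markdown` is only available for some models, a consumer may want to fall back to `full_text` when it is empty. A minimal sketch, assuming the annotation is a plain `dict` with the field names listed above. The helper name `page_text` is hypothetical.

```python
from typing import Any, Dict


def page_text(annotation: Dict[str, Any]) -> str:
    """Return `markdown` when it is non-empty, otherwise `full_text`."""
    markdown = annotation.get("markdown") or ""
    if markdown.strip():
        return markdown
    return annotation.get("full_text") or ""
```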

Block

Field	Description
bounding_box	Polygon Area on the page where the text block is located.
lines[]	Line Recognized lines in this block.
languages[]	DetectedLanguage A list of detected languages.
text_segments[]	TextSegments Block position in the full_text string.
layout_type	enum LayoutType Block layout type. `LAYOUT_TYPE_UNSPECIFIED` `LAYOUT_TYPE_UNKNOWN` `LAYOUT_TYPE_TEXT` `LAYOUT_TYPE_HEADER` `LAYOUT_TYPE_SECTION_HEADER` `LAYOUT_TYPE_FOOTER` `LAYOUT_TYPE_FOOTNOTE` `LAYOUT_TYPE_PICTURE` `LAYOUT_TYPE_CAPTION` `LAYOUT_TYPE_TITLE` `LAYOUT_TYPE_LIST`
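The `layout_type` value can be used to select, for instance, only body text or only headers. The sketch below assumes the annotation is a plain `dict` in which enum values appear as their string names. Joining line texts with a newline is an illustrative choice, not something the API prescribes, and both helper names are hypothetical.

```python
from typing import Any, Dict, List


def blocks_of_type(annotation: Dict[str, Any], layout_type: str) -> List[Dict[str, Any]]:
    """Return the blocks whose `layout_type` equals the given enum name."""
    return [
        block
        for block in annotation.get("blocks", [])
        if block.get("layout_type") == layout_type
    ]


def block_text(block: Dict[str, Any]) -> str:
    """Join the text of a block's lines, one line per row (an assumption)."""
    return "\n".join(line.get("text", "") for line in block.get("lines", []))
```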

Polygon

Field	Description
vertices[]	Vertex The bounding polygon vertices.

Vertex

Field	Description
x	int64 X coordinate in pixels.
y	int64 Y coordinate in pixels.
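Many image libraries expect an axis-aligned rectangle rather than a polygon. A minimal sketch of deriving one from the vertices, assuming the polygon is a plain `dict`. Coordinates are passed through `int()` because the proto3 JSON mapping encodes `int64` values as strings. The helper name `bounding_rect` is hypothetical.

```python
from typing import Any, Dict, Tuple


def bounding_rect(polygon: Dict[str, Any]) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) enclosing all polygon vertices."""
    vertices = polygon.get("vertices", [])
    if not vertices:
        raise ValueError("polygon has no vertices")
    # int64 fields may arrive as strings in JSON, so normalize to int.
    xs = [int(v.get("x", 0)) for v in vertices]
    ys = [int(v.get("y", 0)) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)
```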

Line

Field	Description
bounding_box	Polygon Area on the page where the line is located.
text	string Recognized text.
words[]	Word Recognized words.
text_segments[]	TextSegments Line position from full_text string.
orientation	enum Angle Angle of line rotation. `ANGLE_UNSPECIFIED` `ANGLE_0` `ANGLE_90` `ANGLE_180` `ANGLE_270`
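The `Angle` enum names can be mapped to numeric degrees for downstream processing. In this sketch `ANGLE_UNSPECIFIED`, or any unknown name, maps to `None` rather than to a guessed value. The direction of rotation is not stated in this reference, so the sketch does not assume one. The names `ANGLE_DEGREES` and `angle_to_degrees` are hypothetical.

```python
from typing import Optional

# Degrees encoded in each enum name; ANGLE_UNSPECIFIED is deliberately absent.
ANGLE_DEGREES = {
    "ANGLE_0": 0,
    "ANGLE_90": 90,
    "ANGLE_180": 180,
    "ANGLE_270": 270,
}


def angle_to_degrees(angle: str) -> Optional[int]:
    """Return the angle in degrees, or None when it is unspecified or unknown."""
    return ANGLE_DEGREES.get(angle)
```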

Word

Field	Description
bounding_box	Polygon Area on the page where the word is located.
text	string Recognized word value.
entity_index	int64 ID of the recognized word in entities array.
text_segments[]	TextSegments Word position from full_text string.

TextSegments

Field	Description
start_index	int64 Start character position from full_text string.
length	int64 Text segment length.
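A text segment identifies a slice of `full_text` by its start position and length. The sketch below recovers the text covered by a list of segments. It assumes that positions count characters the same way Python indexes `str` objects, which this reference does not confirm, and the helper name `segment_text` is hypothetical.

```python
from typing import Any, Dict, Iterable


def segment_text(full_text: str, segments: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the slices of `full_text` described by the segments."""
    parts = []
    for segment in segments:
        # int64 fields may arrive as strings in JSON, so normalize to int.
        start = int(segment.get("start_index", 0))
        length = int(segment.get("length", 0))
        parts.append(full_text[start:start + length])
    return "".join(parts)
```

For example, with `full_text = "Hello world"`, a segment with `start_index` 6 and `length` 5 selects `"world"`.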

DetectedLanguage

Field	Description
language_code	string Detected language code.

Entity

Field	Description
name	string Entity name.
text	string Recognized entity text.
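Entities arrive as a flat list of name and text pairs, so grouping them by name is a common first step. A minimal sketch, assuming the annotation is a plain `dict`. Values are kept as lists because this reference does not say whether entity names are unique. The helper name `entities_by_name` is hypothetical.

```python
from typing import Any, Dict, List


def entities_by_name(annotation: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group recognized entity texts by entity name, preserving order."""
    grouped: Dict[str, List[str]] = {}
    for entity in annotation.get("entities", []):
        grouped.setdefault(entity.get("name", ""), []).append(entity.get("text", ""))
    return grouped
```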

Table

Field	Description
bounding_box	Polygon Area on the page where the table is located.
row_count	int64 Number of rows in table.
column_count	int64 Number of columns in table.
cells[]	TableCell Table cells.

TableCell

Field	Description
bounding_box	Polygon Area on the page where the table cell is located.
row_index	int64 Row index.
column_index	int64 Column index.
column_span	int64 Column span.
row_span	int64 Row span.
text	string Text in cell.
text_segments[]	TextSegments Table cell position from full_text string.
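The cell indices and spans are enough to rebuild a rectangular grid. The sketch below makes several assumptions that this reference does not confirm: that `row_index` and `column_index` are zero-based, that a missing or zero span means 1, and that repeating a spanning cell's text in every position it covers is acceptable. The helper name `table_to_grid` is hypothetical.

```python
from typing import Any, Dict, List


def table_to_grid(table: Dict[str, Any]) -> List[List[str]]:
    """Build a row_count x column_count grid of cell texts."""
    rows = int(table.get("row_count", 0))
    cols = int(table.get("column_count", 0))
    grid = [["" for _ in range(cols)] for _ in range(rows)]
    for cell in table.get("cells", []):
        # Assumption: indices are zero-based.
        row0 = int(cell.get("row_index", 0))
        col0 = int(cell.get("column_index", 0))
        # Assumption: a missing or zero span is treated as 1.
        row_span = max(int(cell.get("row_span", 1)), 1)
        col_span = max(int(cell.get("column_span", 1)), 1)
        # Clamp to the declared table size so a bad span cannot overflow.
        for r in range(row0, min(row0 + row_span, rows)):
            for c in range(col0, min(col0 + col_span, cols)):
                grid[r][c] = cell.get("text", "")
    return grid
```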

Picture

Field	Description
bounding_box	Polygon Area on the page where the picture is located.
score	double Confidence score of picture location.
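The `score` field can be used to discard low-confidence picture detections. A minimal sketch, assuming the annotation is a plain `dict`. The default threshold of 0.5 is an arbitrary illustrative value, since this reference does not state the score range, and the helper name `confident_pictures` is hypothetical.

```python
from typing import Any, Dict, List


def confident_pictures(
    annotation: Dict[str, Any], min_score: float = 0.5
) -> List[Dict[str, Any]]:
    """Return pictures whose confidence score is at least `min_score`."""
    return [
        picture
        for picture in annotation.get("pictures", [])
        if float(picture.get("score", 0.0)) >= min_score
    ]
```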
