
Getting started with Vision OCR

Written by Yandex Cloud. Updated on March 10, 2025.
In this article:

  • Getting started
  • Recognize text

This section describes how to recognize text in an image or file using the Vision OCR API.

Getting started

To use the examples, install cURL.

  1. On the Yandex Cloud Billing page, make sure you have a linked billing account with the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account yet, create one.
  2. Get an IAM token, which is required for authentication.
  3. Get the ID of any folder for which your account has the ai.vision.user role or higher.
  4. Specify this folder ID in the x-folder-id header of your requests.
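
The examples below pass these credentials in HTTP headers. As a reference, here is a minimal Python sketch of the header layout used throughout this guide; it assumes you exported the token and folder ID to the IAM_TOKEN and FOLDER_ID environment variables (for example, from the yc CLI commands yc iam create-token and yc config get folder-id):

    import os

    # Credentials prepared in the previous steps (environment variable
    # names are an assumption of this sketch, not required by the API).
    iam_token = os.environ["IAM_TOKEN"]
    folder_id = os.environ["FOLDER_ID"]  # a folder where you have ai.vision.user

    # Every Vision OCR request in this guide carries these headers.
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {iam_token}",
        "x-folder-id": folder_id,
    }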

Recognize text

You can use any recognition model from this list. As an example, we will use the page model, which recognizes any amount of text in an image:

  1. Prepare an image file that meets the requirements:

    • The supported file formats are JPEG, PNG, and PDF. Specify the file's MIME type in the mimeType property. The default value is image.
    • The maximum file size is 10 MB.
    • The image size should not exceed 20 MP (height × width).

    Note

    Need a sample image? Download an image of the penguin crossing road sign.
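
    If you want to check these requirements programmatically before sending a request, here is a minimal sketch; it assumes the third-party Pillow library (pip install pillow) for reading image dimensions:

    import os
    from PIL import Image  # third-party: pip install pillow

    def check_image(path, max_bytes=10 * 1024 * 1024, max_pixels=20_000_000):
        # The maximum file size is 10 MB.
        if os.path.getsize(path) > max_bytes:
            raise ValueError("file exceeds 10 MB")
        # The image size (height x width) must not exceed 20 MP.
        with Image.open(path) as img:
            width, height = img.size
            if width * height > max_pixels:
                raise ValueError("image exceeds 20 MP")

    check_image("input.jpg")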

  2. Encode the image file as Base64:

    UNIX:

    base64 -i input.jpg > output.txt

    Windows:

    C:> Base64.exe -e input.jpg > output.txt

    PowerShell:

    [Convert]::ToBase64String([IO.File]::ReadAllBytes("./input.jpg")) > output.txt

    Python:

    # Import a library for encoding files in Base64.
    import base64

    # Create a function to encode a file and return the results.
    def encode_file(file_path):
        with open(file_path, "rb") as fid:
            file_content = fid.read()
        return base64.b64encode(file_content).decode("utf-8")

    Node.js:

    // Read the file contents into memory.
    var fs = require('fs');
    var file = fs.readFileSync('/path/to/file');

    // Get the file contents in Base64 format.
    var encoded = Buffer.from(file).toString('base64');

    Java:

    // Import a library for encoding files in Base64,
    // plus the standard file I/O classes.
    import org.apache.commons.codec.binary.Base64;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Read the file and get its contents in Base64 format.
    byte[] fileData = Base64.encodeBase64(Files.readAllBytes(Paths.get("/path/to/file")));

    Go:

    import (
        "bufio"
        "encoding/base64"
        "io/ioutil"
        "os"
    )

    // Open the file.
    f, _ := os.Open("/path/to/file")

    // Read the file contents.
    reader := bufio.NewReader(f)
    content, _ := ioutil.ReadAll(reader)

    // Get the file contents in Base64 format.
    encoded := base64.StdEncoding.EncodeToString(content)
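
    For instance, with the Python helper above, the payload for the request body can be produced in one call:

    # Encode the sample image; the result goes into the content property.
    content = encode_file("input.jpg")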
    
  3. Create a file with the request body, e.g., body.json.

    body.json:

    {
      "mimeType": "JPEG",
      "languageCodes": ["*"],
      "model": "page",
      "content": "<base64-encoded_image>"
    }
    

    In the content property, specify the image file contents encoded as Base64.

    For the service to automatically detect the text language, specify the "languageCodes": ["*"] property in the configuration.
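
    If you prefer to build body.json programmatically, here is a minimal Python sketch that reuses the encode_file helper from the previous step (the file name input.jpg is just the sample used in this guide):

    import json

    body = {
        "mimeType": "JPEG",
        "languageCodes": ["*"],  # let the service detect the language
        "model": "page",
        "content": encode_file("input.jpg"),
    }

    # Write the request body to body.json for the next step.
    with open("body.json", "w") as f:
        json.dump(body, f)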

  4. Send a request using the recognize method and save the response to a file, e.g., output.json:

    UNIX:

    export IAM_TOKEN=<IAM_token>
    curl \
      --request POST \
      --header "Content-Type: application/json" \
      --header "Authorization: Bearer ${IAM_TOKEN}" \
      --header "x-folder-id: <folder_ID>" \
      --header "x-data-logging-enabled: true" \
      --data "@body.json" \
      https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText \
      --output output.json

    Where:

    • <IAM_token>: Previously obtained IAM token.
    • <folder_ID>: Previously obtained folder ID.

    Python:

    import json
    import requests

    # The Base64-encoded image from the previous step.
    content = encode_file("input.jpg")

    data = {"mimeType": "JPEG",
            "languageCodes": ["*"],
            "model": "page",
            "content": content}

    url = "https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText"

    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer <IAM_token>",
               "x-folder-id": "<folder_ID>",
               "x-data-logging-enabled": "true"}

    # Send the request and save the response to output.json.
    response = requests.post(url=url, headers=headers, data=json.dumps(data))
    with open("output.json", "w") as f:
        f.write(response.text)
    

    The result will consist of recognized blocks of text, lines, and words with their position on the image:

    {
      "result": {
        "text_annotation": {
          "width": "1920",
          "height": "1280",
          "blocks": [{
            "bounding_box": {
              "vertices": [{
                "x": "460",
                "y": "777"
              }, {
                "x": "460",
                "y": "906"
              }, {
                "x": "810",
                "y": "906"
              }, {
                "x": "810",
                "y": "777"
              }]
            },
            "lines": [{
              "bounding_box": {
                "vertices": [{
                  "x": "460",
                  "y": "777"
                }, {
                  "x": "460",
                  "y": "820"
                }, {
                  "x": "802",
                  "y": "820"
                }, {
                  "x": "802",
                  "y": "777"
                }]
              },
              "alternatives": [{
                "text": "PENGUINS",
                "words": [{
                  "bounding_box": {
                    "vertices": [{
                      "x": "460",
                      "y": "768"
                    }, {
                      "x": "460",
                      "y": "830"
                    }, {
                      "x": "802",
                      "y": "830"
                    }, {
                      "x": "802",
                      "y": "768"
                    }]
                  },
                  "text": "PENGUINS",
                  "entity_index": "-1"
                }]
              }]
            }, {
              "bounding_box": {
                "vertices": [{
                  "x": "489",
                  "y": "861"
                }, {
                  "x": "489",
                  "y": "906"
                }, {
                  "x": "810",
                  "y": "906"
                }, {
                  "x": "810",
                  "y": "861"
                }]
              },
              "alternatives": [{
                "text": "CROSSING",
                "words": [{
                  "bounding_box": {
                    "vertices": [{
                      "x": "489",
                      "y": "852"
                    }, {
                      "x": "489",
                      "y": "916"
                    }, {
                      "x": "810",
                      "y": "916"
                    }, {
                      "x": "810",
                      "y": "852"
                    }]
                  },
                  "text": "CROSSING",
                  "entity_index": "-1"
                }]
              }]
            }],
            "languages": [{
              "language_code": "en"
            }]
          }, {
            "bounding_box": {
              "vertices": [{
                "x": "547",
                "y": "989"
              }, {
                "x": "547",
                "y": "1046"
              }, {
                "x": "748",
                "y": "1046"
              }, {
                "x": "748",
                "y": "989"
              }]
            },
            "lines": [{
              "bounding_box": {
                "vertices": [{
                  "x": "547",
                  "y": "989"
                }, {
                  "x": "547",
                  "y": "1046"
                }, {
                  "x": "748",
                  "y": "1046"
                }, {
                  "x": "748",
                  "y": "989"
                }]
              },
              "alternatives": [{
                "text": "SLOW",
                "words": [{
                  "bounding_box": {
                    "vertices": [{
                      "x": "547",
                      "y": "983"
                    }, {
                      "x": "547",
                      "y": "1054"
                    }, {
                      "x": "748",
                      "y": "1054"
                    }, {
                      "x": "748",
                      "y": "983"
                    }]
                  },
                  "text": "SLOW",
                  "entity_index": "-1"
                }]
              }]
            }],
            "languages": [{
              "language_code": "en"
            }]
          }],
          "entities": []
        },
        "page": "0"
      }
    }
    
  5. To get all the recognized words in an image, find all the values with the text property.
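
    A minimal Python sketch of that lookup, assuming the response above was saved to output.json; it walks blocks, lines, alternatives, and words and collects every text value:

    import json

    with open("output.json") as f:
        annotation = json.load(f)["result"]["text_annotation"]

    # Collect the recognized words block by block, line by line.
    words = [
        word["text"]
        for block in annotation["blocks"]
        for line in block["lines"]
        for alt in line["alternatives"]
        for word in alt["words"]
    ]
    print(words)  # ['PENGUINS', 'CROSSING', 'SLOW']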

Note

If the coordinates you get do not match the position of the displayed elements, enable EXIF metadata support in your image viewing tool, or remove the Orientation attribute from the image's EXIF metadata before sending the image to the service.
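
One way to normalize the orientation before uploading is to apply the EXIF Orientation tag to the pixels and drop it from the metadata; here is a sketch using Pillow's ImageOps.exif_transpose (pip install pillow):

    from PIL import Image, ImageOps  # third-party: pip install pillow

    # Rotate the pixels according to the EXIF Orientation tag and
    # remove the tag, so reported coordinates match what you see.
    with Image.open("input.jpg") as img:
        normalized = ImageOps.exif_transpose(img)
        normalized.save("input_normalized.jpg")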
