Text recognition in images
You can recognize text in an image using the OCR API. The OCR API is the updated Vision OCR interface with enhanced features, such as multi-column text recognition.
Getting started
To use the examples, install cURL.
Get your account data for authentication:

- Get an IAM token for your Yandex account or federated account.

- Get the ID of the folder for which your account has the `ai.vision.user` role or higher.

- When accessing Vision OCR via the API, provide the obtained parameters in each request:

  - For the Vision API and Classifier API:

    - Specify the IAM token in the `Authorization` header as follows:

      ```
      Authorization: Bearer <IAM_token>
      ```

    - Specify the folder ID in the request body in the `folderId` parameter.

  - For the OCR API:

    - Specify the IAM token in the `Authorization` header.
    - Specify the folder ID in the `x-folder-id` header.

    ```
    Authorization: Bearer <IAM_token>
    x-folder-id: <folder_ID>
    ```
Vision OCR supports two authentication methods based on service accounts:

- With an IAM token:

  - Get an IAM token.

  - Provide the IAM token in the `Authorization` header in the following format:

    ```
    Authorization: Bearer <IAM_token>
    ```

- With API keys.

  API keys do not expire, which makes this authentication method simpler but less secure. Use it if you cannot request an IAM token automatically.

  Provide the API key in the `Authorization` header in the following format:

  ```
  Authorization: Api-Key <API_key>
  ```

Do not specify the folder ID in your requests: the service uses the folder the service account was created in.
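As an illustration, a minimal Python sketch of the two header variants for an OCR API call might look like this. The token, key, and folder ID values are placeholders you substitute with your own credentials:

```python
import requests

OCR_URL = "https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText"

# Option 1: authentication with an IAM token; the folder ID is passed in a separate header.
headers_iam = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <IAM_token>",
    "x-folder-id": "<folder_ID>",
}

# Option 2: authentication with a service account API key; no folder ID is needed.
headers_api_key = {
    "Content-Type": "application/json",
    "Authorization": "Api-Key <API_key>",
}

# Either dictionary can then be passed as the headers argument to requests.post(OCR_URL, ...).
```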
Recognizing text in an image through the OCR API
Image text recognition is implemented in the `recognize` OCR API method.
- Prepare an image file that meets the requirements:

  - The supported file formats are JPEG, PNG, and PDF. Specify the MIME type of the file in the `mime_type` property. The default value is `image`.
  - The maximum file size is 10 MB.
  - The image size should not exceed 20 MP (height × width).

  Note

  Need a sample image? Download an image of the penguin crossing road sign.
- Encode the image file as Base64:

  UNIX

  ```bash
  base64 -i input.jpg > output.txt
  ```

  Windows

  ```
  C:> Base64.exe -e input.jpg > output.txt
  ```

  PowerShell

  ```powershell
  [Convert]::ToBase64String([IO.File]::ReadAllBytes("./input.jpg")) > output.txt
  ```

  Python

  ```python
  # Import a library for encoding files in Base64.
  import base64

  # Create a function that will encode a file and return the result.
  def encode_file(file_path):
      with open(file_path, "rb") as fid:
          file_content = fid.read()
      return base64.b64encode(file_content).decode("utf-8")
  ```

  Node.js

  ```js
  // Read the file contents into memory.
  var fs = require('fs');
  var file = fs.readFileSync('/path/to/file');

  // Get the file contents in Base64 format.
  var encoded = Buffer.from(file).toString('base64');
  ```

  Java

  ```java
  // Import a library for encoding files in Base64.
  import org.apache.commons.codec.binary.Base64;

  // Get the file contents in Base64 format.
  byte[] fileData = Base64.encodeBase64(yourFile.getBytes());
  ```

  Go

  ```go
  import (
      "bufio"
      "encoding/base64"
      "io/ioutil"
      "os"
  )

  // Open the file.
  f, _ := os.Open("/path/to/file")

  // Read the file contents.
  reader := bufio.NewReader(f)
  content, _ := ioutil.ReadAll(reader)

  // Get the file contents in Base64 format.
  base64.StdEncoding.EncodeToString(content)
  ```
- Create a file with the request body, e.g., `body.json`.

  body.json:

  ```json
  {
    "mimeType": "JPEG",
    "languageCodes": ["*"],
    "model": "page",
    "content": "<base64_encoded_image>"
  }
  ```

  In the `content` property, specify the image file contents encoded as Base64. To detect the text language automatically, set the `"languageCodes": ["*"]` property in the configuration.
- Send the text recognition request:

  UNIX

  ```bash
  export IAM_TOKEN=<IAM_token>
  curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    --header "x-folder-id: <folder_ID>" \
    --header "x-data-logging-enabled: true" \
    --data "@body.json" \
    https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText \
    --output output.json
  ```

  Where:

  - `<IAM_token>`: Previously obtained IAM token.
  - `<folder_ID>`: Previously obtained folder ID.

  Python

  ```python
  import json
  import requests

  data = {"mimeType": <mime_type>, "languageCodes": ["*"], "content": content}

  url = "https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText"
  headers = {
      "Content-Type": "application/json",
      "Authorization": "Bearer {:s}".format(<IAM_token>),
      "x-folder-id": "<folder_ID>",
      "x-data-logging-enabled": "true",
  }

  w = requests.post(url=url, headers=headers, data=json.dumps(data))
  ```
  The result will consist of recognized blocks of text, lines, and words with their position on the image:

  ```json
  {
    "result": {
      "text_annotation": {
        "width": "1920",
        "height": "1280",
        "blocks": [{
          "bounding_box": {
            "vertices": [{"x": "460", "y": "777"}, {"x": "460", "y": "906"}, {"x": "810", "y": "906"}, {"x": "810", "y": "777"}]
          },
          "lines": [{
            "bounding_box": {
              "vertices": [{"x": "460", "y": "777"}, {"x": "460", "y": "820"}, {"x": "802", "y": "820"}, {"x": "802", "y": "777"}]
            },
            "alternatives": [{
              "text": "PENGUINS",
              "words": [{
                "bounding_box": {
                  "vertices": [{"x": "460", "y": "768"}, {"x": "460", "y": "830"}, {"x": "802", "y": "830"}, {"x": "802", "y": "768"}]
                },
                "text": "PENGUINS",
                "entity_index": "-1"
              }]
            }]
          }, {
            "bounding_box": {
              "vertices": [{"x": "489", "y": "861"}, {"x": "489", "y": "906"}, {"x": "810", "y": "906"}, {"x": "810", "y": "861"}]
            },
            "alternatives": [{
              "text": "CROSSING",
              "words": [{
                "bounding_box": {
                  "vertices": [{"x": "489", "y": "852"}, {"x": "489", "y": "916"}, {"x": "810", "y": "916"}, {"x": "810", "y": "852"}]
                },
                "text": "CROSSING",
                "entity_index": "-1"
              }]
            }]
          }],
          "languages": [{"language_code": "en"}]
        }, {
          "bounding_box": {
            "vertices": [{"x": "547", "y": "989"}, {"x": "547", "y": "1046"}, {"x": "748", "y": "1046"}, {"x": "748", "y": "989"}]
          },
          "lines": [{
            "bounding_box": {
              "vertices": [{"x": "547", "y": "989"}, {"x": "547", "y": "1046"}, {"x": "748", "y": "1046"}, {"x": "748", "y": "989"}]
            },
            "alternatives": [{
              "text": "SLOW",
              "words": [{
                "bounding_box": {
                  "vertices": [{"x": "547", "y": "983"}, {"x": "547", "y": "1054"}, {"x": "748", "y": "1054"}, {"x": "748", "y": "983"}]
                },
                "text": "SLOW",
                "entity_index": "-1"
              }]
            }]
          }],
          "languages": [{"language_code": "en"}]
        }],
        "entities": []
      },
      "page": "0"
    }
  }
  ```
- To get all the words recognized in the image, find all values of the `text` property (see the sketch after this list).
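As an illustration, here is a minimal Python sketch that does this for the saved `output.json` file by recursively walking the response and collecting the `text` value of every recognized word. The file name and the `collect_words` helper are illustrative, not part of the API:

```python
import json

def collect_words(node, words):
    # Recursively walk the response and gather the "text" value of every word.
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "words":
                words.extend(w["text"] for w in value if "text" in w)
            else:
                collect_words(value, words)
    elif isinstance(node, list):
        for item in node:
            collect_words(item, words)

with open("output.json", "r", encoding="utf-8") as f:
    response = json.load(f)

words = []
collect_words(response, words)
print(words)  # For the sample image: ['PENGUINS', 'CROSSING', 'SLOW']
```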
Note

If the coordinates you get do not match the position of the displayed elements, set up support for `exif` metadata in your image viewing tool, or remove the `Orientation` attribute from the `exif` section of the image before sending it to the service.
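If you prefer to strip the orientation flag programmatically, one possible approach is to use the Pillow library (an assumption; any tool that rewrites EXIF data works) to apply the `Orientation` attribute to the pixel data and save a copy without it. The file names here are placeholders:

```python
from PIL import Image, ImageOps

# Open the source image and apply the EXIF Orientation attribute to the pixel data.
# exif_transpose() returns a correctly rotated copy that no longer relies on the tag.
image = ImageOps.exif_transpose(Image.open("input.jpg"))

# Save a copy; EXIF metadata is not carried over unless passed explicitly,
# so the Orientation attribute is not sent to the service.
image.save("input_normalized.jpg")
```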