Recognizing text in image archives in Yandex Vision OCR

Updated at May 7, 2025

Use the Yandex Vision OCR service to recognize text in images. You can also store both the source images and recognition results in Yandex Object Storage.

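To set up an infrastructure for text recognition using Vision OCR and automatically export the results to Object Storage:

1. Get your cloud ready.
2. Create a bucket.
3. Create a VM.
4. Configure the VM.
5. Create an archive with images.
6. Prepare a script for digitizing and uploading images.
7. Double-check the recognition results.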

If you no longer need these resources, delete them.

Before you begin

Navigate to the management console and log in to Yandex Cloud or register a new account.
On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure to operate in.

Learn more about clouds and folders.

Required paid resources

The infrastructure costs for image recognition and data storage include:

A fee for VM computing resources (see Yandex Compute Cloud pricing).
A fee for data storage in a bucket and operations with data (see Yandex Object Storage pricing).
A fee for using a dynamic or a static public IP (see Yandex Virtual Private Cloud pricing).
A fee for using Yandex Vision OCR (see pricing for Yandex Vision OCR).

Create a bucket

To create an Object Storage bucket to store the source images and recognition results:

Management console

Go to the Yandex Cloud management console and select the folder where you will perform the operations.
On the folder page, click Create resource and select Bucket.
In the Name field, enter the bucket name following the naming conventions, such as vision-bucket.
In the Bucket access field, select Restricted.
In the Storage class field, select Cold.
Click Create bucket.

Create a VM

Management console

In the management console, click Create resource and select Virtual machine.
In the Name field, enter a name for the VM, such as vision-vm. The naming requirements are as follows:
- It must be from 2 to 63 characters long.
- It may contain lowercase Latin letters, numbers, and hyphens.
- It must start with a letter and cannot end with a hyphen.
Select an availability zone to place the VM in.
Under Image/boot disk selection, go to the Cloud Marketplace tab and select a public CentOS 7 image.
Under Disks and file storages, select the parameters:
- Type: SSD.
- Size: 19 GB.
Under Computing resources, select:
- Platform: Intel Cascade Lake.
- Guaranteed vCPU share: 20%.
- vCPU: 2.
- RAM: 2 GB.
Under Network settings, select the network and subnet to connect the VM to. If there aren't any networks, create one:
1. Select Create network.
2. In the window that opens, enter the network name and the folder to host the network.
3. (optional) To automatically create subnets, select the Create subnets option.
4. Click Create.
  
  Each network must have at least one subnet. If there is no subnet, create one by selecting Add subnet.
In the Public address field, keep Auto to assign your VM a random external IP address from the Yandex Cloud pool, or select a static address from the list if you reserved one in advance.
Enter the VM access information:
- Enter the username in the Login field.
- In the SSH key field, paste the contents of the public key file.
  
  You will need to create a key pair for the SSH connection yourself, see Creating an SSH key pair.
Click Create VM.
Wait for the VM status to change to Running and save its public IP address: you'll need it for SSH connection.
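
The naming requirements from the Name step can be checked locally before you fill in the form. The following is a minimal sketch, not part of the Yandex Cloud tooling: `valid_vm_name` is a hypothetical helper, and the regular expression is one possible reading of the three rules listed above.

```shell
# Hypothetical helper: checks a VM name against the naming rules listed above.
valid_vm_name() {
    # 2 to 63 characters in total: starts with a lowercase letter, continues
    # with lowercase letters, digits, or hyphens, and does not end with a hyphen.
    printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{0,61}[a-z0-9]$'
}

# Try a few candidate names.
for name in vision-vm Vision-VM vm- v; do
    if valid_vm_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: invalid"
    fi
done
```

Only the first candidate passes: the second contains uppercase letters, the third ends with a hyphen, and the fourth is too short.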

Configure the VM

Set up the Yandex Cloud CLI

Connect to the VM via SSH.
Install the Yandex Cloud CLI and create a profile.
Make sure that the Yandex Cloud CLI runs correctly:
CLI
Run the following command on the VM:
```
yc config list
```
Result:
```
token: AQ...gs
cloud-id: b1gdtdqb1900f5rqqvli
folder-id: b1gveg9vude9g3uioa50
```
Save the folder-id parameter: you'll need it to set up a service account.
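
If you prefer not to copy the value by hand, the folder ID can be extracted from the command output. The sketch below assumes the `key: value` layout shown in the result above; `folder_id_from_config` is a hypothetical helper, and the sample input is the example output from this page rather than a live call to `yc`.

```shell
# Hypothetical helper: reads `yc config list` output on stdin and prints folder-id.
folder_id_from_config() {
    awk -F': *' '$1 == "folder-id" { print $2 }'
}

# Sample input copied from the result above. On the VM you would pipe the
# output of `yc config list` into the function instead.
FOLDERID=$(printf '%s\n' \
    'token: AQ...gs' \
    'cloud-id: b1gdtdqb1900f5rqqvli' \
    'folder-id: b1gveg9vude9g3uioa50' | folder_id_from_config)
echo "$FOLDERID"
```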

Set up a service account

CLI

Create a service account:

```
yc iam service-account create \
  --name <service_account_name> \
  --description "<service_account_description>"
```

Where:

--name: The service account name, such as vision-sa.
--description: A description of the service account, for example, this is vision service account.

Result:

```
id: aje6aoc8hccuh5tp55bg
folder_id: b1gv87ssvu497lpgjh5o
created_at: "2022-10-12T14:04:43.198559512Z"
name: vision-sa
description: this is vision service account
```

Save the id parameter: this is the service account ID you'll need in the setup process.

Assign the editor role to the service account.

```
yc resource-manager folder add-access-binding <folder_id> \
  --role editor \
  --subject serviceAccount:<service_account_ID>
```

Where:

--role: The role assigned.
--subject serviceAccount: Service account ID.

Create a static access key for the service account.

```
yc iam access-key create \
  --service-account-id <service_account_ID> \
  --description "<key_description>"
```

Where:

--service-account-id: Service account ID.
--description: A description for the key, for example, this key is for vision.

Result:

```
access_key:
  id: ajen8d7fur27bt8losom
  service_account_id: aje6aoc8hccuh5tp55bg
  created_at: "2022-10-12T15:08:08.045280520Z"
  description: this key is for vision
  key_id: YC...li
secret: YC...J5
```

Save the following parameters (you'll need them to set up the AWS CLI utility):

key_id: The ID of the static access key.
secret: The secret key.

Create an authorized key for the service account:

```
yc iam key create \
  --service-account-id <service_account_ID> \
  --output key.json
```

Where:

--service-account-id: Service account ID.
--output: The name of the JSON file to save the authorized key to.

Result:

```
id: aje3qc9pagb9kedkhdn5
service_account_id: aje6aoc8hccuh5tp55bg
created_at: "2022-10-13T12:53:04.810240976Z"
key_algorithm: RSA_2048
```

Create a Yandex Cloud CLI profile to run on behalf of the service account, such as vision-profile:
```
yc config profile create vision-profile
```
Result:
```
Profile 'vision-profile' created and activated
```
Specify the authorized key of the service account in the profile configuration:
```
yc config set service-account-key key.json
```
Get an IAM token for the service account:
```
yc iam create-token
```
Save the IAM token: you'll need it to send images to Vision OCR.

Set up the AWS CLI

Update the packages installed in the VM operating system. To do this, run the command:
```
sudo yum update -y
```
Install the AWS CLI:
```
sudo yum install awscli -y
```
Set up the AWS CLI:
```
aws configure
```
Specify the parameter values:
- AWS Access Key ID: The static access key ID (key_id) that you obtained when setting up the service account.
- AWS Secret Access Key: The secret key that you generated when setting up the service account.
- Default region name: ru-central1.
- Default output format: json.
Make sure that the ~/.aws/credentials file contains the correct key_id and secret values:
```
cat ~/.aws/credentials
```
Make sure that the ~/.aws/config file contains the correct Default region name and Default output format values:
```
cat ~/.aws/config
```
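
For reference, the two files that `aws configure` produces can also be written directly, which may be convenient when the setup is scripted. This is a sketch under assumptions: `write_aws_config` is a hypothetical helper, the file layout is the standard AWS CLI INI format with a `[default]` profile, and the demo writes placeholder values to a temporary directory instead of ~/.aws.

```shell
# Hypothetical helper: writes AWS CLI credentials and config files to a directory.
write_aws_config() {
    dir=$1 key_id=$2 secret=$3
    mkdir -p "$dir"
    cat > "$dir/credentials" <<EOF
[default]
aws_access_key_id = $key_id
aws_secret_access_key = $secret
EOF
    cat > "$dir/config" <<EOF
[default]
region = ru-central1
output = json
EOF
    # The credentials file holds a secret, so restrict it to the owner.
    chmod 600 "$dir/credentials"
}

# Demo in a temporary directory with placeholder values. On the VM the
# directory would be ~/.aws and the values would come from the static key.
demo_dir=$(mktemp -d)
write_aws_config "$demo_dir" "<key_id>" "<secret>"
cat "$demo_dir/credentials" "$demo_dir/config"
```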

Create an archive with images

Upload images that contain recognizable text to the bucket.

Tip

Use the sample image of the penguin crossing road sign.
To make sure that the images were uploaded, run the following command, specifying the bucket name:
```
aws --endpoint-url=https://storage.yandexcloud.net s3 ls s3://<bucket_name>/
```

Save the images from the bucket to the VM, for example, to the my_pictures folder:

```
aws --endpoint-url=https://storage.yandexcloud.net s3 cp s3://<bucket_name>/ my_pictures --recursive
```

Compress the images into an archive, for example, my_pictures.tar:
```
tar -cf my_pictures.tar my_pictures/*
```
Delete the image directory:
```
rm -rfd my_pictures
```
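
Before running the delete step above, it may be worth checking that the archive actually lists every image. A minimal sketch, assuming a hypothetical helper named `archive_matches_dir`; the demo uses empty placeholder files in a temporary directory rather than real images.

```shell
# Hypothetical helper: succeeds only if the archive lists as many files
# as the directory contains.
archive_matches_dir() {
    archive=$1 dir=$2
    in_dir=$(find "$dir" -type f | wc -l)
    # Count archive entries that are not directories (no trailing slash).
    in_tar=$(tar -tf "$archive" | grep -vc '/$')
    [ "$in_dir" -eq "$in_tar" ]
}

# Demo with placeholder files in a temporary directory.
work=$(mktemp -d)
mkdir "$work/my_pictures"
: > "$work/my_pictures/a.jpg"
: > "$work/my_pictures/b.png"
(cd "$work" && tar -cf my_pictures.tar my_pictures/*)

if archive_matches_dir "$work/my_pictures.tar" "$work/my_pictures"; then
    echo "archive is complete"
fi
```

The check compares counts only, so it would not catch a file that was replaced by another one with a different name.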

Prepare a script for digitizing and uploading images

Configure the environment

Install the jq package. The script will use it to process the results from Vision OCR:
```
sudo yum install jq -y
```
Install the text editor nano:
```
sudo yum install nano -y
```
Set the environment variables necessary for the script to run:
```
export BUCKETNAME="<bucket_name>"
export FOLDERID="<folder_id>"
export IAMTOKEN="<IAM_token>"
```
Where:
- BUCKETNAME: The bucket name.
- FOLDERID: The folder ID.
- IAMTOKEN: The IAM token that you issued when setting up the service account.
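
Since the script relies on all three variables, a quick check before running it can save a failed run. The sketch below is one way to do that; `check_required_vars` is a hypothetical helper and is not part of the script in this tutorial.

```shell
# Hypothetical helper: reports every required variable that is unset or empty.
check_required_vars() {
    missing=0
    for var in BUCKETNAME FOLDERID IAMTOKEN; do
        # Indirect lookup: read the value of the variable whose name is in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "Missing variable: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_required_vars; then
    echo "Environment is ready"
fi
```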

Create a script

The script includes the following steps:

Create the relevant directories.
Unpack the archive with images.
Process all the images one-by-one:
1. Base64-encode the image.
2. Create a request body for the given image.
3. Send the image in a POST request to Vision OCR for recognition.
4. Save the result to the output.json file.
5. Extract the recognized text from output.json and save it to a text file.
Add the resulting text files to an archive.
Upload the archive with the text files to Object Storage.
Delete the auxiliary files.

For your convenience, the script includes a comment for each step.
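
The first two per-image steps (encoding the image and creating the request body) can be tried in isolation without calling Vision OCR. This is a sketch under assumptions: `build_request_body` is a hypothetical helper, the field names are taken from the script in this tutorial, and the demo encodes a five-byte placeholder file instead of a real image.

```shell
# Hypothetical helper: prints a Vision OCR request body for one image file.
build_request_body() {
    image=$1 folder_id=$2
    # tr removes the line breaks that some base64 implementations insert;
    # a line break inside the value would make the JSON invalid.
    content=$(base64 < "$image" | tr -d '\n')
    cat <<EOF
{
  "folderId": "$folder_id",
  "analyze_specs": [{
    "content": "$content",
    "features": [{
      "type": "TEXT_DETECTION",
      "text_detection_config": { "language_codes": ["en", "ru"] }
    }]
  }]
}
EOF
}

# Demo with a tiny placeholder file instead of a real image.
sample=$(mktemp)
printf 'hello' > "$sample"
build_request_body "$sample" "<folder_id>"
```

Printing the body to standard output keeps the helper easy to inspect; redirecting it to body.json gives the file that the POST request sends.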

To implement the script:

Create a file, for example, vision.sh and open it in the nano text editor:
```
sudo nano vision.sh
```

Copy the script text to vision.sh:

```
#!/bin/bash

# Create a directory for the recognized text
echo "Creating directories..."
mkdir -p my_pictures_text

# Unpack the archive with images; this recreates the my_pictures directory
echo "Extracting pictures to the my_pictures directory..."
tar -xf my_pictures.tar

# Recognize the images from the archive, one file at a time
for f in my_pictures/*
do
    # Base64-encode the image to send it to Vision OCR.
    # -w 0 disables line wrapping so that the value stays a valid JSON string.
    CODEIMG=$(base64 -w 0 "$f")

    # Create the body.json file to be sent in a POST request to Vision OCR
    cat <<EOF > body.json
{
"folderId": "$FOLDERID",
"analyze_specs": [{
"content": "$CODEIMG",
"features": [{
"type": "TEXT_DETECTION",
"text_detection_config": {
"language_codes": ["en","ru"]
}
}]
}]
}
EOF
    # Send the image to Vision OCR for recognition and write the result to output.json
    echo "Processing file $f in Vision OCR..."
    curl -X POST --silent \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${IAMTOKEN}" \
    -d '@body.json' \
    https://vision.api.cloud.yandex.net/vision/v1/batchAnalyze > output.json

    # Get the image file name without its extension to be used below
    IMAGE_BASE_NAME=$(basename -- "$f")
    IMAGE_NAME="${IMAGE_BASE_NAME%.*}"

    # Get text data from output.json and write it to a TXT file named after the image file
    jq -r '.results[].results[].textDetection.pages[].blocks[].lines[].words[].text' output.json | awk -v ORS=" " '{print}' > "my_pictures_text/$IMAGE_NAME.txt"
done

# Add the directory with the text files to an archive
echo "Packing text files to archive..."
tar -cf my_pictures_text.tar my_pictures_text

# Send the text file archive to the bucket
echo "Sending archive to Object Storage Bucket..."
aws --endpoint-url=https://storage.yandexcloud.net s3 cp my_pictures_text.tar "s3://$BUCKETNAME/" > /dev/null

# Delete the auxiliary files
echo "Cleaning up..."
rm -f body.json
rm -f output.json
rm -rf my_pictures
rm -rf my_pictures_text
rm -f my_pictures_text.tar
```

Set the permissions to run the script:
```
sudo chmod 755 vision.sh
```
Run the script:
```
./vision.sh
```
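
The script names each text file after its image by stripping the directory and the last extension. The sketch below isolates that step so you can see how particular file names are mapped; `text_file_for` is a hypothetical helper, and the sample paths are made up for illustration.

```shell
# Hypothetical helper: maps an image path to the name of its text file.
text_file_for() {
    base=$(basename -- "$1")
    # ${base%.*} removes the shortest suffix that starts with a dot,
    # which is the last extension only.
    printf '%s.txt\n' "${base%.*}"
}

text_file_for "my_pictures/road sign.jpg"
text_file_for "my_pictures/scan.page1.png"
```

Two images that differ only in their extension, such as sign.jpg and sign.png, map to the same text file, so the later one overwrites the earlier result.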

Double-check the recognition results

Management console

In the Yandex Cloud management console, select the folder where the bucket with the recognition results is located.
Select Object Storage.
Open the bucket with the recognition results.
Make sure that the bucket contains the my_pictures_text.tar archive.
Download and unpack the archive.
Make sure that the text in the <image name>.txt file matches the text in the image.
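
When the archive contains many files, a quick scan for empty text files can point to images worth checking first. This is a sketch under the assumption that a failed or empty recognition leaves a text file with no visible characters; `empty_results` is a hypothetical helper, and the demo uses placeholder files in a temporary directory.

```shell
# Hypothetical helper: prints the text files in a directory that contain
# no non-whitespace characters.
empty_results() {
    for file in "$1"/*.txt; do
        # Skip the unexpanded pattern when the directory has no .txt files.
        [ -e "$file" ] || continue
        if ! grep -q '[^[:space:]]' "$file"; then
            echo "$file"
        fi
    done
}

# Demo with placeholder files in a temporary directory.
results=$(mktemp -d)
printf 'some recognized words ' > "$results/sign.txt"
printf ' ' > "$results/blank.txt"
empty_results "$results"
```

A non-empty file still needs to be compared with its image by eye; this check only narrows down where to look.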

How to delete created resources

To stop paying for the resources created:

Delete all the objects from the bucket.
Delete the respective bucket.
Delete the VM.
Delete the static public IP if you reserved one.
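
The first three steps can also be done from the command line. The sketch below only prints the commands for review instead of running them, because they delete data. It assumes the bucket and VM names used as examples in this tutorial; `print_cleanup_commands` is a hypothetical helper, and the printed commands are meant to be run from a machine where the AWS CLI and the Yandex Cloud CLI are already configured.

```shell
# Hypothetical helper: prints cleanup commands for review instead of running them.
print_cleanup_commands() {
    bucket=$1 vm=$2
    endpoint="https://storage.yandexcloud.net"
    # Delete all objects, then the empty bucket, then the VM.
    echo "aws --endpoint-url=$endpoint s3 rm s3://$bucket/ --recursive"
    echo "aws --endpoint-url=$endpoint s3 rb s3://$bucket"
    echo "yc compute instance delete $vm"
}

print_cleanup_commands vision-bucket vision-vm
```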
