Recognizing text in image archives in Yandex Vision OCR

Updated on April 22, 2025

Getting started
- Required paid resources
Create a bucket
Create a VM
Configure the VM
Create an image archive
Prepare a script for digitizing and uploading images
- Configure the environment
- Create a script
Double-check the recognition results
How to delete the resources you created

Using Vision OCR and Yandex Object Storage, you can manage image text recognition and store both the source image archive and recognition results.

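To configure a text recognition infrastructure using Vision OCR and automatically export the results to Object Storage:

1. Get your cloud ready.
2. Create a bucket.
3. Create a VM.
4. Configure the VM.
5. Create an image archive.
6. Prepare a script for digitizing and uploading images.
7. Double-check the recognition results.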

If you no longer need the resources you created, delete them.

Getting started

Go to the management console and log in to Yandex Cloud or create an account if you do not have one yet.
On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one.

If you have an active billing account, you can go to the cloud page to create or select a folder for your infrastructure to operate in.

Learn more about clouds and folders.

Required paid resources

The infrastructure costs for image recognition and data storage include:

Fee for VM computing resources and disks (see Yandex Compute Cloud pricing).
Fee for data storage in a bucket and data operations (see Object Storage pricing).
Fee for using a dynamic or static public IP address (see Yandex Virtual Private Cloud pricing).
Fee for using Vision OCR (see Vision OCR pricing).

Create a bucket

To create an Object Storage bucket to store the source images and recognition results:

Management console

Go to the management console and select the folder where you will perform the operations.
Select Object Storage.
Click Create bucket.
Enter a name for the bucket consistent with the naming requirements.
In the Object read access field, select Restricted.
In the Storage class field, select Cold.
Click Create bucket.

Create a VM

Management console

On the folder dashboard of the management console, click Create resource and select Virtual machine instance.
Under Boot disk image, in the Product search field, enter CentOS 7 and select a public CentOS 7 image.
Under Location, select an availability zone where your VM will reside. If you do not know which availability zone you need, leave the default one.
Under Disks and file storages, select the SSD disk type and specify its size: 19 GB.
Under Computing resources, navigate to the Custom tab and specify the required platform, number of vCPUs, and the amount of RAM:
- Platform: Intel Cascade Lake
- vCPU: 2
- Guaranteed vCPU performance: 20%
- RAM: 2 GB
Under Network settings:
- In the Subnet field, select the network and subnet to connect your VM to. If the required network or subnet is not there, create it.
- Under Public IP address, leave Auto to assign a random external IP address to your VM from the Yandex Cloud pool. Alternatively, select a static address from the list if you reserved one.
Under Access, select SSH key and specify the VM access credentials:
- Under Login, enter a username. Do not use root or other reserved usernames. To perform operations requiring root privileges, use the sudo command.
- In the SSH key field, select the SSH key saved in your organization user profile.
  
  If there are no saved SSH keys in your profile, or you want to add a new key:
  - Click Add key.
  - Enter a name for the SSH key.
  - Upload or paste the contents of the public key file. You need to create a key pair for the SSH connection to a VM yourself.
  - Click Add.
  The SSH key will be added to your organization user profile.
  
  If users cannot add SSH keys to their profiles in the organization, the added public SSH key will only be saved to the user profile of the VM being created.
Under General information, specify the VM name. Follow these naming requirements:
- It must be from 2 to 63 characters long.
- It may contain lowercase Latin letters, numbers, and hyphens.
- It must start with a letter and cannot end with a hyphen.
Click Create VM.
Wait until the VM status switches to Running and save the VM’s public IP address required for SSH connection.
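
The VM naming requirements listed above can be checked locally before you fill in the form. The following is a minimal sketch, not part of the Yandex Cloud tooling: `is_valid_vm_name` is a hypothetical helper name, and the regular expression is one possible encoding of the three rules.

```shell
#!/bin/bash

# Hypothetical helper: returns 0 if the name meets the VM naming requirements.
is_valid_vm_name() {
  local name="$1"
  # Use the C locale so that [a-z] matches lowercase ASCII letters only.
  local LC_ALL=C
  # First character: a lowercase letter.
  # Middle: up to 61 lowercase letters, digits, or hyphens.
  # Last character: a letter or digit (so the name cannot end with a hyphen).
  # Total length: 2 to 63 characters.
  local re='^[a-z][a-z0-9-]{0,61}[a-z0-9]$'
  [[ "$name" =~ $re ]]
}
```

For example, `is_valid_vm_name "vision-vm" && echo "OK"` prints `OK`, while a name such as `vm-` is rejected.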

Configure the VM

Configure the Yandex Cloud CLI

Connect to the VM over SSH.
Install the Yandex Cloud CLI and create a profile.
Make sure the Yandex Cloud CLI runs correctly:
CLI
Run this command on the VM:
```
yc config list
```
Result:
```
token: AQ...gs
cloud-id: b1gdtdqb1900********
folder-id: b1gveg9vude9********
```
Save the folder ID from the folder-id property. This ID is required for configuring a service account.
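
Instead of copying the ID by hand, you could extract it from the command output. This is a rough sketch that assumes the output keeps the `key: value` layout shown above; `yc_field` is a hypothetical helper name, not a Yandex Cloud CLI command.

```shell
#!/bin/bash

# Hypothetical helper: read "key: value" lines from stdin and print the value
# of the first line whose key matches the argument. Indentation is ignored.
yc_field() {
  local key="$1"
  awk -v key="$key" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # drop leading indentation
      prefix = key ": "
      if (index(line, prefix) == 1) {   # the line starts with "key: "
        print substr(line, length(prefix) + 1)
        exit
      }
    }'
}
```

A possible use is `FOLDERID="$(yc config list | yc_field folder-id)"`, which stores the folder ID in a shell variable.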

Configure a service account

CLI

Create a service account:

```
yc iam service-account create \
  --name <service_account_name> \
  --description "<service_account_description>"
```

Where:

--name: Service account name, e.g., vision-sa.
--description: Service account description, e.g., this is vision service account.

Result:

```
id: aje6aoc8hccu********
folder_id: b1gv87ssvu49********
created_at: "2022-10-12T14:04:43.198559512Z"
name: vision-sa
description: this is vision service account
```

Save the service account ID from the id property. This ID is required for further configuration.

Assign the editor role to the service account:

```
yc resource-manager folder add-access-binding <folder_ID> \
  --role editor \
  --subject serviceAccount:<service_account_ID>
```

Where:

--role: Role to assign.
--subject serviceAccount: Service account ID.

Create a static access key for your service account:

```
yc iam access-key create \
  --service-account-id <service_account_ID> \
  --description "<key_description>"
```

Where:

--service-account-id: Service account ID.
--description: Key description, e.g., this key is for vision.

Result:

```
access_key:
  id: ajen8d7fur27********
  service_account_id: aje6aoc8hccu********
  created_at: "2022-10-12T15:08:08.045280520Z"
  description: this key is for vision
  key_id: YC...li
secret: YC...J5
```

Save these properties, as you will need them to configure the AWS CLI:

key_id: Static access key ID
secret: Secret key

Create an authorized key for the service account:

```
yc iam key create \
  --service-account-id <service_account_ID> \
  --output key.json
```

Where:

--service-account-id: Service account ID.
--output: Name of the authorized key JSON file.

Result:

```
id: aje3qc9pagb9********
service_account_id: aje6aoc8hccu********
created_at: "2022-10-13T12:53:04.810240976Z"
key_algorithm: RSA_2048
```
Create a Yandex Cloud CLI profile to run under the service account, such as vision-profile:
```
yc config profile create vision-profile
```
Result:
```
Profile 'vision-profile' created and activated
```
Specify the service account’s authorized key in the profile configuration:
```
yc config set service-account-key key.json
```
Get an IAM token for the service account:
```
yc iam create-token
```
Save the IAM token. You will need it to send images to Vision OCR.

Configure the AWS CLI

Update the packages on the VM operating system. To do this, run this command:
```
sudo yum update -y
```
Install the AWS CLI:
```
sudo yum install awscli -y
```
Configure the AWS CLI:
```
aws configure
```
Specify these settings:
- AWS Access Key ID: Static access key ID (key_id) you got when configuring the service account.
- AWS Secret Access Key: Secret key (secret) you got when configuring the service account.
- Default region name: ru-central1.
- Default output format: json.
Make sure the ~/.aws/credentials file contains the correct key_id and secret values:
```
cat ~/.aws/credentials
```
Make sure the ~/.aws/config file contains the correct Default region name and Default output format values:
```
cat ~/.aws/config
```
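
If you prefer to skip the interactive prompts, the two files checked above can also be written directly. The sketch below assumes the standard AWS CLI file layout with a single `[default]` profile; `write_aws_files` is a hypothetical helper name, and its optional third argument exists only to make the target directory configurable.

```shell
#!/bin/bash

# Hypothetical helper: write AWS CLI credentials and config files without prompts.
# Arguments: static access key ID, secret key, target directory (default: ~/.aws).
write_aws_files() {
  local key_id="$1" secret="$2" dir="${3:-$HOME/.aws}"
  mkdir -p "$dir"
  (
    # Newly created files get owner-only permissions, since they hold a secret.
    umask 077
    printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
      "$key_id" "$secret" > "$dir/credentials"
    printf '[default]\nregion = ru-central1\noutput = json\n' > "$dir/config"
  )
}
```

For example, `write_aws_files "<key_id>" "<secret>"` writes both files to ~/.aws, overwriting any existing default profile.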

Create an image archive

Upload your images that include recognizable text to the bucket.

Tip

Use the sample image of the penguin crossing road sign.

To make sure you have uploaded the images, use a request with the bucket name:
```
aws --endpoint-url=https://storage.yandexcloud.net s3 ls s3://<bucket_name>/
```
Save the images from the bucket to the VM, e.g., to the my_pictures directory:
```
aws --endpoint-url=https://storage.yandexcloud.net s3 cp s3://<bucket_name>/ my_pictures --recursive
```

Pack the images into an archive, e.g., my_pictures.tar:
```
tar -cf my_pictures.tar my_pictures/*
```
Delete the image directory:
```
rm -rfd my_pictures
```
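
Before running the rm command above, it may be worth confirming that the archive holds every image. The following sketch compares the number of files in the directory with the number of file entries in the archive; `archive_matches_dir` is a hypothetical helper name, and the check compares counts only, not file contents.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if the archive lists as many files
# as the directory contains.
archive_matches_dir() {
  local archive="$1" dir="$2"
  local in_dir in_tar
  # Count regular files in the directory (tr strips padding some wc versions add).
  in_dir=$(find "$dir" -type f | wc -l | tr -d ' ')
  # Directory entries in a tar listing end with "/", so leave them out of the count.
  in_tar=$(tar -tf "$archive" | grep -vc '/$')
  [ "$in_dir" -eq "$in_tar" ]
}
```

A possible use is `archive_matches_dir my_pictures.tar my_pictures && rm -rfd my_pictures`, which deletes the directory only if the counts match.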

Prepare a script for digitizing and uploading images

Configure the environment

Install the epel repository for additional packages:
```
sudo yum install epel-release -y
```
Install the jq package to process the results from Vision OCR:
```
sudo yum install jq -y
```
Install the nano text editor:
```
sudo yum install nano -y
```
Set the environment variables required for the script to run:
```
export BUCKETNAME="<bucket_name>"
export FOLDERID="<folder_ID>"
export IAMTOKEN="<IAM_token>"
```
Where:
- BUCKETNAME: Bucket name.
- FOLDERID: Folder ID.
- IAMTOKEN: IAM token you got when configuring the service account.
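
A missing variable would only show up later as a failed request or upload, so a quick check can save a run. This is a minimal sketch; `require_env` is a hypothetical helper name and relies on bash indirect expansion.

```shell
#!/bin/bash

# Hypothetical helper: report every listed variable that is unset or empty.
# Returns 0 if all variables have values, 1 otherwise.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} expands to the value of the variable whose name is stored in $name.
    if [ -z "${!name}" ]; then
      echo "Environment variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, placing `require_env BUCKETNAME FOLDERID IAMTOKEN || exit 1` near the top of a script stops it early when a value is missing.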

Create a script

Bash

The script includes these steps:

Create the appropriate directories.
Unpack the image archive.
Process all images one by one:
1. Encode the image as Base64.
2. Create a request body for the specific image.
3. Send the image in a POST request to Vision OCR for recognition.
4. Save the result to the output.json file.
5. Extract the recognized text from output.json and save it to a text file.
Pack the text files you got into an archive.
Upload the text files archive to Object Storage.
Delete the auxiliary files.

For your convenience, the script text includes comments for each step.
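
The encoding and request body steps can be sketched in isolation as follows. The helper names `mime_for` and `make_body` are hypothetical, and the mapping of the .png extension to a `PNG` value is an assumption to check against the Vision OCR API reference. The body fields themselves mirror the ones used in the script.

```shell
#!/bin/bash

# Hypothetical helper: map a file extension to a mimeType value.
# Assumption: only JPEG and PNG images are handled in this sketch.
mime_for() {
  case "${1##*.}" in
    jpg|jpeg|JPG|JPEG) echo "JPEG" ;;
    png|PNG)           echo "PNG" ;;
    *)                 return 1 ;;
  esac
}

# Hypothetical helper: print the request body for one image to stdout.
make_body() {
  local file="$1" mime content
  mime=$(mime_for "$file") || return 1
  # tr removes the line breaks base64 may insert, keeping the JSON string on one line.
  content=$(base64 < "$file" | tr -d '\n')
  printf '{\n"mimeType": "%s",\n"languageCodes": ["*"],\n"model": "page",\n"content": "%s"\n}\n' \
    "$mime" "$content"
}
```

For example, `make_body my_pictures/sign.jpg > body.json` writes a body that can be passed to curl with `--data '@body.json'`. Base64 output contains no characters that need JSON escaping, which is why plain printf is enough here.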

To implement the script:

Create a file, e.g., vision.sh, and open it in the nano text editor:
```
sudo nano vision.sh
```

Copy the Bash script text to vision.sh:

```
#!/bin/bash

# Create a directory for the recognized text.
echo "Creating directories..."
mkdir -p my_pictures_text

# Unpack the image archive; it contains the my_pictures directory.
echo "Extracting pictures to the my_pictures directory..."
tar -xf my_pictures.tar

# Recognize the images from the archive.
# For each file in the directory, perform these actions in a loop:
for f in my_pictures/*
do
   # Encode the image as Base64 for sending it to Vision OCR.
   # -w 0 disables line wrapping in GNU base64, so the string stays on one line
   # and remains valid inside the JSON body.
   CODEIMG=$(base64 -w 0 "$f")

   # Create a `body.json` file to send to Vision OCR in a POST request.
   # The body assumes that the images are in JPEG format.
   cat <<EOF > body.json
{
"mimeType": "JPEG",
"languageCodes": ["*"],
"model": "page",
"content": "$CODEIMG"
}
EOF

   # Send the image to Vision OCR for recognition and write the result to the `output.json` file.
   echo "Processing file $f in Vision..."
   curl --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Bearer ${IAMTOKEN}" \
     --header "x-data-logging-enabled: true" \
     --header "x-folder-id: ${FOLDERID}" \
     --data '@body.json' \
     https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText \
     --output output.json

   # Get the image file name without the extension to use it later.
   IMAGE_BASE_NAME=$(basename -- "$f")
   IMAGE_NAME="${IMAGE_BASE_NAME%.*}"

   # Get the text data from the `output.json` file and write it to a .txt file with the same name as the image file.
   jq -r '.result[].blocks[].lines[].text' output.json | awk -v ORS=" " '{print}' > "my_pictures_text/${IMAGE_NAME}.txt"
done

# Archive the contents of the text file directory.
echo "Packing text files to archive..."
tar -cf my_pictures_text.tar my_pictures_text

# Upload the text file archive to your bucket.
echo "Sending archive to Object Storage Bucket..."
aws --endpoint-url=https://storage.yandexcloud.net s3 cp my_pictures_text.tar "s3://${BUCKETNAME}/" > /dev/null

# Delete the auxiliary files.
echo "Cleaning up..."
rm -f body.json
rm -f output.json
rm -rf my_pictures
rm -rf my_pictures_text
rm -f my_pictures_text.tar
```

Set the permissions to run the script:
```
sudo chmod 755 vision.sh
```
Run the script:
```
./vision.sh
```

Double-check the recognition results

Management console

In the Yandex Cloud management console, select the folder containing the bucket with the recognition results.
Select Object Storage.
Open the bucket with the recognition results.
Make sure the bucket contains the my_pictures_text.tar archive.
Download and unpack the archive.
Make sure the text in the <image_name>.txt files matches that in the respective images.
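
With many images, a first pass can be automated by looking for result files that contain no text, which typically points to a failed request or an image without recognizable text. This is a rough sketch; `list_empty_results` is a hypothetical helper name, and it does not replace comparing the text with the images.

```shell
#!/bin/bash

# Hypothetical helper: print the .txt files in a directory that are empty
# or contain only whitespace.
list_empty_results() {
  local dir="$1" f
  for f in "$dir"/*.txt; do
    # Skip the unexpanded pattern when the directory has no .txt files.
    [ -e "$f" ] || continue
    # grep succeeds if the file has at least one non-whitespace character.
    if ! grep -q '[^[:space:]]' "$f"; then
      echo "$f"
    fi
  done
}
```

For example, after unpacking the archive, `list_empty_results my_pictures_text` prints the files that need a closer look.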

How to delete the resources you created

To stop paying for the resources you created:

Delete all objects from the bucket.
Delete the bucket.
Delete the VM.
Delete the static public IP if you reserved one.
