Recognizing text in image archives using Yandex Vision OCR
With Vision OCR
To set up a Vision OCR infrastructure for image recognition and automatic export of the results to Object Storage:
- Get your cloud ready.
- Create a bucket.
- Create a VM.
- Configure the VM.
- Create an image archive.
- Prepare a script for recognizing and uploading images.
- Verify the recognition accuracy.
If you no longer need the resources you created, delete them.
Getting started
Sign up for Yandex Cloud and create a billing account:
- Navigate to the management console and log in to Yandex Cloud or create a new account.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.
If you have an active billing account, you can create or select a folder for your infrastructure on the cloud page.
Learn more about clouds and folders here.
Required paid resources
The cost of supporting the infrastructure for image recognition and data storage includes:
- Fee for VM computing resources and disks (see Yandex Compute Cloud pricing).
- Fee for bucket data storage and data operations (see Object Storage pricing).
- Fee for a static or dynamic public IP address (see Yandex Virtual Private Cloud pricing).
- Fee for using Vision OCR (see Vision OCR pricing).
Create a bucket
To create an Object Storage bucket for storing source images and recognition results:
- In the management console, select the folder where you will operate.
- Go to Object Storage.
- Click Create bucket.
- Specify a bucket name that meets the naming conventions.
- In the Read objects field, select With authorization.
- In the Storage class field, select Cold.
- Click Create bucket.
Create a VM
- In the management console, navigate to the folder dashboard, click Create resource, and select Virtual machine instance.
- Under Boot disk image, enter CentOS 7 in the Product search field and select a public CentOS 7 image.
- Under Location, select an availability zone where your VM will reside. If you are not sure which availability zone you need, leave the default selection.
- Under Disks and file storages, select the disk type (SSD) and specify its size (19 GB).
- Under Computing resources, switch to the Custom tab and specify the platform, number of vCPUs, and amount of RAM:
  - Platform: Intel Cascade Lake
  - vCPU: 2
  - Guaranteed vCPU performance: 20%
  - RAM: 2 GB
- Under Network settings:
  - In the Subnet field, select the network and subnet to which you want to connect your VM. If the network or subnet you need does not exist yet, create it.
  - In the Public IP address field, select a static IP address from the list, or leave Auto to assign your VM a random external IP address from the Yandex Cloud pool.
- Under Access, select SSH key and specify the VM access credentials:
  - In the Login field, enter the username. Do not use root or other reserved usernames. For operations requiring root privileges, use the sudo command.
  - In the SSH key field, select the SSH key saved in your organization user profile.
    If there are no SSH keys in your profile or you want to add a new key:
    - Click Add key.
    - Enter a name for the SSH key.
    - Select one of the following:
      - Enter manually: Paste the contents of the public SSH key. You need to create an SSH key pair on your own.
      - Load from file: Upload the public part of the SSH key. You need to create an SSH key pair on your own.
      - Generate key: Automatically create an SSH key pair.
        When adding a new SSH key, an archive containing the key pair will be created and downloaded. In Linux or macOS-based operating systems, unpack the archive to the /home/<user_name>/.ssh directory. In Windows, unpack the archive to the C:\Users\<user_name>\.ssh directory. You do not need to additionally enter the public key in the management console.
    - Click Add.
    The system will add the SSH key to your organization user profile. If the organization has disabled the ability for users to add SSH keys to their profiles, the added public SSH key will only be saved in the user profile inside the newly created resource.
- Under General information, specify the VM name. The naming requirements are as follows:
  - Length: between 3 and 63 characters.
  - It can only contain lowercase Latin letters, numbers, and hyphens.
  - It must start with a letter and cannot end with a hyphen.
- Click Create VM.
- Wait for the VM status to change to Running and save its public IP address; you will need it for the SSH connection.
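The VM naming requirements listed above can be checked before you fill in the form. The sketch below is only an illustration: `is_valid_vm_name` is a hypothetical helper, not part of any Yandex Cloud tool, and it encodes just the three rules stated in this section.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if the name satisfies the VM naming rules above.
# The body runs in a subshell, so the LC_ALL change does not leak to the caller.
is_valid_vm_name() (
    LC_ALL=C    # make [a-z] match ASCII lowercase letters only
    name=$1
    len=${#name}
    # Length: between 3 and 63 characters.
    [ "$len" -ge 3 ] && [ "$len" -le 63 ] || exit 1
    case $name in
        *[!a-z0-9-]*) exit 1 ;;  # only lowercase Latin letters, numbers, and hyphens
        [!a-z]*)      exit 1 ;;  # must start with a letter
        *-)           exit 1 ;;  # cannot end with a hyphen
    esac
    exit 0
)
```

For example, `is_valid_vm_name "vision-vm" && echo "name is OK"` prints the message, while a name such as `vm-` is rejected.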
Configure the VM
Configure the Yandex Cloud CLI
- Connect to the VM over SSH.
- Install the Yandex Cloud CLI on your VM and create a profile.
- Make sure the Yandex Cloud CLI is working properly. Run the following command on your VM:
      yc config list
  Result:
      token: AQ...gs
      cloud-id: b1gdtdqb1900********
      folder-id: b1gveg9vude9********
  Save the folder ID from the folder-id field in the response; you will need it for configuring a service account.
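Instead of copying the folder ID by hand, you could extract it from the command output. This is a minimal sketch that assumes the `key: value` layout shown in the result above; `folder_id_from_config` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: reads `yc config list` output on stdin
# and prints the value of the folder-id field.
folder_id_from_config() {
    # Split each line on ": " and print the value of the first folder-id line.
    awk -F': *' '$1 == "folder-id" { print $2; exit }'
}
```

A possible use is `FOLDERID="$(yc config list | folder_id_from_config)"`, which stores the ID in a shell variable for later steps.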
Configure a service account
- Create a service account:
      yc iam service-account create \
        --name <service_account_name> \
        --description "<service_account_description>"
  Where:
  - --name: Service account name, e.g., vision-sa.
  - --description: Service account description, e.g., this is vision service account.
  Result:
      id: aje6aoc8hccu********
      folder_id: b1gv87ssvu49********
      created_at: "2022-10-12T14:04:43.198559512Z"
      name: vision-sa
      description: this is vision service account
  Save the service account ID from the id field in the response; you will need it for further configuration.
- Assign the editor role to the service account:
      yc resource-manager folder add-access-binding <folder_ID> \
        --role editor \
        --subject serviceAccount:<service_account_ID>
  Where:
  - --role: Role to assign.
  - --subject serviceAccount: Service account ID.
- Create a static access key for your service account:
      yc iam access-key create \
        --service-account-id <service_account_ID> \
        --description "<key_description>"
  Where:
  - --service-account-id: Service account ID.
  - --description: Key description, e.g., this key is for vision.
  Result:
      access_key:
        id: ajen8d7fur27********
        service_account_id: aje6aoc8hccu********
        created_at: "2022-10-12T15:08:08.045280520Z"
        description: this key is for vision
        key_id: YC...li
      secret: YC...J5
  Save the following values, as you will need them to configure the AWS CLI:
  - key_id: Static access key ID
  - secret: Secret key
- Create an authorized key for the service account:
      yc iam key create \
        --service-account-id <service_account_ID> \
        --output key.json
  Where:
  - --service-account-id: Service account ID.
  - --output: Name of the JSON file containing your authorized key.
  Result:
      id: aje3qc9pagb9********
      service_account_id: aje6aoc8hccu********
      created_at: "2022-10-13T12:53:04.810240976Z"
      key_algorithm: RSA_2048
- Create a Yandex Cloud CLI profile to run as the service account, e.g., vision-profile:
      yc config profile create vision-profile
  Result:
      Profile 'vision-profile' created and activated
- In the profile configuration, specify the service account’s authorized key:
      yc config set service-account-key key.json
- Obtain an IAM token for the service account:
      yc iam create-token
  Save the IAM token. You will need it to send images to Vision OCR.
Configure the AWS CLI
- Update the packages installed on your VM’s operating system by running the following command:
      sudo yum update -y
- Install the AWS CLI:
      sudo yum install awscli -y
- Configure the AWS CLI:
      aws configure
  Specify these settings:
  - AWS Access Key ID: Static access key ID (key_id) that you received when configuring the service account.
  - AWS Secret Access Key: Secret key (secret) that you received when configuring the service account.
  - Default region name: ru-central1.
  - Default output format: json.
- Verify that the ~/.aws/credentials file contains the correct key_id and secret values:
      cat ~/.aws/credentials
- Verify that the ~/.aws/config file contains the correct Default region name and Default output format values:
      cat ~/.aws/config
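The credentials check above can also be done without printing the secret to the terminal. The sketch below assumes the usual layout that `aws configure` writes, with `aws_access_key_id` and `aws_secret_access_key` entries; `has_aws_keys` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the given credentials file
# defines a non-empty value for both AWS CLI keys.
has_aws_keys() {
    grep -q '^aws_access_key_id *= *[^ ]' "$1" &&
        grep -q '^aws_secret_access_key *= *[^ ]' "$1"
}
```

For example, `has_aws_keys ~/.aws/credentials && echo "both keys are present"`. This only confirms that the entries exist, not that the values are correct.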
Create an image archive
- Upload the images with text you want to recognize to the bucket.
  Tip
  Need an example? Download an image of the penguin crossing road sign.
- Make sure the images were uploaded by sending the following request with your bucket name specified:
      aws --endpoint-url=https://storage.yandexcloud.net s3 ls s3://<bucket_name>/
- Save the images from the bucket to the my_pictures directory on your VM:
      aws --endpoint-url=https://storage.yandexcloud.net s3 cp s3://<bucket_name>/ my_pictures --recursive
- Add the images to the my_pictures.tar archive:
      tar -cf my_pictures.tar my_pictures/*
- Delete the image directory:
      rm -rfd my_pictures
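Before deleting the image directory, it may be worth confirming that the archive actually holds every image. The following sketch counts the file entries in a tar archive; `count_archive_files` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: prints the number of non-directory entries in a tar archive.
count_archive_files() {
    # tar -t lists entry names; directory entries end with a slash and are skipped.
    tar -tf "$1" | grep -vc '/$'
}
```

You could compare the output of `count_archive_files my_pictures.tar` with the number of files in `my_pictures` and delete the directory only if the two match.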
Prepare a script for image recognition and uploading
Set up your environment
- Install the epel repository for additional packages:
      sudo yum install epel-release -y
- Install the jq package for processing results from Vision OCR:
      sudo yum install jq -y
- Install the nano text editor:
      sudo yum install nano -y
- Set the environment variables used by the script:
      export BUCKETNAME="<bucket_name>"
      export FOLDERID="<folder_ID>"
      export IAMTOKEN="<IAM_token>"
  Where:
  - BUCKETNAME: Bucket name.
  - FOLDERID: Folder ID.
  - IAMTOKEN: IAM token that you received when configuring the service account.
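Because the script depends on all three variables, a missing one tends to surface only as a failed request or upload. The sketch below shows one way to fail early; `require_vars` is a hypothetical helper, and it uses `eval`, so it should only be called with variable names you control.

```shell
#!/bin/bash
# Hypothetical helper: returns 1 and names the first variable that is unset or empty.
require_vars() {
    for var_name in "$@"; do
        # Indirectly read the variable whose name is stored in var_name.
        eval "var_value=\${$var_name:-}"
        if [ -z "$var_value" ]; then
            echo "Environment variable $var_name is not set" >&2
            return 1
        fi
    done
}
```

A call such as `require_vars BUCKETNAME FOLDERID IAMTOKEN || exit 1` could be placed at the top of a script that uses these variables.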
Create a script
The script runs through the following steps:
- Creating required directories.
- Extracting the image archive.
- Processing all images in sequence:
  - Encoding the image in Base64.
  - Creating a request body containing the image.
  - Sending the image to Vision OCR for recognition via a POST request.
  - Saving the response to the output.json file.
  - Extracting the recognized text from output.json and saving it to a text file.
- Adding the resulting text files to an archive.
- Uploading the archive to Object Storage.
- Deleting temporary files.
To make things clearer, the script has comments for each step.
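The encoding and request-body steps from the list above can be sketched in isolation. This is an illustration only: `make_body` is a hypothetical helper, the field names and the fixed `JPEG` MIME type mirror the tutorial script, and line breaks are stripped from the Base64 output on the assumption that the value must be a single-line JSON string.

```shell
#!/bin/bash
# Hypothetical helper: prints a Vision OCR request body for one image file.
make_body() {
    # Encode the file and remove line breaks that base64 may insert when wrapping.
    encoded=$(base64 < "$1" | tr -d '\n')
    printf '{"mimeType": "JPEG", "languageCodes": ["*"], "model": "page", "content": "%s"}\n' "$encoded"
}
```

For example, `make_body my_pictures/sign.jpg > body.json` would produce a file suitable for the `--data '@body.json'` option of curl. The file name here is a placeholder.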
To prepare this script:
- Create a file named vision.sh and open it in nano:
      sudo nano vision.sh
- Paste the following Bash script into vision.sh:

#!/bin/bash

# Create the required directories.
echo "Creating directories..."

# Create a directory to store the recognized text.
mkdir -p my_pictures_text

# Extract the image archive; this recreates the my_pictures directory.
echo "Extracting pictures to the my_pictures directory..."
tar -xf my_pictures.tar

# Run text recognition on the images from the archive.
# Loop through each file in the directory and perform the following steps:
for f in my_pictures/*
do
  # Encode the image in Base64 for submission to Vision OCR.
  # The -w 0 option (GNU coreutils) disables line wrapping so that the value fits into a JSON string.
  CODEIMG=$(base64 -w 0 "$f")

  # Create a `body.json` file that you will send to Vision OCR via a POST request.
  cat <<EOF > body.json
{
"mimeType": "JPEG",
"languageCodes": ["*"],
"model": "page",
"content": "$CODEIMG"
}
EOF

  # Send the image to Vision OCR for recognition and write the response to the `output.json` file.
  echo "Processing file $f in Vision..."
  curl --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${IAMTOKEN}" \
    --header "x-data-logging-enabled: true" \
    --header "x-folder-id: ${FOLDERID}" \
    --data '@body.json' \
    https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText \
    --output output.json

  # Get the image file name without its extension to name the text file.
  IMAGE_BASE_NAME=$(basename -- "$f")
  IMAGE_NAME="${IMAGE_BASE_NAME%.*}"

  # Retrieve the text data from `output.json` and write it to a .txt file named after the original image file.
  jq -r '.result[].blocks[].lines[].text' output.json | awk -v ORS=" " '{print}' > "my_pictures_text/${IMAGE_NAME}.txt"
done

# Archive the contents of the text file directory.
echo "Packing text files to archive..."
tar -cf my_pictures_text.tar my_pictures_text

# Upload the text file archive to your bucket.
echo "Sending archive to Object Storage Bucket..."
aws --endpoint-url=https://storage.yandexcloud.net s3 cp my_pictures_text.tar "s3://${BUCKETNAME}/" > /dev/null

# Delete temporary files.
echo "Cleaning up..."
rm -f body.json
rm -f output.json
rm -rf my_pictures
rm -rf my_pictures_text
rm -f my_pictures_text.tar

- Set execute permissions for the script:
      sudo chmod 755 vision.sh
- Run the script:
      ./vision.sh
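The script names each text file after its source image by dropping the directory and the last extension. The sketch below isolates that step so you can see which output name to expect for a given image; `text_file_name` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: prints the text file name the script would use for an image path.
text_file_name() {
    # Strip the directory part, then remove the shortest suffix starting with a dot.
    base=$(basename -- "$1")
    printf '%s.txt\n' "${base%.*}"
}
```

Note that two images differing only in extension, e.g., `sign.jpg` and `sign.png`, map to the same text file, so the later result would overwrite the earlier one.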
Verify the recognition accuracy
- In the Yandex Cloud management console, select the folder containing the bucket with your recognition results.
- Go to Object Storage.
- Open the bucket with the recognition results.
- Verify that the bucket contains my_pictures_text.tar.
- Download and unpack the archive.
- Verify that the text in the <image_name>.txt files matches the text on the corresponding images.
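When the archive contains many files, a quick first pass is to look for text files that came out empty, which would suggest that no text was extracted for that image. The sketch below is one way to list them after unpacking; `list_empty_results` is a hypothetical helper, and it does not replace comparing the text with the images.

```shell
#!/bin/bash
# Hypothetical helper: prints the path of every empty .txt file in the given directory.
list_empty_results() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue    # the pattern matched no files
        if [ ! -s "$f" ]; then     # -s is true only for files larger than zero bytes
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `list_empty_results my_pictures_text` prints nothing if every image produced some text.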
How to delete the resources you created
To stop incurring costs for the resources you created: