Regular recognition of images and PDF documents from an Object Storage bucket
In this tutorial, you will use Yandex Vision OCR to set up regular recognition of images and PDF documents stored in an Object Storage bucket.
Recognition process
- The user uploads images or documents to the input directory (prefix) in a Yandex Object Storage bucket.
- The Yandex Cloud Functions trigger is launched by a timer and checks for files in the input folder. The files it finds are sent for recognition to the Yandex Serverless Containers container.
- While file recognition is in progress, the operation ID is saved in the process folder in the source bucket.
- After the operation is successfully completed, the recognition results are saved in the result folder as JSON and TXT files. The ID of the successful operation is deleted from the process folder.
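The flow above can be pictured with a small local simulation. This is only a sketch: a temporary directory stands in for the bucket, `start_recognition` is a hypothetical stub rather than a Vision OCR call, the operation is treated as finishing immediately (real recognition is asynchronous), and the `<name>.json` / `<name>.txt` result naming is an assumption. It does not model how the real function avoids reprocessing files that stay in input.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the Vision OCR request: prints a fake operation ID.
start_recognition() {
  printf 'op-%s\n' "$(basename "$1")"
}

# One timer tick over a local directory laid out like the bucket:
# input/ (uploaded files), process/ (operation IDs), result/ (JSON and TXT).
process_tick() (
  bucket="$1"
  mkdir -p "$bucket/input" "$bucket/process" "$bucket/result"
  for file in "$bucket"/input/*; do
    [ -f "$file" ] || continue                 # input/ is empty: nothing to do
    name="$(basename "$file")"
    op_id="$(start_recognition "$file")"
    # Recognition in progress: keep the operation ID under process/.
    printf '%s\n' "$op_id" > "$bucket/process/$name"
    # Simulated successful completion: write both result files...
    printf '{"operation_id": "%s"}\n' "$op_id" > "$bucket/result/$name.json"
    printf 'recognized text for %s\n' "$name" > "$bucket/result/$name.txt"
    # ...and delete the ID of the finished operation from process/.
    rm -- "$bucket/process/$name"
  done
)

# Demo run against a throwaway directory.
demo_bucket="$(mktemp -d)"
mkdir -p "$demo_bucket/input"
: > "$demo_bucket/input/scan.png"
process_tick "$demo_bucket"
ls "$demo_bucket/result"
```

In the real setup, the process folder is what lets a later timer tick check on operations that have not finished yet; the simulation collapses that wait into a single pass.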
The infrastructure is created using the Yandex Cloud Terraform provider. The source code discussed in this tutorial is available on GitHub.
To set up automatic image recognition using Vision OCR:
- Get your cloud ready.
- Create your infrastructure.
- Upload the files for recognition and test Vision OCR.
If you no longer need the resources you created, delete them.
Get your cloud ready
Sign up for Yandex Cloud and create a billing account:
- Navigate to the management console and log in to Yandex Cloud or create a new account.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.
If you have an active billing account, you can create or select a folder for your infrastructure on the cloud page.
Learn more about clouds and folders here.
Required paid resources
The cost of infrastructure support for regular image and document recognition includes:
- Fee for data storage in a bucket and data operations (see Object Storage pricing).
- Fee for using Vision OCR (see Vision OCR pricing).
- Fee for container invocation count, computing resources allocated to run the application, and outbound traffic (see Serverless Containers pricing).
- Fee for storing the secret and operations with it (see Yandex Lockbox pricing).
Create your infrastructure
With Terraform
Terraform is distributed under the Business Source License.
For more information about the provider resources, see the Yandex Cloud Terraform provider documentation.
To create an infrastructure using Terraform:
- Install Terraform, get the authentication credentials, and specify the source for installing the Yandex Cloud provider (see Configure your provider, Step 1).
- Set up your infrastructure description files:
  - Clone the repository with configuration files:
    git clone https://github.com/yandex-cloud-examples/yc-vision-ocr-recognizer.git
  - Go to the terraform directory inside the repository.
  - In the variables.auto.tfvars file, set the following user-defined properties:
    - cloud_id: Cloud ID.
    - folder_id: Folder ID.
    - zone: Availability zone.
- Create the resources:
  - In the terminal, go to the directory where you edited the configuration file.
  - Make sure the configuration file is correct using this command:
    terraform validate
    If the configuration is correct, you will get this message:
    Success! The configuration is valid.
  - Run this command:
    terraform plan
    You will see a detailed list of resources. No changes will be made at this step. If the configuration contains any errors, Terraform will show them.
  - Apply the changes:
    terraform apply
  - Type yes and press Enter to confirm the changes.
A bucket with a name in ocr-recognition-... format will be created.
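As an illustration of the configuration step, the sketch below writes variables.auto.tfvars from the shell. The helper name `write_tfvars` is hypothetical, the three variables are assumed to be plain strings, and the values in the commented usage are placeholders you would replace with your own IDs. The commented sequence also includes `terraform init`, which is not listed in the steps above but is needed to initialize a freshly cloned working directory.

```shell
#!/bin/sh
set -eu

# Write variables.auto.tfvars into the given directory.
# Arguments: <terraform_dir> <cloud_id> <folder_id> <zone>
# Assumption: all three variables are declared as strings in the repository.
write_tfvars() (
  dir="$1"
  cat > "$dir/variables.auto.tfvars" <<EOF
cloud_id  = "$2"
folder_id = "$3"
zone      = "$4"
EOF
)

# Possible sequence after cloning the repository (placeholder values):
#   write_tfvars yc-vision-ocr-recognizer/terraform "<cloud_ID>" "<folder_ID>" "ru-central1-a"
#   cd yc-vision-ocr-recognizer/terraform
#   terraform init        # downloads the provider into the working directory
#   terraform validate
#   terraform plan
#   terraform apply
```

Keeping the values in a generated file rather than typing them interactively makes the plan reproducible, but the file then holds your cloud and folder IDs, so treat it accordingly.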
Upload the files for recognition and test Vision OCR
- Upload the files for recognition to the input folder inside the bucket you created earlier.
- Wait for the timer-driven trigger to pick up the files, then open the result folder in the bucket. The folder should contain the recognition results in the form of .txt and .json files.
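If you prefer the command line to the management console, both steps can be sketched with the AWS CLI, which works with Object Storage through its S3-compatible endpoint. This assumes the AWS CLI is installed and configured with a static access key that has access to the bucket; the function names are hypothetical, and the bucket name is whatever Terraform created for you.

```shell
#!/bin/sh
set -eu

# S3-compatible endpoint of Yandex Object Storage.
YC_S3_ENDPOINT="https://storage.yandexcloud.net"

# Upload a local file to the input/ prefix of the bucket.
# Arguments: <bucket_name> <local_file>
upload_for_recognition() {
  aws --endpoint-url="$YC_S3_ENDPOINT" s3 cp "$2" "s3://$1/input/"
}

# List the objects under the result/ prefix of the bucket.
# Arguments: <bucket_name>
list_results() {
  aws --endpoint-url="$YC_S3_ENDPOINT" s3 ls "s3://$1/result/"
}

# Example (placeholder bucket name):
#   upload_for_recognition "<bucket_name>" ./scan.png
#   list_results "<bucket_name>"
```

Because recognition is started by a timer, `list_results` may return nothing right after the upload; run it again after the trigger has had time to fire.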
How to delete the resources you created
To stop paying for the resources you created:
- Delete the files from the bucket.
- Open the main.tf configuration file and delete your infrastructure description from it.
- Apply the changes:
  - In the terminal, go to the directory where you edited the configuration file.
  - Make sure the configuration file is correct using this command:
    terraform validate
    If the configuration is correct, you will get this message:
    Success! The configuration is valid.
  - Run this command:
    terraform plan
    You will see a detailed list of resources. No changes will be made at this step. If the configuration contains any errors, Terraform will show them.
  - Apply the changes:
    terraform apply
  - Type yes and press Enter to confirm the changes.
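The first step, deleting the files from the bucket, can also be done from the shell. The sketch below assumes the AWS CLI is installed and configured with a static access key for the bucket; `empty_bucket` is a hypothetical helper name. Note that a recursive `s3 rm` removes current objects only and does not remove older object versions if versioning is enabled on the bucket.

```shell
#!/bin/sh
set -eu

# Delete every object in the bucket through the S3-compatible endpoint.
# Arguments: <bucket_name>
empty_bucket() {
  aws --endpoint-url="https://storage.yandexcloud.net" s3 rm "s3://$1" --recursive
}

# Example (placeholder bucket name):
#   empty_bucket "<bucket_name>"
```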