Scheduled image and PDF recognition in an Object Storage bucket
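In this tutorial, you will learn how to use Yandex Vision OCR to recognize text in images and PDF documents uploaded to a Yandex Object Storage bucket, with recognition running on a schedule.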
Recognition steps
- The user uploads images or documents to the input directory (prefix) in a Yandex Object Storage bucket.
- The Yandex Cloud Functions trigger, activated on a schedule, checks the input folder for new files. Next, the system sends the files to the Yandex Serverless Containers container for recognition.
- During recognition, the operation ID is stored in the process folder of the source bucket.
- Once the operation is completed, the recognition results are saved to the result folder as JSON and TXT files, and the operation ID is deleted from the process folder.
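To make the folder-based hand-off easier to follow, here is a simplified in-memory model of one scheduled run. It is not the code from the repository: the bucket is a plain dict, `StubOcr` is a hypothetical stand-in for the asynchronous recognition performed via the container, and the key names `process/<name>`, `result/<name>.json`, and `result/<name>.txt` are assumptions chosen for illustration. The tutorial does not say how the trigger decides a file is new; the sketch treats an input file as new when it has neither a pending operation nor a result.

```python
import json
from itertools import count


class StubOcr:
    """Hypothetical stand-in for an asynchronous OCR call: start() returns an
    operation ID right away, poll() returns the text once the operation is done."""

    def __init__(self):
        self._ids = count(1)
        self._texts = {}

    def start(self, data: bytes) -> str:
        op_id = f"op-{next(self._ids)}"
        # Pretend recognition: the "recognized text" is just the decoded bytes.
        self._texts[op_id] = data.decode("utf-8")
        return op_id

    def poll(self, op_id: str):
        # A real service could return None while the operation is still running.
        return self._texts.get(op_id)


def scheduled_tick(bucket: dict, ocr: StubOcr) -> None:
    """One scheduled run over a dict that stands in for the bucket (key -> bytes)."""
    # Step 1: finish operations whose IDs are stored under process/.
    for key in [k for k in bucket if k.startswith("process/")]:
        name = key[len("process/"):]
        op_id = bucket[key].decode("utf-8")
        text = ocr.poll(op_id)
        if text is None:
            continue  # not finished yet; check again on the next run
        record = {"operation": op_id, "text": text}
        bucket[f"result/{name}.json"] = json.dumps(record).encode("utf-8")
        bucket[f"result/{name}.txt"] = text.encode("utf-8")
        del bucket[key]  # the operation ID is removed once results are saved

    # Step 2: start recognition for input/ files with no pending operation or result.
    for key in [k for k in bucket if k.startswith("input/")]:
        name = key[len("input/"):]
        if f"process/{name}" in bucket or f"result/{name}.txt" in bucket:
            continue
        bucket[f"process/{name}"] = ocr.start(bucket[key]).encode("utf-8")
```

Starting from `{"input/scan.png": b"Hello OCR"}`, the first call to `scheduled_tick` stores `b"op-1"` under `process/scan.png`; the second call writes `result/scan.png.json` and `result/scan.png.txt` and removes the `process/` entry. Polling before starting new work means an operation is never checked in the same run that launched it.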
The infrastructure is built using the Yandex Cloud Terraform provider. The source code for this guide is available on GitHub.
To set up automatic image recognition via Vision OCR:
- Get your cloud ready.
- Create the infrastructure.
- Upload files for recognition and check how the Vision OCR service works.
If you no longer need the resources you created, delete them.
Get your cloud ready
Sign up for Yandex Cloud and create a billing account:
- Navigate to the management console and log in to Yandex Cloud or create a new account.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.
If you have an active billing account, you can create or select a folder for your infrastructure on the cloud page.
Learn more about clouds and folders in the Yandex Cloud documentation.
Required paid resources
The infrastructure support cost for scheduled image and document recognition includes:
- Fee for data storage in a bucket and data operations (see Object Storage pricing).
- Fee for using Vision OCR (see Vision OCR pricing).
- Fee for the number of container invocations, computing resources allocated to run the application, and outbound traffic (see Serverless Containers pricing).
- Fee for storing the secret and operations with it (see Yandex Lockbox pricing).
Create the infrastructure
With Terraform
Terraform is distributed under the Business Source License.
For more information about the provider resources, see the relevant documentation on the Terraform website.
To create your infrastructure via Terraform:
- Install Terraform, get the authentication credentials, and specify the source for installing the Yandex Cloud provider (see Configure your provider, Step 1).
- Prepare your infrastructure description files:
  - Clone the repository containing the configuration files:
    git clone https://github.com/yandex-cloud-examples/yc-vision-ocr-recognizer.git
  - Navigate to the terraform directory within the repository.
  - In the variables.auto.tfvars file, specify these custom settings:
    - cloud_id: Cloud ID.
    - folder_id: Folder ID.
    - zone: Availability zone.
- Create the resources:
  - In the terminal, navigate to the configuration file directory.
  - Make sure the configuration is correct using this command:
    terraform validate
    If the configuration is valid, you will get this message:
    Success! The configuration is valid.
  - Run this command:
    terraform plan
    You will see a list of resources and their properties. No changes will be made at this step. Terraform will show any errors in the configuration.
  - Apply the configuration changes:
    terraform apply
  - Type yes and press Enter to confirm the changes.

The system will create a bucket named ocr-recognition-....
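If you repeat the validate, plan, and apply steps often, they can be scripted. Below is a minimal Python sketch, assuming the `terraform` binary is on your PATH. The function name `run_terraform_steps` and its `run` parameter are hypothetical; `run` exists only so the sequence can be exercised without Terraform installed. The sketch stops at the first failing command, and `terraform apply` still waits for you to type yes, because `subprocess.run` passes your terminal's input through to the child process.

```python
import subprocess

# The same three commands, in the same order, as in the manual steps above.
TERRAFORM_STEPS = (
    ["terraform", "validate"],
    ["terraform", "plan"],
    ["terraform", "apply"],  # still prompts: type yes and press Enter
)


def run_terraform_steps(workdir: str, run=subprocess.run) -> None:
    """Run validate, plan, and apply in `workdir`, stopping at the first failure.

    `run` defaults to subprocess.run; it is a parameter only so the sequence
    can be tested with a fake runner (a hypothetical design choice).
    """
    for cmd in TERRAFORM_STEPS:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so apply never runs after a failed validate or plan.
        run(cmd, cwd=workdir, check=True)
```

For example, `run_terraform_steps("yc-vision-ocr-recognizer/terraform")` run from the directory where you cloned the repository.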
Upload files for recognition and check how the Vision OCR service works
- Upload the files for recognition to the input folder inside the bucket you created earlier.
- Open the result folder in your bucket. After the scheduled trigger has run and recognition is complete, the folder should contain the recognition results as TXT and JSON files.
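Because Object Storage is S3-compatible, you can also upload files and list results from code. The following is a minimal sketch, assuming an S3 client object such as one created with boto3 (`boto3.client("s3", endpoint_url="https://storage.yandexcloud.net")`) and a static access key for authentication. The helper names `upload_for_recognition` and `list_results` are hypothetical. The code relies only on the boto3 client methods `upload_file` and `list_objects_v2`.

```python
from pathlib import Path

INPUT_PREFIX = "input/"    # folder the scheduled trigger checks
RESULT_PREFIX = "result/"  # folder where the JSON and TXT results appear


def upload_for_recognition(s3, bucket: str, paths) -> list:
    """Upload local files to input/ and return the object keys that were used."""
    keys = []
    for path in map(Path, paths):
        # Keep only the file name (e.g. input/scan.pdf); files with the same
        # name from different local directories would overwrite each other.
        key = INPUT_PREFIX + path.name
        s3.upload_file(str(path), bucket, key)  # boto3 order: Filename, Bucket, Key
        keys.append(key)
    return keys


def list_results(s3, bucket: str) -> list:
    """Return every object key under result/, following list_objects_v2 pagination."""
    keys, token = [], None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": RESULT_PREFIX}
        if token:
            kwargs["ContinuationToken"] = token
        page = s3.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        token = page["NextContinuationToken"]
```

Substitute the bucket name Terraform generated (ocr-recognition-...). Since recognition only starts on the trigger's next scheduled run, `list_results` may return an empty list for a while after uploading.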
How to delete the resources you created
To stop incurring charges for the resources you created:
- Delete all files from the bucket.
- Open the main.tf configuration file and delete your infrastructure description from it.
- Apply the changes:
  - In the terminal, navigate to the configuration file directory.
  - Make sure the configuration is correct using this command:
    terraform validate
    If the configuration is valid, you will get this message:
    Success! The configuration is valid.
  - Run this command:
    terraform plan
    You will see a list of resources and their properties. No changes will be made at this step. Terraform will show any errors in the configuration.
  - Apply the configuration changes:
    terraform apply
  - Type yes and press Enter to confirm the changes.
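The first deletion step, removing all files from the bucket, can also be done from code. Here is a minimal sketch using the same kind of S3 client (for example, boto3 pointed at https://storage.yandexcloud.net). `empty_bucket` is a hypothetical helper built on the client's `list_objects_v2` and `delete_objects` calls. It assumes bucket versioning is not enabled, since older object versions would otherwise need separate deletion. Each `list_objects_v2` call returns at most 1000 keys, which is also the per-request limit for `delete_objects`.

```python
def empty_bucket(s3, bucket: str) -> int:
    """Delete every object in `bucket` and return how many keys were deleted.

    Assumes versioning is off; object versions are not handled here.
    """
    deleted = 0
    while True:
        # Re-list from the start each time: the previous batch is already gone.
        page = s3.list_objects_v2(Bucket=bucket)
        batch = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not batch:
            return deleted
        # Quiet mode: the response only reports keys that could not be deleted.
        resp = s3.delete_objects(Bucket=bucket, Delete={"Objects": batch, "Quiet": True})
        errors = resp.get("Errors", [])
        if errors:
            # Stop instead of looping forever on objects that cannot be removed.
            raise RuntimeError(f"could not delete {len(errors)} object(s), e.g. {errors[0].get('Key')}")
        deleted += len(batch)
```

Raising on reported errors matters here: `delete_objects` returns per-key failures in its response rather than raising, so without the check the loop would keep re-listing the same undeletable objects.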