Regular recognition of images and PDF documents from an Object Storage bucket
In this tutorial, you will use Yandex Vision OCR to set up automatic recognition of supported image formats and PDF documents regularly uploaded to a Yandex Object Storage bucket.
Recognition process
- The user uploads images or documents to the input directory (prefix) in a Yandex Object Storage bucket.
- The Yandex Cloud Functions trigger is launched by a timer and checks for files in the input folder. Next, the files are sent for recognition to the Yandex Serverless Containers container.
- While file recognition is in progress, the operation ID is saved in the process folder in the source bucket.
- After the operation is successfully completed, the recognition results are saved in the result folder as JSON and TXT files. The ID of the successful operation is deleted from the process folder.
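The lifecycle above can be pictured with a toy model in which a local temporary directory stands in for the bucket. This is only a sketch: the file name scan.png, the .op suffix, and the op-... operation ID are hypothetical, no Vision OCR call is made, and the code in the repository may organize objects differently.

```shell
#!/usr/bin/env bash
# Toy model of the input -> process -> result lifecycle.
# A local temp directory stands in for the bucket; nothing here calls Yandex Cloud.

bucket="$(mktemp -d)"
mkdir -p "$bucket/input" "$bucket/process" "$bucket/result"

# 1. The user uploads a file to the input prefix.
printf 'fake image bytes' > "$bucket/input/scan.png"

# 2. The timer-driven check finds files in input/ and starts recognition.
#    While recognition is in progress, the operation ID is kept in process/.
for f in "$bucket"/input/*; do
  name="$(basename "$f")"
  printf 'op-%s\n' "$name" > "$bucket/process/$name.op"   # hypothetical ID and suffix
done

# 3. When an operation succeeds, results are written to result/ as JSON and TXT,
#    and the operation ID is removed from process/.
for op in "$bucket"/process/*.op; do
  name="$(basename "$op" .op)"
  printf '{"text": "placeholder"}\n' > "$bucket/result/$name.json"
  printf 'placeholder\n' > "$bucket/result/$name.txt"
  rm "$op"
done

# Show what ended up in the result prefix.
ls "$bucket/result"
```

After the script runs, the process directory is empty again, which mirrors how a completed operation leaves no ID behind in the process folder.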
The infrastructure is created using the Yandex Cloud Terraform provider. For the source code discussed in the tutorial, visit GitHub.
To set up automatic image recognition using Vision OCR:
- Get your cloud ready.
- Create your infrastructure.
- Upload the files for recognition and test Vision OCR.
If you no longer need the resources you created, delete them.
Get your cloud ready
Sign up for Yandex Cloud and create a billing account:
- Navigate to the management console and log in to Yandex Cloud or create a new account.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.
If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure.
Learn more about clouds and folders here.
Required paid resources
The cost of infrastructure support for regular image and document recognition includes:
- Fee for bucket data storage and data operations (see Object Storage pricing).
- Fee for using Vision OCR (see Vision OCR pricing).
- Fee for the number of container invocations, computing resources allocated for the application, and outbound traffic (see Serverless Containers pricing).
- Fee for secret storage and operations (see Yandex Lockbox pricing).
Create your infrastructure
With Terraform
Terraform is distributed under the Business Source License.
For more information about the provider resources, see the relevant documentation on the Terraform website.
To create an infrastructure using Terraform:
- Install Terraform, get the credentials, and specify the source for installing the Yandex Cloud provider (see Configure your provider, step 1).
- Prepare your infrastructure description files:
  - Clone the repository with configuration files:
    git clone https://github.com/yandex-cloud-examples/yc-vision-ocr-recognizer.git
  - Go to the terraform directory inside the repository.
  - In the variables.auto.tfvars file, set the following user-defined properties:
    - cloud_id: Cloud ID
    - folder_id: Folder ID
    - zone: Availability zone
- Create the resources:
  - In the terminal, go to the directory where you edited the configuration file.
  - Make sure the configuration file is correct using this command:
    terraform validate
    If the configuration is correct, you will get this message:
    Success! The configuration is valid.
  - Run this command:
    terraform plan
    You will see a detailed list of resources. No changes will be made at this step. If the configuration contains any errors, Terraform will show them.
  - Apply the changes:
    terraform apply
  - Type yes and press Enter to confirm the changes.
Terraform will create a bucket whose name follows the ocr-recognition-... format.
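The Terraform steps above could be wrapped in a few shell functions, as in the sketch below. The function names run, render_tfvars, and create_resources are illustrative and not part of the repository, and the zone value ru-central1-a in the usage comment is only an example. The sketch assumes the working directory has already been initialized with terraform init while configuring the provider. Setting DRY_RUN=1 prints each command instead of executing it.

```shell
#!/usr/bin/env bash
# Sketch of the Terraform steps as shell functions (names are illustrative).
# Set DRY_RUN=1 to print each command instead of executing it.

REPO_URL="https://github.com/yandex-cloud-examples/yc-vision-ocr-recognizer.git"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Print variables.auto.tfvars content for the given values.
# Arguments: cloud ID, folder ID, availability zone.
render_tfvars() {
  printf 'cloud_id  = "%s"\n' "$1"
  printf 'folder_id = "%s"\n' "$2"
  printf 'zone      = "%s"\n' "$3"
}

# Validate, preview, and apply the configuration in the current directory.
# terraform apply still asks for the interactive "yes" confirmation.
create_resources() {
  run terraform validate &&
  run terraform plan &&
  run terraform apply
}

# Possible usage (not executed here):
#   git clone "$REPO_URL" && cd yc-vision-ocr-recognizer/terraform
#   render_tfvars "<cloud_ID>" "<folder_ID>" "ru-central1-a"
#   create_resources
```

render_tfvars writes to standard output rather than to the file, so the values can be reviewed and copied into variables.auto.tfvars by hand. Redirecting the output into that file would overwrite any other settings it contains.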
Upload the files for recognition and test Vision OCR
- Upload the files for recognition to the input folder inside the bucket you created earlier.
- Open the result folder in the bucket. The folder should contain the recognition results in the form of .txt and .json files. Because the trigger runs on a timer, the results may not appear immediately after the upload.
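The tutorial does not prescribe a tool for uploading files, and the management console works fine. As one alternative, Object Storage exposes an S3-compatible API, so the AWS CLI can be pointed at it with the --endpoint-url option. The sketch below assumes the AWS CLI is installed and configured with a static access key for Object Storage. The function names are hypothetical, and the bucket name must be replaced with the one Terraform created.

```shell
#!/usr/bin/env bash
# Sketch: upload a file to input/ and list result/ with the AWS CLI.
# Assumes the AWS CLI is configured with a static access key for Object Storage.
# Set DRY_RUN=1 to print commands instead of running them.

ENDPOINT="https://storage.yandexcloud.net"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Upload a local file under the input/ prefix.
# Arguments: bucket name, local file path.
upload_for_recognition() {
  run aws s3 cp "$2" "s3://$1/input/" --endpoint-url "$ENDPOINT"
}

# List the objects under the result/ prefix.
# Argument: bucket name.
list_results() {
  run aws s3 ls "s3://$1/result/" --endpoint-url "$ENDPOINT"
}

# Possible usage (not executed here):
#   upload_for_recognition "<bucket_name>" ./scan.png
#   list_results "<bucket_name>"
```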
How to delete the resources you created
To stop paying for the resources you created:
- Delete the files from the bucket.
- Open the main.tf file and delete your infrastructure description from it.
- Apply the changes:
  - In the terminal, go to the directory where you edited the configuration file.
  - Make sure the configuration file is correct using this command:
    terraform validate
    If the configuration is correct, you will get this message:
    Success! The configuration is valid.
  - Run this command:
    terraform plan
    You will see a detailed list of resources. No changes will be made at this step. If the configuration contains any errors, Terraform will show them.
  - Apply the changes:
    terraform apply
  - Type yes and press Enter to confirm the changes.
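The cleanup could also be scripted, as in the sketch below. Note that it swaps in a different technique from the steps above: instead of editing main.tf and re-applying, it uses terraform destroy, which removes every resource tracked in the current Terraform state. Emptying the bucket relies on the AWS CLI against the S3-compatible Object Storage endpoint, which assumes a configured static access key. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: empty the bucket, then destroy the Terraform-managed resources.
# Uses terraform destroy as an alternative to editing main.tf and re-applying.
# Set DRY_RUN=1 to print commands instead of running them.

ENDPOINT="https://storage.yandexcloud.net"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Delete all objects in the bucket.
# Argument: bucket name.
empty_bucket() {
  run aws s3 rm "s3://$1" --recursive --endpoint-url "$ENDPOINT"
}

# Destroy everything in the current Terraform state.
# terraform destroy asks for the interactive "yes" confirmation.
destroy_resources() {
  run terraform destroy
}

# Possible usage from the terraform directory (not executed here):
#   empty_bucket "<bucket_name>"
#   destroy_resources
```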