Regular recognition of images and PDF documents from an Object Storage bucket

Updated at August 14, 2025

Recognition process
Get your cloud ready
- Required paid resources
Create your infrastructure
Upload the files for recognition and test Vision OCR
How to delete the resources you created

In this tutorial, you will use Yandex Vision OCR to set up automatic recognition of images in supported formats and PDF documents that are regularly uploaded to a Yandex Object Storage bucket.

Recognition process

1. The user uploads images or documents to the input folder (prefix) in a Yandex Object Storage bucket.
2. A Yandex Cloud Functions trigger fires on a timer and checks the input folder for files. Any files found are sent for recognition to a Yandex Serverless Containers container.
3. While recognition is in progress, the operation ID is stored in the process folder of the source bucket.
4. Once the operation completes successfully, the recognition results are saved to the result folder as JSON and TXT files, and the operation ID is deleted from the process folder.
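
The flow above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the code from the tutorial repository: the bucket is an in-memory dictionary, and the names `Bucket`, `start_recognition`, `collect_results`, `submit`, and `fetch` are hypothetical, as are the result file naming and the rule for skipping files that were already submitted.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """In-memory stand-in for an Object Storage bucket: key -> content."""
    objects: dict = field(default_factory=dict)

    def keys(self, prefix: str) -> list:
        return sorted(k for k in self.objects if k.startswith(prefix))


def start_recognition(bucket: Bucket, submit) -> list:
    """Timer tick, part 1: send new input files for recognition.

    submit(key) is a hypothetical callable that starts an OCR operation
    and returns its ID. The ID is stored as a marker under process/.
    """
    pending = {bucket.objects[k] for k in bucket.keys("process/")}
    started = []
    for key in bucket.keys("input/"):
        name = key.removeprefix("input/")
        already_done = f"result/{name}.json" in bucket.objects
        if key in pending or already_done:
            continue  # assumption: never submit the same file twice
        op_id = submit(key)
        bucket.objects[f"process/{op_id}"] = key
        started.append(op_id)
    return started


def collect_results(bucket: Bucket, fetch) -> list:
    """Timer tick, part 2: save finished results and drop their markers.

    fetch(op_id) is a hypothetical callable that returns the recognized
    text, or None while the operation is still running.
    """
    finished = []
    for marker in bucket.keys("process/"):
        op_id = marker.removeprefix("process/")
        text = fetch(op_id)
        if text is None:
            continue  # still in progress, keep the marker
        name = bucket.objects[marker].removeprefix("input/")
        bucket.objects[f"result/{name}.json"] = json.dumps({"text": text})
        bucket.objects[f"result/{name}.txt"] = text
        del bucket.objects[marker]
        finished.append(op_id)
    return finished
```

Keeping the operation ID in the bucket itself means each timer run needs no state of its own: everything required to resume is read back from the process folder.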

The infrastructure is created using the Yandex Cloud Terraform provider. The source code discussed in this tutorial is available on GitHub in the yandex-cloud-examples/yc-vision-ocr-recognizer repository.

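To set up automatic image recognition using Vision OCR:

1. Get your cloud ready.
2. Create your infrastructure.
3. Upload the files for recognition and test Vision OCR.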

If you no longer need the resources you created, delete them.

Get your cloud ready

Navigate to the management console and log in to Yandex Cloud or create a new account.
On the Yandex Cloud Billing page, make sure a billing account is linked and that its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure.

To learn more about clouds and folders, see the Yandex Cloud documentation.

Required paid resources

The cost of infrastructure support for regular image and document recognition includes:

- Fee for bucket data storage and data operations (see Object Storage pricing).
- Fee for using Vision OCR (see Vision OCR pricing).
- Fee for the number of container invocations, computing resources allocated for the application, and outbound traffic (see Serverless Containers pricing).
- Fee for secret storage and operations (see Yandex Lockbox pricing).

Create your infrastructure

With Terraform, you can quickly create a cloud infrastructure in Yandex Cloud and manage it using configuration files. These files store the infrastructure description written in HashiCorp Configuration Language (HCL). If you change the configuration files, Terraform automatically detects which part of your configuration is already deployed and what should be added or removed.

Terraform is distributed under the Business Source License. The Yandex Cloud provider for Terraform is distributed under the MPL-2.0 license.

For more information about the provider resources, see the relevant documentation on the Terraform website or its mirror.

To create an infrastructure using Terraform:

Install Terraform, get the credentials, and specify the source for installing the Yandex Cloud provider (see Configure your provider, step 1).
Prepare your infrastructure description files:
1. Clone the repository with configuration files.
```
git clone https://github.com/yandex-cloud-examples/yc-vision-ocr-recognizer.git
```
2. Go to the terraform directory inside the repository.
3. In the variables.auto.tfvars file, set the following user-defined properties:
  - cloud_id: Cloud ID
  - folder_id: Folder ID
  - zone: Availability zone
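
As an illustration of what the file might contain, the sketch below renders the three properties as `name = "value"` lines. The `render_tfvars` helper is hypothetical and not part of the repository, and the IDs shown in the usage note are placeholders rather than real values. Quoting through `json.dumps` is an assumption that is adequate for plain identifiers only.

```python
import json


def render_tfvars(cloud_id: str, folder_id: str, zone: str) -> str:
    """Return the text of a variables.auto.tfvars file for the three properties."""
    values = {"cloud_id": cloud_id, "folder_id": folder_id, "zone": zone}
    for name, value in values.items():
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    # json.dumps produces a double-quoted string, which matches the
    # HCL string syntax for simple ASCII identifiers.
    return "".join(
        f"{name} = {json.dumps(value)}\n" for name, value in values.items()
    )
```

For example, `render_tfvars("<cloud-id>", "<folder-id>", "ru-central1-a")` yields three lines that you could save with `pathlib.Path("variables.auto.tfvars").write_text(...)` after substituting your own IDs and availability zone.
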
Create the resources:
1. In the terminal, go to the directory where you edited the configuration file.
2. Make sure the configuration file is correct using this command:
```
terraform validate
```
  If the configuration is correct, you will get this message:
```
Success! The configuration is valid.
```
3. Run this command:
```
terraform plan
```
  You will see a detailed list of resources. No changes will be made at this step. If the configuration contains any errors, Terraform will show them.
4. Apply the changes:
```
terraform apply
```
5. Type yes and press Enter to confirm the changes.

A bucket with a name in ocr-recognition-... format will be created.
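
If you prefer to script the validate, plan, and apply sequence, it might look like the sketch below. The `deploy` function and its `run` parameter are hypothetical names introduced for illustration. The commands are the same three shown in the steps, run in order, with execution stopping at the first non-zero exit code. The confirmation prompt of `terraform apply` is left interactive.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# The same commands as in the manual steps, in the same order.
TERRAFORM_STEPS = (
    ("terraform", "validate"),
    ("terraform", "plan"),
    ("terraform", "apply"),
)


def _run_in_shell(command: Sequence[str], workdir: str) -> int:
    """Run one command in workdir; stdin stays attached for the 'yes' prompt."""
    return subprocess.run(list(command), cwd=workdir).returncode


def deploy(
    workdir: str,
    run: Optional[Callable[[Sequence[str], str], int]] = None,
) -> List[str]:
    """Run each Terraform step in order and stop at the first failure."""
    run = run or _run_in_shell
    completed = []
    for command in TERRAFORM_STEPS:
        code = run(command, workdir)
        if code != 0:
            raise RuntimeError(
                f"{' '.join(command)} failed with exit code {code}"
            )
        completed.append(" ".join(command))
    return completed
```

Calling `deploy("terraform")` from the repository root would run the steps in the terraform directory. The `run` parameter exists only so the sequence can be exercised without invoking Terraform.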

Upload the files for recognition and test Vision OCR

Upload the files for recognition to the input folder inside the bucket you created earlier.
Open the result folder in the bucket. After the timer trigger has run and recognition has completed, the folder should contain the recognition results as .txt and .json files.
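
To check the outcome without browsing the bucket by hand, you could compare a listing of object keys against the files you uploaded. The sketch below is a hedged illustration: `missing_results` is a hypothetical helper, and it assumes that each result object is named after its input file with `.txt` or `.json` appended, which the tutorial does not specify. Adjust the naming rule to what you actually see in the result folder.

```python
def missing_results(keys: list) -> dict:
    """Map each input file to the result objects that are not present yet.

    keys is a flat list of object keys from the bucket, obtained with
    any S3-compatible listing tool.
    """
    present = set(keys)
    report = {}
    for key in sorted(keys):
        # Skip everything outside input/ and the folder placeholder itself.
        if not key.startswith("input/") or key.endswith("/"):
            continue
        name = key[len("input/"):]
        # Assumed naming rule: result/<input file name>.txt and .json
        expected = [f"result/{name}{ext}" for ext in (".txt", ".json")]
        absent = [path for path in expected if path not in present]
        if absent:
            report[key] = absent
    return report
```

An empty dictionary means every uploaded file has both result files. Anything else lists what is still pending, which may simply mean the timer trigger has not run yet.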

How to delete the resources you created

To stop paying for the resources you created:

Delete the files from the bucket.
Open the main.tf file and delete your infrastructure description from it.
Apply the changes:
1. In the terminal, go to the directory where you edited the configuration file.
2. Make sure the configuration file is correct using this command:
```
terraform validate
```
  If the configuration is correct, you will get this message:
```
Success! The configuration is valid.
```
3. Run this command:
```
terraform plan
```
  You will see a detailed list of resources. No changes will be made at this step. If the configuration contains any errors, Terraform will show them.
4. Apply the changes:
```
terraform apply
```
5. Type yes and press Enter to confirm the changes.
