Performing text search queries in deferred mode

Updated at December 23, 2025

With Yandex Search API, you can perform text search through the Yandex search database and get search results in XML or HTML format in deferred (asynchronous) mode. You can run queries using the Yandex Cloud ML SDK, REST API, and gRPC API. The search results you get depend on the parameters specified in your query.

Getting started

Navigate to the management console and log in to Yandex Cloud or create a new account.
On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure.

Learn more about clouds and folders here.

Get your cloud ready

To use the examples:

Follow the steps for the interface you plan to use: SDK, REST API, or gRPC API.

SDK

Create a service account and assign the search-api.webSearch.user role to it.
Get and save the service account's API key with yc.search-api.execute for its scope.

The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.

Note

If you are using Windows, we recommend installing the WSL shell first and using it to proceed.
Install Python 3.10 or higher.
Install Python venv to create isolated virtual environments in Python.
Create a new Python virtual environment and activate it:
```
python3 -m venv new-env
source new-env/bin/activate
```
Use the pip package manager to install the ML SDK library:
```
pip install yandex-cloud-ml-sdk
```
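
Before installing the SDK, it can help to confirm that the interpreter meets the version requirement and that the virtual environment is active. The following is a minimal sketch that uses only the standard library; the helper names are illustrative and are not part of the ML SDK.

```python
import sys

MIN_VERSION = (3, 10)  # minimum Python version stated in the steps above


def python_is_supported(version=None) -> bool:
    """Return True if the given (or running) interpreter is 3.10 or newer."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_VERSION


def in_virtualenv() -> bool:
    """Return True when running inside an environment created by venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Python version OK:", python_is_supported())
print("Virtual environment active:", in_virtualenv())
```

Run it with the same `python3` you used to create `new-env`; if the second line prints `False`, the environment has not been activated in the current shell.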

REST API

Create a service account you will use to send requests. You can also use a Yandex account or a federated account, but a service account is a better choice for automation purposes.
Assign the search-api.webSearch.user role to the account you will use to send requests.
Get an IAM token, which is required for authentication.

The following examples use IAM token authentication. To use a service account's API key for authentication, edit the Authorization header in the query examples. For more information, see API authentication.

To use the examples, you should additionally install cURL and jq.
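
The two authentication options differ only in the value of the Authorization header. The sketch below builds that header for either credential type. It assumes the `Bearer` scheme for IAM tokens (as in the query examples) and the `Api-Key` scheme for service account API keys; check the API authentication guide before relying on the latter. The function name is hypothetical.

```python
def authorization_header(credential: str, kind: str = "iam") -> dict[str, str]:
    """Build the Authorization header for an IAM token or an API key."""
    # Assumed schemes: "Bearer" for IAM tokens, "Api-Key" for API keys.
    schemes = {"iam": "Bearer", "api_key": "Api-Key"}
    if kind not in schemes:
        raise ValueError(f"unknown credential kind: {kind!r}")
    if not credential:
        raise ValueError("credential must not be empty")
    return {"Authorization": f"{schemes[kind]} {credential}"}


print(authorization_header("<IAM_token>"))
print(authorization_header("<API_key>", kind="api_key"))
```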

gRPC API

Create a service account you will use to send requests. You can also use a Yandex account or a federated account, but a service account is a better choice for automation purposes.
Assign the search-api.webSearch.user role to the account you will use to send requests.
Get an IAM token, which is required for authentication.

The following examples use IAM token authentication. To use a service account's API key for authentication, edit the Authorization header in the query examples. For more information, see API authentication.

To use the examples, you should additionally install gRPCurl and jq.

Send a search query

To run a search query:

The steps for each interface follow in this order: SDK, REST API, gRPC API.

SDK

Create a file named web-search-async.py and paste the following code into it:

```
#!/usr/bin/env python3

from __future__ import annotations

import pathlib
from typing import Literal, cast

from yandex_cloud_ml_sdk import YCloudML

# Enum types for the search settings. Only FamilyMode is used below;
# the other settings are passed as plain strings.
from yandex_cloud_ml_sdk.search_api import (
    FamilyMode,
    FixTypoMode,
    GroupMode,
    Localization,
    SearchType,
    SortMode,
    SortOrder,
)

USER_AGENT = "Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.112 Mobile Safari/537.36"


def main() -> None:
    sdk = YCloudML(
        folder_id="<folder_ID>",
        auth="<API_key>",
    )
    sdk.setup_default_logging("error")

    # You can pass any settings when creating the Search object.
    search = sdk.search_api.web(
        "RU",
        family_mode=FamilyMode.MODERATE,
        # By default, object configuration properties are set to None,
        # which corresponds to the default value defined
        # at the service's backend,
        # e.g., docs_in_group=None.
    )

    # You can also reconfigure the Search object at any time.
    search = search.configure(
        # These are enum-type settings;
        # they can be passed as strings as shown below.
        search_type="ru",
        family_mode="strict",
        fix_typo_mode="off",
        group_mode="deep",
        localization="ru",
        sort_order="desc",
        sort_mode="by_time",
        docs_in_group=None,
        groups_on_page=6,
        max_passages=2,
        region="225",
        user_agent=USER_AGENT,
    )

    search_query = input("Enter the search query: ")
    if not search_query.strip():
        search_query = "Yandex Cloud"

    # Normalize the answer so that "XML" and "xml" are treated the same.
    format_ = input("Choose format ([xml]/html): ")
    format_ = format_.strip().lower() or "xml"
    if format_ not in ("xml", "html"):
        raise SystemExit(f"Unsupported format: {format_}")
    format_ = cast(Literal["html", "xml"], format_)

    for page in range(0, 10):
        # Start a deferred search for this page and wait for its result.
        operation = search.run_deferred(search_query, format=format_, page=page)
        search_result = operation.wait(poll_interval=1)
        # Pages are numbered from zero in the API; file names start from 1.
        output_filename = str(
            pathlib.Path(__file__).parent / f"page_{page + 1}.{format_}"
        )
        # Overwrite a file left from a previous run instead of appending to it.
        with open(output_filename, "w", encoding="utf-8") as file:
            file.write(search_result.decode("utf-8"))
        print(f"Page {page} saved to file {output_filename}")


if __name__ == "__main__":
    main()
```

Where:

<folder_ID>: ID of the folder in which the service account was created.
<API_key>: Service account API key you got earlier. It is required for authentication in the API.

The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.

You can set the search parameters either when creating the object with sdk.search_api.web or later through its .configure method:

Description of object properties

search_type: Search type. The possible values are:
- ru: For the Russian search type.
- tr: For the Turkish search type.
- com: For the International search type.
- kk: For the Kazakh search type.
- be: For the Belarusian search type.
- uz: For the Uzbek search type.
family_mode: Results filtering. This is an optional parameter. The possible values are:
- moderate: Moderate filter (default). Adult category documents are excluded from search results unless the query explicitly targets resources of this category.
- none: Filtering is off. Search results include any documents regardless of their contents.
- strict: Family filter. Regardless of the search query, Adult category documents and documents containing profanity are excluded from search results.
fix_typo_mode: Search query typo correction setting. This is an optional parameter. The possible values are:
- on: Typo correction enabled (default). Search query typos are corrected automatically.
- off: Typo correction disabled. Search query typos are not corrected. The search is performed strictly as per the query.
group_mode: Result grouping method. This is an optional parameter. The possible values are:
- deep: Grouping by domain. Each group contains documents from one domain (default).
- flat: Flat grouping. Each group contains a single document.
localization: Search response notifications language. Affects the text in the found-docs-human tag and error messages. This is an optional parameter. Possible values depend on the selected search type:
- Russian:
  - ru: Russian (default).
  - be: Belarusian.
  - kk: Kazakh.
  - uk: Ukrainian.
- Turkish:
  - tr: Turkish.
- International:
  - en: English.
sort_order: Search results sorting order. This is an optional parameter. The possible values are:
- desc: Forward sorting order from most recent to oldest (default).
- asc: Reverse sorting order from oldest to most recent.
sort_mode: Search results sorting rule. This is an optional parameter. The possible values are:
- by_relevance: Sorting by relevance (default).
- by_time: Sorting by document update time.
docs_in_group: Maximum number of documents that can be returned per group. This is an optional parameter. The valid values range from 1 to 3. The default value is 1.
groups_on_page: Maximum number of groups that can be returned per page. This is an optional parameter. The default value is 20.

When getting the result in XML format, the possible values range from 1 to 100; for HTML, the range is from 5 to 50.
max_passages: Maximum number of passages that can be used when generating a document snippet. This is an optional parameter. The valid values range from 1 to 5. By default, a maximum of four passages with search query text is returned per document.
region: Search country or region ID that affects the document ranking rules. Only supported for the Russian and Turkish search types.

For a list of frequently used country and region IDs, see Search regions.
user_agent: String containing the User-Agent header. Use this parameter to have your search results optimized for a specific device and browser, including mobile search results. This is an optional parameter. If not specified, you will get the default output.
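
The numeric limits above depend partly on the output format, so it may be convenient to check values locally before sending a query. The sketch below encodes only the ranges quoted in the descriptions; the function and constant names are hypothetical and are not part of the ML SDK, and the service itself remains the authority on what it accepts.

```python
from typing import Optional

# Allowed ranges quoted from the property descriptions above.
DOCS_IN_GROUP = (1, 3)
MAX_PASSAGES = (1, 5)
GROUPS_ON_PAGE = {"xml": (1, 100), "html": (5, 50)}


def check_settings(
    format_: str = "xml",
    docs_in_group: Optional[int] = None,
    groups_on_page: Optional[int] = None,
    max_passages: Optional[int] = None,
) -> list[str]:
    """Return a list of problems; an empty list means all values are in range."""
    if format_ not in GROUPS_ON_PAGE:
        return [f"unknown format: {format_!r}"]
    limits = {
        "docs_in_group": (docs_in_group, DOCS_IN_GROUP),
        "groups_on_page": (groups_on_page, GROUPS_ON_PAGE[format_]),
        "max_passages": (max_passages, MAX_PASSAGES),
    }
    problems = []
    for name, (value, (low, high)) in limits.items():
        # None means "use the service default", so there is nothing to check.
        if value is not None and not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}..{high}")
    return problems


print(check_settings("html", groups_on_page=3, max_passages=2))
```

For example, `groups_on_page=3` passes for XML but is reported for HTML, where the documented minimum is 5.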

List of supported parameters:

Note

The list of supported request parameters depends on the required output format, XML or HTML.

Parameter	Supported in XML response	Supported in HTML response
`search_type`
`family_mode`
`fix_typo_mode`
`group_mode`
`localization`
`sort_order`
`sort_mode`
`docs_in_group`
`groups_on_page`
`max_passages`
`region`
`user_agent`

Run the file you created:
```
python3 web-search-async.py
```
During execution, the code will ask you for the text of your search query and the format you want to get your response in.

The code will save the first ten pages of search results, in the format you specified, to the directory containing the script:
```
Page 0 saved to file /Users/MyUser/Desktop/page_1.xml
...
Page 9 saved to file /Users/MyUser/Desktop/page_10.xml
```

REST API

Create a search query:

Create a file with the request body, e.g., body.json:

body.json

```
{
    "query": {
      "searchType": "<search_type>",
      "queryText": "<search_query_text>",
      "familyMode": "<result_filter_setting_value>",
      "page": "<page_number>",
      "fixTypoMode": "<typo_correction_mode_setting_value>"
    },
    "sortSpec": {
      "sortMode": "<result_sorting_rule>",
      "sortOrder": "<sort_order_of_results>"
    },
    "groupSpec": {
      "groupMode": "<result_grouping_method>",
      "groupsOnPage": "<number_of_groups_per_page>",
      "docsInGroup": "<number_of_documents_per_group>"
    },
    "maxPassages": "<maximum_number_of_passages>",
    "region": "<region_ID>",
    "l10N": "<notification_language>",
    "folderId": "<folder_ID>",
    "responseFormat": "<result_format>",
    "userAgent": "<User-Agent_header>"
}
```

Description of fields

searchType: Search type. The possible values are:
- SEARCH_TYPE_RU: For the Russian search type.
- SEARCH_TYPE_TR: For the Turkish search type.
- SEARCH_TYPE_COM: For the International search type.
- SEARCH_TYPE_KK: For the Kazakh search type.
- SEARCH_TYPE_BE: For the Belarusian search type.
- SEARCH_TYPE_UZ: For the Uzbek search type.
queryText: Search query text. The maximum length is 400 characters.
familyMode: Results filtering. This is an optional parameter. The possible values are:
- FAMILY_MODE_MODERATE: Moderate filter (default). Adult category documents are excluded from search results unless the query explicitly targets resources of this category.
- FAMILY_MODE_NONE: Filtering is off. Search results include any documents regardless of their contents.
- FAMILY_MODE_STRICT: Family filter. Regardless of the search query, Adult category documents and documents containing profanity are excluded from search results.
page: Requested page number. This is an optional parameter. By default, the first page with search results is returned. Page numbering starts from zero (0 stands for page one).
fixTypoMode: Search query typo correction setting. This is an optional parameter. The possible values are:
- FIX_TYPO_MODE_ON: Typo correction enabled (default). Search query typos are corrected automatically.
- FIX_TYPO_MODE_OFF: Typo correction disabled. Search query typos are not corrected. The search is performed strictly as per the query.

sortMode: Search results sorting rule. This is an optional parameter. The possible values are:
- SORT_MODE_BY_RELEVANCE: Sorting by relevance (default).
- SORT_MODE_BY_TIME: Sorting by document update time.
sortOrder: Search results sorting order. This is an optional parameter. The possible values are:
- SORT_ORDER_DESC: Forward sorting order from most recent to oldest (default).
- SORT_ORDER_ASC: Reverse sorting order from oldest to most recent.
groupMode: Result grouping method. This is an optional parameter. The possible values are:
- GROUP_MODE_DEEP: Grouping by domain. Each group contains documents from one domain (default).
- GROUP_MODE_FLAT: Flat grouping. Each group contains a single document.
groupsOnPage: Maximum number of groups that can be returned per page. This is an optional parameter. The default value is 20.

When getting the result in XML format, the possible values range from 1 to 100; for HTML, the range is from 5 to 50.
docsInGroup: Maximum number of documents that can be returned per group. This is an optional parameter. The values range from 1 to 3. The default value is 1.
maxPassages: Maximum number of passages that can be used when generating a document snippet. This is an optional parameter. The values range from 1 to 5. By default, a maximum of four passages with search query text is returned per document.
region: Search country or region ID that affects the document ranking rules. Only supported for the Russian and Turkish search types.

For a list of frequently used country and region IDs, see Search regions.
l10N: Search response notifications language. Affects the text in the found-docs-human tag and error messages. This is an optional parameter. Possible values depend on the selected search type:
- Russian:
  - LOCALIZATION_RU: Russian (default).
  - LOCALIZATION_BE: Belarusian.
  - LOCALIZATION_KK: Kazakh.
  - LOCALIZATION_UK: Ukrainian.
- Turkish:
  - LOCALIZATION_TR: Turkish.
- International:
  - LOCALIZATION_EN: English.
folderId: Folder ID of the user or service account you will use for queries.
responseFormat: Search results format. This is an optional parameter. The possible values are:
- FORMAT_XML: The results will be delivered in XML format (default).
- FORMAT_HTML: The results will be delivered in HTML format.
userAgent: String containing the User-Agent header. Use this parameter to have your search results optimized for a specific device and browser, including mobile search results. This is an optional parameter. If not specified, you will get the default output.
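
As an illustration of the fields above, the following sketch builds a minimal request body in Python and prints it as JSON. It sets only a few fields and omits the ones marked optional. The function name is hypothetical, and writing the page number as a string simply mirrors the quoting used in the body.json template; treat both as assumptions rather than requirements of the API.

```python
import json

MAX_QUERY_LENGTH = 400  # limit quoted in the queryText description


def build_request_body(
    folder_id: str,
    query_text: str,
    search_type: str = "SEARCH_TYPE_RU",
    page: int = 0,
    response_format: str = "FORMAT_XML",
) -> dict:
    """Build a minimal request body; optional fields are left out."""
    if len(query_text) > MAX_QUERY_LENGTH:
        raise ValueError(f"query text is longer than {MAX_QUERY_LENGTH} characters")
    if page < 0:
        raise ValueError("page numbering starts from zero")
    return {
        "query": {
            "searchType": search_type,
            "queryText": query_text,
            # Written as a string to mirror the template above (assumption).
            "page": str(page),
        },
        "folderId": folder_id,
        "responseFormat": response_format,
    }


# Page 1 here is the second page of results, since numbering starts from zero.
body = build_request_body("<folder_ID>", "Yandex Cloud", page=1)
print(json.dumps(body, indent=2))
```

Redirecting the printed output to body.json gives a file you can pass to the request in the next step.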

List of supported parameters:

Note

The list of supported request parameters depends on the required output format, XML or HTML.

Parameter	Supported in XML response	Supported in HTML response
`searchType`
`queryText`
`familyMode`
`page`
`fixTypoMode`
`sortMode`
`sortOrder`
`groupMode`
`groupsOnPage`
`docsInGroup`
`maxPassages`
`region`
`l10N`
`folderId`
`responseFormat`
`userAgent`

Run an HTTP request by specifying the IAM token you got earlier:

```
curl \
  --request POST \
  --header "Authorization: Bearer <IAM_token>" \
  --data "@body.json" \
  "https://searchapi.api.cloud.yandex.net/v2/web/searchAsync"
```

Result:

```
{
  "done": false,
  "id": "sppger465oq1********",
  "description": "WEB search async",
  "createdAt": "2024-10-02T19:51:02Z",
  "createdBy": "bfbud0oddqp4********",
  "modifiedAt": "2024-10-02T19:51:03Z"
}
```

Save the obtained Operation object ID (id value) for later use.
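
If you capture the response in a script, the ID can be read from the JSON directly. The sketch below assumes the response text has the shape shown in the result above; the function name is hypothetical.

```python
import json


def operation_id(response_text: str) -> str:
    """Extract the Operation object ID from the JSON response text."""
    operation = json.loads(response_text)
    op_id = operation.get("id")
    if not op_id:
        raise ValueError("the response does not contain an operation ID")
    return op_id


# Sample taken from the result shown above (the ID is masked in the guide).
sample = """
{
  "done": false,
  "id": "sppger465oq1********",
  "description": "WEB search async"
}
"""
print(operation_id(sample))
```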

Wait until Yandex Search API executes the request and generates a response. This may take from five minutes to a few hours.

To make sure your request was successful, run this HTTP request:
```
curl \
  --request GET \
  --header "Authorization: Bearer <IAM_token>" \
  https://operation.api.cloud.yandex.net/operations/<request_ID>
```
Where:
- <IAM_token>: Previously obtained IAM token.
- <request_ID>: The Operation object ID you saved at the previous step.
Result:
```
{
"done": true,
"response": {
  "@type": "type.googleapis.com/yandex.cloud.searchapi.v2.WebSearchResponse",
  "rawData": "<Base64_encoded_response_body>"
},
"id": "spp82pc07ebl********",
"description": "WEB search async",
"createdAt": "2024-10-03T08:07:07Z",
"createdBy": "bfbud0oddqp4********",
"modifiedAt": "2024-10-03T08:12:09Z"
}
```
If the done field is set to true and the response object is present in the output, the request has been completed, so you can move on to the next step. Otherwise, repeat the check after some time.
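
The completion check described above can be sketched as a small polling loop. The code below does not call the API itself: `fetch` is a placeholder for whatever returns the parsed Operation object (for example, a wrapper around the GET request), and the interval and number of checks are arbitrary assumptions.

```python
import time
from typing import Callable


def is_finished(operation: dict) -> bool:
    """True once `done` is true and the `response` object is present."""
    return operation.get("done") is True and "response" in operation


def wait_until_finished(
    fetch: Callable[[], dict],
    interval: float = 60.0,
    max_checks: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until the operation is finished or the checks run out."""
    for attempt in range(max_checks):
        operation = fetch()
        if is_finished(operation):
            return operation
        if attempt < max_checks - 1:
            sleep(interval)  # wait before the next check
    raise TimeoutError(f"operation not finished after {max_checks} checks")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.
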
After Yandex Search API has successfully processed the request, get the response:
1. Get the result:
```
curl \
  --request GET \
  --header "Authorization: Bearer <IAM_token>" \
  https://operation.api.cloud.yandex.net/operations/<request_ID> \
  > result.json
```
  Eventually the search query result will be saved to a file named result.json containing a Base64-encoded XML or HTML response in the response.rawData field.
2. Depending on the requested response format, decode the result from Base64:
  XML
```
echo "$(< result.json)" | \
  jq -r .response.rawData | \
  base64 --decode > result.xml
```
  The XML response to the query will be saved to a file named result.xml.
  HTML
```
echo "$(< result.json)" | \
  jq -r .response.rawData | \
  base64 --decode > result.html
```
  The HTML response to the query will be saved to a file named result.html.
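
The same decoding step can be done without jq, for instance from a Python script. This is a minimal sketch using only the standard library; it assumes the saved Operation object has the `response.rawData` field shown earlier, and the function name is hypothetical.

```python
import base64
import json


def decode_raw_data(operation_json: str) -> bytes:
    """Return the decoded XML or HTML document from an Operation object."""
    operation = json.loads(operation_json)
    if "response" not in operation:
        raise ValueError("the operation has no response yet")
    # rawData holds the Base64-encoded response body.
    return base64.b64decode(operation["response"]["rawData"])
```

To reproduce the shell commands above, read result.json as text, pass it to `decode_raw_data`, and write the returned bytes to result.xml or result.html.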

gRPC API

Create a search query:

Create a file with the request body, e.g., body.json:

body.json

```
{
    "query": {
      "search_type": "<search_type>",
      "query_text": "<search_query_text>",
      "family_mode": "<result_filter_setting_value>",
      "page": "<page_number>",
      "fix_typo_mode": "<typo_correction_mode_setting_value>"
    },
    "sort_spec": {
      "sort_mode": "<result_sorting_rule>",
      "sort_order": "<sort_order_of_results>"
    },
    "group_spec": {
      "group_mode": "<result_grouping_method>",
      "groups_on_page": "<number_of_groups_per_page>",
      "docs_in_group": "<number_of_documents_per_group>"
    },
    "max_passages": "<maximum_number_of_passages>",
    "region": "<region_ID>",
    "l10n": "<notification_language>",
    "folder_id": "<folder_ID>",
    "response_format": "<result_format>",
    "user_agent": "<User-Agent_header>"
}
```

Description of fields

search_type: Search type. The possible values are:
- SEARCH_TYPE_RU: For the Russian search type.
- SEARCH_TYPE_TR: For the Turkish search type.
- SEARCH_TYPE_COM: For the International search type.
- SEARCH_TYPE_KK: For the Kazakh search type.
- SEARCH_TYPE_BE: For the Belarusian search type.
- SEARCH_TYPE_UZ: For the Uzbek search type.
query_text: Search query text. The maximum length is 400 characters.
family_mode: Results filtering. This is an optional parameter. The possible values are:
- FAMILY_MODE_MODERATE: Moderate filter (default). Adult category documents are excluded from search results unless the query explicitly targets resources of this category.
- FAMILY_MODE_NONE: Filtering is off. Search results include any documents regardless of their contents.
- FAMILY_MODE_STRICT: Family filter. Regardless of the search query, Adult category documents and documents containing profanity are excluded from search results.
page: Requested page number. This is an optional parameter. By default, the first page with search results is returned. Page numbering starts from zero (0 stands for page one).
fix_typo_mode: Search query typo correction setting. This is an optional parameter. The possible values are:
- FIX_TYPO_MODE_ON: Typo correction enabled (default). Search query typos are corrected automatically.
- FIX_TYPO_MODE_OFF: Typo correction disabled. Search query typos are not corrected. The search is performed strictly as per the query.

sort_mode: Search results sorting rule. This is an optional parameter. The possible values are:
- SORT_MODE_BY_RELEVANCE: Sorting by relevance (default).
- SORT_MODE_BY_TIME: Sorting by document update time.
sort_order: Search results sorting order. This is an optional parameter. The possible values are:
- SORT_ORDER_DESC: Forward sorting order from most recent to oldest (default).
- SORT_ORDER_ASC: Reverse sorting order from oldest to most recent.
group_mode: Result grouping method. This is an optional parameter. The possible values are:
- GROUP_MODE_DEEP: Grouping by domain. Each group contains documents from one domain (default).
- GROUP_MODE_FLAT: Flat grouping. Each group contains a single document.
groups_on_page: Maximum number of groups that can be returned per page. This is an optional parameter. The default value is 20.

When getting the result in XML format, the possible values range from 1 to 100, while for HTML format, from 5 to 50.
docs_in_group: Maximum number of documents that can be returned per group. This is an optional parameter. The values range from 1 to 3. The default value is 1.
max_passages: Maximum number of passages that can be used when generating a document snippet. This is an optional parameter. The values range from 1 to 5. By default, a maximum of four passages with search query text is returned per document.
region: Search country or region ID that affects the document ranking rules. Only supported for the Russian and Turkish search types.

For a list of frequently used country and region IDs, see Search regions.
l10n: Search response notifications language. Affects the text in the found-docs-human tag and error messages. This is an optional parameter. Possible values depend on the selected search type:
- Russian:
  - LOCALIZATION_RU: Russian (default).
  - LOCALIZATION_BE: Belarusian.
  - LOCALIZATION_KK: Kazakh.
  - LOCALIZATION_UK: Ukrainian.
- Turkish:
  - LOCALIZATION_TR: Turkish.
- International:
  - LOCALIZATION_EN: English.
folder_id: Folder ID of the user or service account you will use for queries.
response_format: Search results format. This is an optional parameter. The possible values are:
- FORMAT_XML: The results will be delivered in XML format (default).
- FORMAT_HTML: The results will be delivered in HTML format.
user_agent: String containing the User-Agent header. Use this parameter to have your search results optimized for a specific device and browser, including mobile search results. This is an optional parameter. If not specified, you will get the default output.
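
Judging by the two body.json templates in this guide, the gRPC request body uses the same fields as the REST one, with snake_case names instead of camelCase. If you already have a REST body, a conversion along these lines may save retyping. It is only a sketch: the function names are hypothetical, and the renaming rule is inferred from the templates rather than taken from the API reference.

```python
import re


def to_snake_case(key: str) -> str:
    """Convert a camelCase field name, e.g. groupsOnPage -> groups_on_page."""
    # Insert an underscore only between a lowercase letter and an uppercase
    # one, so that "l10N" becomes "l10n" rather than "l10_n".
    return re.sub(r"(?<=[a-z])(?=[A-Z])", "_", key).lower()


def to_grpc_keys(body):
    """Recursively rename dictionary keys; values are left unchanged."""
    if isinstance(body, dict):
        return {to_snake_case(k): to_grpc_keys(v) for k, v in body.items()}
    if isinstance(body, list):
        return [to_grpc_keys(item) for item in body]
    return body


rest_body = {
    "query": {"searchType": "SEARCH_TYPE_RU", "queryText": "Yandex Cloud"},
    "groupSpec": {"groupsOnPage": "5"},
    "l10N": "LOCALIZATION_RU",
    "folderId": "<folder_ID>",
}
print(to_grpc_keys(rest_body))
```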

List of supported parameters:

Note

The list of supported request parameters depends on the required output format, XML or HTML.

Parameter	Supported in XML response	Supported in HTML response
`search_type`
`query_text`
`family_mode`
`page`
`fix_typo_mode`
`sort_mode`
`sort_order`
`group_mode`
`groups_on_page`
`docs_in_group`
`max_passages`
`region`
`l10n`
`folder_id`
`response_format`
`user_agent`

Run a gRPC call by specifying the IAM token you got earlier:

```
grpcurl \
  -rpc-header "Authorization: Bearer <IAM_token>" \
  -d @ < body.json \
  searchapi.api.cloud.yandex.net:443 yandex.cloud.searchapi.v2.WebSearchAsyncService/Search
```

Result:

```
{
  "id": "spp3gp3vhna6********",
  "description": "WEB search async",
  "createdAt": "2024-10-02T19:14:41Z",
  "createdBy": "bfbud0oddqp4********",
  "modifiedAt": "2024-10-02T19:14:42Z"
}
```

Save the obtained Operation object ID (id value) for later use.

Wait until Yandex Search API executes the request and generates a response. This may take from five minutes to a few hours.

To make sure the request was successful, run this gRPC call:

```
grpcurl \
  -rpc-header "Authorization: Bearer <IAM_token>" \
  -d '{"operation_id": "<request_ID>"}' \
  operation.api.cloud.yandex.net:443 yandex.cloud.operation.OperationService/Get
```

Where:

<IAM_token>: Previously obtained IAM token.
<request_ID>: The Operation object ID you saved at the previous step.

Result:

```
{
  "id": "spp82pc07ebl********",
  "description": "WEB search async",
  "createdAt": "2024-10-03T08:07:07Z",
  "createdBy": "bfbud0oddqp4********",
  "modifiedAt": "2024-10-03T08:12:09Z",
  "done": true,
  "response": {
    "@type": "type.googleapis.com/yandex.cloud.searchapi.v2.WebSearchResponse",
    "rawData": "<Base64_encoded_response_body>"
  }
}
```

If the done field is set to true and the response object is present in the output, the request has been completed, so you can move on to the next step. Otherwise, repeat the check after some time.
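
Since processing may take anywhere from minutes to hours, repeating the check at a fixed short interval is wasteful. One option is to space the checks out gradually. The sketch below generates such a schedule; every number in it (a five-minute first delay, doubling, a one-hour cap, a six-hour budget) is an assumption chosen for illustration, not a service recommendation.

```python
def poll_delays(first=300.0, factor=2.0, cap=3600.0, total=6 * 3600.0):
    """Yield delays in seconds that grow up to `cap` until `total` is used up."""
    elapsed = 0.0
    delay = first
    while elapsed < total:
        # Never wait longer than the cap or the remaining time budget.
        step = min(delay, cap, total - elapsed)
        yield step
        elapsed += step
        delay *= factor


for seconds in poll_delays():
    print(f"wait {seconds / 60:.0f} min, then check the operation")
```

Each yielded value is the pause before the next status check; stop iterating as soon as the operation reports that it is done.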

After Yandex Search API has successfully processed the request, get the response:
1. Get the result:
```
grpcurl \
  -rpc-header "Authorization: Bearer <IAM_token>" \
  -d '{"operation_id": "<request_ID>"}' \
  operation.api.cloud.yandex.net:443 yandex.cloud.operation.OperationService/Get \
  > result.json
```
  Eventually the search query result will be saved to a file named result.json containing a Base64-encoded XML or HTML response in the response.rawData field.
2. Depending on the requested response format, decode the result from Base64:
  XML
```
echo "$(< result.json)" | \
  jq -r .response.rawData | \
  base64 --decode > result.xml
```
  The XML response to the query will be saved to a file named result.xml.
  HTML
```
echo "$(< result.json)" | \
  jq -r .response.rawData | \
  base64 --decode > result.html
```
  The HTML response to the query will be saved to a file named result.html.
