Performing search queries using API v2
With API v2 of Yandex Search API, you can run text searches against the Yandex search database and get the results in XML or HTML format in deferred (asynchronous) mode. You can send queries using the REST API or the gRPC API. The search results you get depend on the parameters specified in your query.
Getting started
Sign up for Yandex Cloud and create a billing account:
- Go to the management console and log in to Yandex Cloud or create an account if you do not have one yet.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one.
If you have an active billing account, you can go to the cloud page to create or select a folder to deploy your infrastructure in.
Learn more about clouds and folders.
To use the examples, install the cURL and gRPCurl utilities, as well as jq for decoding the results.
Prepare your cloud
- To authenticate with API v2 as a service account, create a service account.
- Assign the search-api.webSearch.user role to the user or service account you will use to run queries.
- Get an IAM token, which is required for authentication.
The following examples use IAM token authentication. To authenticate with a service account's API key instead, edit the Authorization header in the query examples. For more information, see Authentication in API v2.
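If you switch between IAM token and API key authentication, a small shell helper can build the Authorization header for you. The sketch below assumes the IAM_TOKEN and API_KEY variable names (chosen here for illustration, not defined by Yandex Cloud) and the Api-Key header scheme described in Authentication in API v2; the IAM token itself can be obtained with the Yandex Cloud CLI command yc iam create-token.

```shell
#!/usr/bin/env bash
# Sketch: build the Authorization header for Search API v2 calls.
# IAM_TOKEN and API_KEY are hypothetical variable names chosen for this example.
# Obtain an IAM token with the Yandex Cloud CLI, for example:
#   export IAM_TOKEN="$(yc iam create-token)"

# Print the header line: a service account API key (API_KEY) takes precedence,
# otherwise the IAM token (IAM_TOKEN) is used. Fails if neither is set.
auth_header() {
  if [[ -n "${API_KEY:-}" ]]; then
    printf 'Authorization: Api-Key %s\n' "$API_KEY"
  elif [[ -n "${IAM_TOKEN:-}" ]]; then
    printf 'Authorization: Bearer %s\n' "$IAM_TOKEN"
  else
    echo "auth_header: set IAM_TOKEN or API_KEY first" >&2
    return 1
  fi
}
```

You could then pass --header "$(auth_header)" to curl or -rpc-header "$(auth_header)" to grpcurl instead of the literal header shown in the examples.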
Create a search query
REST API:
- Create a file with the request body, e.g., body.json:
{
  "query": {
    "searchType": "<search_type>",
    "queryText": "<search_query_text>",
    "familyMode": "<result_filter_setting_value>",
    "page": "<page_number>",
    "fixTypoMode": "<typo_correction_mode_setting_value>"
  },
  "sortSpec": {
    "sortMode": "<result_sorting_rule>",
    "sortOrder": "<sort_order_of_results>"
  },
  "groupSpec": {
    "groupMode": "<result_grouping_method>",
    "groupsOnPage": "<number_of_groups_per_page>",
    "docsInGroup": "<number_of_documents_per_group>"
  },
  "maxPassages": "<maximum_number_of_passages>",
  "region": "<region_ID>",
  "l10N": "<notification_language>",
  "folderId": "<folder_ID>",
  "responseFormat": "<result_format>",
  "userAgent": "<User-Agent_header>"
}
Description of fields:
- searchType: Search type. The possible values are:
  - SEARCH_TYPE_RU: Russian search type.
  - SEARCH_TYPE_TR: Turkish search type.
  - SEARCH_TYPE_COM: International search type.
- queryText: Search query text. The maximum length is 400 characters.
- familyMode: Results filtering. This is an optional parameter. The possible values are:
  - FAMILY_MODE_MODERATE: Moderate filter (default). Documents of the Adult category are excluded from search results unless the query explicitly targets resources of this category.
  - FAMILY_MODE_NONE: Filtering is disabled. Search results include any documents regardless of their content.
  - FAMILY_MODE_STRICT: Family filter. Regardless of the search query, documents of the Adult category and documents containing profanity are excluded from search results.
- page: Requested page number. This is an optional parameter. By default, the first page of search results is returned. Page numbering starts from zero (0 stands for page 1).
- fixTypoMode: Typo correction setting for the search query. This is an optional parameter. The possible values are:
  - FIX_TYPO_MODE_ON: Typo correction enabled (default). Typos in the search query are corrected automatically.
  - FIX_TYPO_MODE_OFF: Typo correction disabled. Typos in the search query are not corrected, and the search is performed strictly as per the query.
- sortMode: Sorting rule for search results. This is an optional parameter. The possible values are:
  - SORT_MODE_BY_RELEVANCE: Sorting by relevance (default).
  - SORT_MODE_BY_TIME: Sorting by document update time.
- sortOrder: Sort order of search results. This is an optional parameter. The possible values are:
  - SORT_ORDER_DESC: Descending order, from most recent to oldest (default).
  - SORT_ORDER_ASC: Ascending order, from oldest to most recent.
- groupMode: Result grouping method. This is an optional parameter. The possible values are:
  - GROUP_MODE_DEEP: Grouping by domain. Each group contains documents from one domain (default).
  - GROUP_MODE_FLAT: Flat grouping. Each group contains a single document.
- groupsOnPage: Maximum number of groups returned per page. This is an optional parameter. The values range from 1 to 100. The default value is 20.
- docsInGroup: Maximum number of documents returned per group. This is an optional parameter. The values range from 1 to 3. The default value is 1.
- maxPassages: Maximum number of passages that can be used to generate a document snippet. This is an optional parameter. The values range from 1 to 5. By default, up to four passages containing the search query text are returned per document.
- region: ID of the search country or region, which affects the document ranking rules. Only supported for the Russian and Turkish search types. For a list of frequently used country and region IDs, see Search regions.
- l10N: Language of search response notifications. Affects the text in the found-docs-human tag and error messages. This is an optional parameter. The possible values depend on the selected search type:
  - Russian:
    - LOCALIZATION_RU: Russian (default).
    - LOCALIZATION_BE: Belarusian.
    - LOCALIZATION_KK: Kazakh.
    - LOCALIZATION_UK: Ukrainian.
  - Turkish:
    - LOCALIZATION_TR: Turkish.
  - International:
    - LOCALIZATION_EN: English.
- folderId: Folder ID of the user or service account you will use for queries.
- responseFormat: Search results format. This is an optional parameter. The possible values are:
  - FORMAT_XML: Results are delivered in XML format (default).
  - FORMAT_HTML: Results are delivered in HTML format.
- userAgent: String containing the User-Agent header. Use this parameter to get search results optimized for a specific device and browser, including mobile search results. This is an optional parameter. If not specified, you get the default output.
- Run an HTTP request, specifying the IAM token you got earlier:
curl \
--request POST \
--header "Authorization: Bearer <IAM_token>" \
--data "@body.json" \
"https://searchapi.api.cloud.yandex.net/v2/web/searchAsync"
Result:
{
"done": false,
"id": "sppger465oq1********",
"description": "WEB search async",
"createdAt": "2024-10-02T19:51:02Z",
"createdBy": "bfbud0oddqp4********",
"modifiedAt": "2024-10-02T19:51:03Z"
}
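As an illustration, the hypothetical helper below writes a minimal REST body.json that sets only searchType, queryText, and folderId, leaving all optional fields at their defaults. It escapes backslashes and double quotes in the values and enforces the 400-character limit on queryText; it is a sketch and does not handle control characters such as newlines.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal REST request body for searchAsync.
# json_escape and make_rest_body are hypothetical helpers, not part of any Yandex tool.

# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Control characters (for example, newlines) are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: make_rest_body <search_type> <query_text> <folder_ID> [output_file]
# Only searchType, queryText, and folderId are set; optional fields keep their defaults.
make_rest_body() {
  local search_type=${1:-} query_text=${2:-} folder_id=${3:-} out=${4:-body.json}
  # queryText is limited to 400 characters (the count depends on the current locale).
  if (( ${#query_text} > 400 )); then
    echo "make_rest_body: queryText is longer than 400 characters" >&2
    return 1
  fi
  cat > "$out" <<EOF
{
  "query": {
    "searchType": "$(json_escape "$search_type")",
    "queryText": "$(json_escape "$query_text")"
  },
  "folderId": "$(json_escape "$folder_id")"
}
EOF
}
```

For example, make_rest_body SEARCH_TYPE_COM 'yandex search api' <folder_ID> writes body.json to the current directory, which you can then send with the curl command above.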
gRPC API:
- Create a file with the request body, e.g., body.json:
{
  "query": {
    "search_type": "<search_type>",
    "query_text": "<search_query_text>",
    "family_mode": "<result_filter_setting_value>",
    "page": "<page_number>",
    "fix_typo_mode": "<typo_correction_mode_setting_value>"
  },
  "sort_spec": {
    "sort_mode": "<result_sorting_rule>",
    "sort_order": "<sort_order_of_results>"
  },
  "group_spec": {
    "group_mode": "<result_grouping_method>",
    "groups_on_page": "<number_of_groups_per_page>",
    "docs_in_group": "<number_of_documents_per_group>"
  },
  "max_passages": "<maximum_number_of_passages>",
  "region": "<region_ID>",
  "l10n": "<notification_language>",
  "folder_id": "<folder_ID>",
  "response_format": "<result_format>",
  "user_agent": "<User-Agent_header>"
}
Description of fields:
- search_type: Search type. The possible values are:
  - SEARCH_TYPE_RU: Russian search type.
  - SEARCH_TYPE_TR: Turkish search type.
  - SEARCH_TYPE_COM: International search type.
- query_text: Search query text. The maximum length is 400 characters.
- family_mode: Results filtering. This is an optional parameter. The possible values are:
  - FAMILY_MODE_MODERATE: Moderate filter (default). Documents of the Adult category are excluded from search results unless the query explicitly targets resources of this category.
  - FAMILY_MODE_NONE: Filtering is disabled. Search results include any documents regardless of their content.
  - FAMILY_MODE_STRICT: Family filter. Regardless of the search query, documents of the Adult category and documents containing profanity are excluded from search results.
- page: Requested page number. This is an optional parameter. By default, the first page of search results is returned. Page numbering starts from zero (0 stands for page 1).
- fix_typo_mode: Typo correction setting for the search query. This is an optional parameter. The possible values are:
  - FIX_TYPO_MODE_ON: Typo correction enabled (default). Typos in the search query are corrected automatically.
  - FIX_TYPO_MODE_OFF: Typo correction disabled. Typos in the search query are not corrected, and the search is performed strictly as per the query.
- sort_mode: Sorting rule for search results. This is an optional parameter. The possible values are:
  - SORT_MODE_BY_RELEVANCE: Sorting by relevance (default).
  - SORT_MODE_BY_TIME: Sorting by document update time.
- sort_order: Sort order of search results. This is an optional parameter. The possible values are:
  - SORT_ORDER_DESC: Descending order, from most recent to oldest (default).
  - SORT_ORDER_ASC: Ascending order, from oldest to most recent.
- group_mode: Result grouping method. This is an optional parameter. The possible values are:
  - GROUP_MODE_DEEP: Grouping by domain. Each group contains documents from one domain (default).
  - GROUP_MODE_FLAT: Flat grouping. Each group contains a single document.
- groups_on_page: Maximum number of groups returned per page. This is an optional parameter. The values range from 1 to 100. The default value is 20.
- docs_in_group: Maximum number of documents returned per group. This is an optional parameter. The values range from 1 to 3. The default value is 1.
- max_passages: Maximum number of passages that can be used to generate a document snippet. This is an optional parameter. The values range from 1 to 5. By default, up to four passages containing the search query text are returned per document.
- region: ID of the search country or region, which affects the document ranking rules. Only supported for the Russian and Turkish search types. For a list of frequently used country and region IDs, see Search regions.
- l10n: Language of search response notifications. Affects the text in the found-docs-human tag and error messages. This is an optional parameter. The possible values depend on the selected search type:
  - Russian:
    - LOCALIZATION_RU: Russian (default).
    - LOCALIZATION_BE: Belarusian.
    - LOCALIZATION_KK: Kazakh.
    - LOCALIZATION_UK: Ukrainian.
  - Turkish:
    - LOCALIZATION_TR: Turkish.
  - International:
    - LOCALIZATION_EN: English.
- folder_id: Folder ID of the user or service account you will use for queries.
- response_format: Search results format. This is an optional parameter. The possible values are:
  - FORMAT_XML: Results are delivered in XML format (default).
  - FORMAT_HTML: Results are delivered in HTML format.
- user_agent: String containing the User-Agent header. Use this parameter to get search results optimized for a specific device and browser, including mobile search results. This is an optional parameter. If not specified, you get the default output.
- Run a gRPC call, specifying the IAM token you got earlier:
grpcurl \
-rpc-header "Authorization: Bearer <IAM_token>" \
-d @ < body.json \
searchapi.api.cloud.yandex.net:443 yandex.cloud.searchapi.v2.WebSearchAsyncService/Search
Result:
{
"id": "spp3gp3vhna6********",
"description": "WEB search async",
"createdAt": "2024-10-02T19:14:41Z",
"createdBy": "bfbud0oddqp4********",
"modifiedAt": "2024-10-02T19:14:42Z"
}
Save the Operation object ID you get (the id value) for later use.
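If jq is not available, a sed-based helper like the hypothetical extract_operation_id below can pull the id value out of a saved response. It assumes the top-level id key appears once per response, as in the sample outputs above; with jq installed, jq -r .id does the same job more robustly.

```shell
#!/usr/bin/env bash
# Sketch: print the Operation object ID from a searchAsync/Search response.
# extract_operation_id is a hypothetical helper. It reads the files given as
# arguments (or standard input) and prints the first "id" value it finds,
# assuming the top-level "id" key appears once, as in the sample outputs.
extract_operation_id() {
  sed -n 's/.*"id": *"\([^"]*\)".*/\1/p' "$@" | head -n 1
}
```

For example, if you redirected the output of the curl or grpcurl call to a file named operation.json (a name chosen here for illustration), QUERY_ID=$(extract_operation_id operation.json) stores the ID for the next steps.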
Make sure the query was executed successfully
Wait until Yandex Search API executes the query and generates a response. This may take from five minutes to a few hours.
Make sure the query was executed successfully:
Run this HTTP request:
curl \
--request GET \
--header "Authorization: Bearer <IAM_token>" \
https://operation.api.cloud.yandex.net/operations/<query_ID>
Where:
- <IAM_token>: Previously obtained IAM token.
- <query_ID>: The Operation object ID you saved at the previous step.
Result:
{
"done": true,
"response": {
"@type": "type.googleapis.com/yandex.cloud.searchapi.v2.WebSearchResponse",
"rawData": "<Base64_encoded_XML_response_body>"
},
"id": "spp82pc07ebl********",
"description": "WEB search async",
"createdAt": "2024-10-03T08:07:07Z",
"createdBy": "bfbud0oddqp4********",
"modifiedAt": "2024-10-03T08:12:09Z"
}
Run this gRPC call:
grpcurl \
-rpc-header "Authorization: Bearer <IAM_token>" \
-d '{"operation_id": "<query_ID>"}' \
operation.api.cloud.yandex.net:443 yandex.cloud.operation.OperationService/Get
Where:
- <IAM_token>: Previously obtained IAM token.
- <query_ID>: The Operation object ID you saved at the previous step.
Result:
{
"id": "spp82pc07ebl********",
"description": "WEB search async",
"createdAt": "2024-10-03T08:07:07Z",
"createdBy": "bfbud0oddqp4********",
"modifiedAt": "2024-10-03T08:12:09Z",
"done": true,
"response": {
"@type": "type.googleapis.com/yandex.cloud.searchapi.v2.WebSearchResponse",
"rawData": "<Base64_encoded_XML_response_body>"
}
}
If the done field is set to true and the response object is present in the output, the query has completed successfully and you can move on to the next step. Otherwise, repeat the check later.
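Rather than repeating the check by hand, you can poll the operation in a loop. The hypothetical wait_for_operation helper below takes the curl or grpcurl command from this section as its arguments, so it works with either API. The POLL_INTERVAL and MAX_ATTEMPTS variables and their defaults (60 seconds, 180 attempts, about three hours) are assumptions chosen to roughly match the expected processing time. An operation with done set to true but no response object (for example, one carrying an error) is treated as a failure.

```shell
#!/usr/bin/env bash
# Sketch: poll an Operation until it is done.
# wait_for_operation, POLL_INTERVAL, and MAX_ATTEMPTS are hypothetical names.
# Usage: wait_for_operation <command that fetches the operation...>
# Prints the final operation JSON on success.
wait_for_operation() {
  local interval=${POLL_INTERVAL:-60} attempts=${MAX_ATTEMPTS:-180} i out
  for (( i = 1; i <= attempts; i++ )); do
    out=$("$@") || { echo "wait_for_operation: fetch command failed" >&2; return 1; }
    if grep -q '"done": *true' <<<"$out"; then
      # Success only if the finished operation carries a response object.
      if grep -q '"response"' <<<"$out"; then
        printf '%s\n' "$out"
        return 0
      fi
      echo "wait_for_operation: operation finished without a response:" >&2
      printf '%s\n' "$out" >&2
      return 1
    fi
    # Not done yet: wait before the next attempt (skip the wait after the last one).
    if (( i < attempts )); then
      sleep "$interval"
    fi
  done
  echo "wait_for_operation: operation not done after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_operation curl --silent --request GET --header "Authorization: Bearer <IAM_token>" https://operation.api.cloud.yandex.net/operations/<query_ID> > result.json also leaves the finished operation in result.json.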
Get a response
After Yandex Search API has successfully processed the query:
- Get the result:
REST API:
curl \
--request GET \
--header "Authorization: Bearer <IAM_token>" \
https://operation.api.cloud.yandex.net/operations/<query_ID> \
> result.json
gRPC API:
grpcurl \
-rpc-header "Authorization: Bearer <IAM_token>" \
-d '{"operation_id": "<query_ID>"}' \
operation.api.cloud.yandex.net:443 yandex.cloud.operation.OperationService/Get \
> result.json
The search query result will be saved to a file named result.json, with the Base64-encoded XML or HTML response in the response.rawData field.
- Depending on the requested response format, decode the result from Base64:
XML:
jq -r .response.rawData result.json | \
base64 --decode > result.xml
The XML response to the query will be saved to a file named result.xml.
HTML:
jq -r .response.rawData result.json | \
base64 --decode > result.html
The HTML response to the query will be saved to a file named result.html.
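Where jq is not installed, the hypothetical decode_raw_data helper below extracts response.rawData with sed and decodes it with base64 --decode, as in the commands above. It assumes result.json contains a single rawData key with the whole Base64 string on one line, which holds for the sample outputs shown in this section.

```shell
#!/usr/bin/env bash
# Sketch: decode response.rawData from a saved operation without jq.
# decode_raw_data is a hypothetical helper; it assumes a single "rawData" key
# whose Base64 value sits on one line, as in the sample outputs above.
# Usage: decode_raw_data <result.json> <output_file>
decode_raw_data() {
  local src=${1:-} dst=${2:-} data
  if [[ -z "$src" || -z "$dst" ]]; then
    echo "usage: decode_raw_data <result.json> <output_file>" >&2
    return 1
  fi
  # Capture the Base64 string between the quotes after "rawData":
  data=$(sed -n 's/.*"rawData": *"\([^"]*\)".*/\1/p' "$src" | head -n 1)
  if [[ -z "$data" ]]; then
    echo "decode_raw_data: no response.rawData field in $src" >&2
    return 1
  fi
  printf '%s' "$data" | base64 --decode > "$dst"
}
```

For example, decode_raw_data result.json result.xml for XML output, or decode_raw_data result.json result.html for HTML output.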