Searching for images by image
You can use Yandex Search API to search through Yandex Images using an image as the query.
Getting started
Sign up for Yandex Cloud and create a billing account:
- Navigate to the management console and log in to Yandex Cloud or create a new account.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.
If you have an active billing account, you can navigate to the cloud page to create or select a folder.
Learn more about clouds and folders here.
Get your cloud ready
To use the Yandex Cloud ML SDK examples:
- Create a service account and assign the search-api.webSearch.user role to it.
- Get and save the service account's API key with yc.search-api.execute for its scope.
  The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.
  Note
  If you are using Windows, we recommend installing the WSL shell first and using it to proceed.
- Install Python 3.10 or higher.
- Install Python venv to create isolated virtual environments in Python.
- Create a new Python virtual environment and activate it:
    python3 -m venv new-env
    source new-env/bin/activate
- Use the pip package manager to install the ML SDK library:
    pip install yandex-cloud-ml-sdk
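As a quick sanity check of the setup steps above, the following sketch reports whether the interpreter meets the stated minimum version, whether a virtual environment is active, and whether the SDK package is installed. The `check_environment` helper is a hypothetical name introduced here for illustration; it is not part of the ML SDK and uses only the standard library.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)                 # minimum version stated in the steps above
SDK_PACKAGE = "yandex-cloud-ml-sdk"  # distribution name used in the pip command


def check_environment(package: str = SDK_PACKAGE) -> dict:
    """Report whether the interpreter and the package meet the prerequisites."""
    try:
        installed_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed_version = None  # package is not installed in this environment
    return {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # In a venv, sys.prefix points at the environment, not the base install.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "sdk_version": installed_version,
    }
```

Calling `print(check_environment())` inside the activated `new-env` environment shows all three values at once, which makes it easier to tell a wrong interpreter from a missing package.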
To use the HTTP request and gRPC call examples:
- Create a service account you will use to send requests. You can also use a Yandex account or a federated account, but a service account is a better choice for automation purposes.
- Assign the search-api.webSearch.user role to the account you will use to send requests.
- Get an IAM token, which is required for authentication.
  The following examples use IAM token authentication. To use a service account's API key for authentication, edit the Authorization header in the query examples. For more information, see API authentication.
- Install cURL to use the HTTP request example, or gRPCurl to use the gRPC call example.
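The difference between the two authentication options comes down to the value of the Authorization header. The sketch below builds that header for either credential type. The `build_auth_header` helper is hypothetical, and the `Api-Key` scheme for service account API keys is an assumption based on the convention commonly used across Yandex Cloud APIs; check it against the API authentication guide before relying on it.

```python
from __future__ import annotations


def build_auth_header(credential: str, kind: str = "iam") -> dict[str, str]:
    """Return the Authorization header for an IAM token or an API key."""
    if not credential:
        raise ValueError("credential must not be empty")
    schemes = {
        "iam": "Bearer",       # IAM token, as used in the query examples
        "api_key": "Api-Key",  # assumption: scheme for service account API keys
    }
    if kind not in schemes:
        raise ValueError(f"unknown credential kind: {kind!r}")
    return {"Authorization": f"{schemes[kind]} {credential}"}
```

For example, `build_auth_header("<IAM_token>")` produces the header used in the cURL and gRPCurl commands, while `build_auth_header("<API_key>", kind="api_key")` produces the API key variant.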
Send a search query
To run a search query:
- Create a file named pic-search-by-pic.py and paste the following code into it:

    #!/usr/bin/env python3

    from __future__ import annotations

    import pathlib

    from yandex_cloud_ml_sdk import YCloudML
    from yandex_cloud_ml_sdk.search_api import FamilyMode

    EXAMPLE_FILE = pathlib.Path(__file__).parent / "image.jpg"


    def main() -> None:
        sdk = YCloudML(
            folder_id="<folder_ID>",
            auth="<API_key>",
        )
        sdk.setup_default_logging()

        # You can pass initial configuration here:
        search = sdk.search_api.by_image(
            family_mode="moderate",
            site="ya.ru",
        )

        # Or configure the Search object later:
        search = search.configure(
            # family mode may be passed as a string or as a special enum value
            family_mode=FamilyMode.NONE,
        )

        # You can reset any config property back to its default value by passing None:
        search = search.configure(site=None)

        search_type = input(
            "Select a search type:\n1: Using a remote image URL (default)\n2: Using bytes data from './image.jpg'\n\n"
        )
        if not search_type.strip():
            search_type = "1"

        if int(search_type) == 2:
            # The first search option is to search using bytes data:
            image_data = EXAMPLE_FILE.read_bytes()
            search_result = search.run(image_data)
        else:
            # The second search option is to search using a remote image URL,
            # e.g., a photo of Leo Tolstoy:
            url = "https://upload.wikimedia.org/wikipedia/commons/b/be/Leo_Tolstoy_1908_Portrait_%283x4_cropped%29.jpg"
            search_result = search.run_from_url(url)

        # You can examine the search_result structure via pprint
        # to get to know how to work with it:
        # import pprint; pprint.pprint(search_result)

        # Search results can also be used in boolean context:
        if search_result:
            print(f"{len(search_result)} documents found")
        else:
            print("Nothing found")

        # The third search option is to search using the image's CBIR ID.
        # Using the CBIR ID is much faster than any other option,
        # but it requires at least one "heavy" request to get this ID.
        cbir_id = search_result.cbir_id
        search_result = search.run_from_id(cbir_id, page=1)

        while search_result:
            print(f"Page {search_result.page}:")
            output_filename = str(
                pathlib.Path(__file__).parent / f"results_page_{search_result.page}.txt"
            )
            # Append mode: rerunning the script adds to existing result files.
            with open(output_filename, "a") as file:
                for document in search_result:
                    file.write(str(document) + "\n\n")
            print(f"Page {search_result.page} saved to file {output_filename}")

            # search_result.next_page() is a shortcut for
            # `.run_from_id(search_query.cbir_id, page=page + 1)`
            # with search configuration saved from the initial run;
            # last page + 1 will return an "empty" search_result.
            search_result = search_result.next_page()


    if __name__ == "__main__":
        main()

  Where:
  - <folder_ID>: ID of the folder in which the service account was created.
  - <API_key>: Service account API key you got earlier, required for authentication in the API.

  You can set the search parameters in the properties of the search_api.by_image class object or in the .configure method:
  - family_mode: Results filtering. This is an optional parameter. The possible values are:
    - moderate: Moderate filter (default). Adult category documents are excluded from search results unless the query explicitly targets resources of this category.
    - none: Filtering is off. Search results include any documents regardless of their contents.
    - strict: Family filter. Regardless of the search query, Adult category documents and documents containing profanity are excluded from search results.
- Run the file you created:
    python3 pic-search-by-pic.py
  During execution, the code will ask you to select a search option:
  - Based on an image published on the internet. The URL of this image is specified in the url variable.
  - Based on an image from the local computer. The local path to the image is specified in the EXAMPLE_FILE variable.
  At the final stage, the code will search for images by the CBIR ID and then save the search results page by page into text files in the directory that contains the script:
    Page 1 saved to file /Users/MyUser/Desktop/results_page_1.txt
    ...
    Page 134 saved to file /Users/MyUser/Desktop/results_page_134.txt
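Once the script has finished, the saved pages can be collected for further processing. The sketch below lists the results_page_<N>.txt files in numeric page order, since a plain alphabetical sort would put page 10 before page 2. The `list_result_pages` helper is a hypothetical name introduced here; it only assumes the file naming scheme used by the example script.

```python
from __future__ import annotations

import pathlib
import re

# File name pattern written by the example script: results_page_<N>.txt
_PAGE_FILE = re.compile(r"^results_page_(\d+)\.txt$")


def list_result_pages(directory: str | pathlib.Path) -> list[tuple[int, pathlib.Path]]:
    """Return (page_number, path) pairs sorted by page number, not by name."""
    pages = []
    for path in pathlib.Path(directory).iterdir():
        match = _PAGE_FILE.match(path.name)
        if match and path.is_file():
            pages.append((int(match.group(1)), path))
    return sorted(pages, key=lambda item: item[0])
```

Because the example script opens each file in append mode, rerunning it adds documents to the existing files, so you may want to delete the listed files before a new run.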
To send the query as an HTTP request:
- Create a file with the request body, e.g., body.json:
  body.json
    {
      "site": "<website_domain_name>",
      "folderId": "<folder_ID>",
      "url": "<source_image_URL>",
      "data": "<image_data>",
      "id": "<CBIR_ID>",
      "page": "<page_number>"
    }
  Description of fields
  - site: Restricts the image search to the specified website, e.g., yandex.cloud. This is an optional setting. If not set, the search includes all websites in the search base.
  - folderId: Folder ID of the user or service account you will use for queries.
  - url: Source image URL.
  - data: Source image data, Base64-encoded.
  - id: Source image CBIR ID. Specify the ID you got in the response to get the next search result page faster.
    Note
    You can provide only one of these parameters in your query: url, id, or data.
  - page: Requested page number. This is an optional setting. By default, the first page with search results is returned. Page numbering starts from zero (0 stands for page one).
  Request body example
  body.json
    {
      "folderId": "b1gt6g8ht345********",
      "data": "<base64-encoded_image>",
      "page": "1"
    }
- Send an HTTP request, specifying the IAM token you got earlier and the path to the request body file:
    curl \
      --request POST \
      --header "Authorization: Bearer <IAM_token>" \
      --data "@body.json" \
      "https://searchapi.api.cloud.yandex.net/v2/image/search_by_image" \
      > result.json
  The search results in JSON format will be saved to a file named result.json.
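The same HTTP request can be prepared from Python, which helps when the image has to be Base64-encoded on the fly. The sketch below is a minimal illustration using only the standard library. The `build_body` and `build_request` helpers are hypothetical names, the Content-Type header is an addition that the cURL example does not set, and the endpoint is copied from the cURL command above.

```python
from __future__ import annotations

import base64
import json
import pathlib
import urllib.request

ENDPOINT = "https://searchapi.api.cloud.yandex.net/v2/image/search_by_image"


def build_body(folder_id, *, url=None, image_path=None, cbir_id=None,
               site=None, page=None) -> dict:
    """Build the request body; exactly one image source must be given."""
    sources = [s for s in (url, image_path, cbir_id) if s is not None]
    if len(sources) != 1:
        raise ValueError("provide exactly one of url, image_path, or cbir_id")
    body = {"folderId": folder_id}
    if url is not None:
        body["url"] = url
    elif image_path is not None:
        raw = pathlib.Path(image_path).read_bytes()
        body["data"] = base64.b64encode(raw).decode("ascii")
    else:
        body["id"] = cbir_id
    if site is not None:
        body["site"] = site
    if page is not None:
        body["page"] = str(page)  # the request body example passes the page as a string
    return body


def build_request(body: dict, iam_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request with IAM token authentication."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {iam_token}",
            "Content-Type": "application/json",  # assumption: not set in the cURL example
        },
        method="POST",
    )
```

To send the query, pass the prepared request to `urllib.request.urlopen` and write the response bytes to result.json. Keeping the body construction separate from the network call makes the "only one of url, id, or data" rule easy to check before any request is made.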
To send the query as a gRPC call:
- Create a file with the request body, e.g., body.json:
  body.json
    {
      "site": "<website_domain_name>",
      "folder_id": "<folder_ID>",
      "url": "<source_image_URL>",
      "data": "<image_data>",
      "id": "<CBIR_ID>",
      "page": "<page_number>"
    }
  Description of fields
  - site: Restricts the image search to the specified website, e.g., yandex.cloud. This is an optional setting. If not set, the search includes all websites in the search base.
  - folder_id: Folder ID of the user or service account you will use for queries.
  - url: Source image URL.
  - data: Source image data, Base64-encoded.
  - id: Source image CBIR ID. Specify the ID you got in the response to get the next search result page faster.
    Note
    You can provide only one of these parameters in your query: url, id, or data.
  - page: Requested page number. This is an optional setting. By default, the first page with search results is returned. Page numbering starts from zero (0 stands for page one).
  Request body example
  body.json
    {
      "folder_id": "b1gt6g8ht345********",
      "data": "<base64-encoded_image>",
      "page": "1"
    }
- Run a gRPC call, specifying the IAM token you got earlier and the path to the request body file:
    grpcurl \
      -rpc-header "Authorization: Bearer <IAM_token>" \
      -d @ < body.json \
      searchapi.api.cloud.yandex.net:443 yandex.cloud.searchapi.v2.ImageSearchService/SearchByImage \
      > result.json
  The search results in JSON format will be saved to a file named result.json.
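After the first call returns a CBIR ID, follow-up pages can be requested by ID instead of resending the image. The sketch below builds such a follow-up body with the snake_case field names used in the gRPC example and converts a human-friendly page number (counting from 1) to the zero-based numbering described above. The `follow_up_body` helper is a hypothetical name, and the CBIR ID is taken as an argument because the response layout is not shown in this guide.

```python
from __future__ import annotations

import json


def follow_up_body(folder_id: str, cbir_id: str, human_page: int) -> dict:
    """Body for page `human_page` (counting from 1) of a search identified by CBIR ID."""
    if human_page < 1:
        raise ValueError("human_page counts from 1")
    return {
        "folder_id": folder_id,
        "id": cbir_id,                # replaces url/data: only one source is allowed
        "page": str(human_page - 1),  # API page numbering starts from zero
    }


def write_body(body: dict, path: str = "body.json") -> None:
    """Save the body to the file that the grpcurl command reads."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(body, handle, indent=2)
```

Writing the result of `follow_up_body("<folder_ID>", "<CBIR_ID>", 2)` to body.json and rerunning the gRPCurl command requests the second page of results.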