Datasets domain
class yandex_cloud_ml_sdk._datasets.domain.AsyncDatasets
This class provides methods to create and manage datasets of a specific type.
async get(dataset_id, *, timeout=60)
Fetch a dataset from the server using its ID.
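As an illustration of get, the sketch below fetches one dataset and summarizes a few of the attributes documented for AsyncDataset further down. The helper name describe_dataset is hypothetical, and the `datasets` argument is assumed to be an AsyncDatasets instance (for example the `datasets` attribute of an SDK client); nothing here is prescribed by the SDK itself.

```python
async def describe_dataset(datasets, dataset_id: str) -> dict:
    """Fetch one dataset by ID and return a small summary of it."""
    # `datasets` is assumed to behave like AsyncDatasets.
    dataset = await datasets.get(dataset_id, timeout=30)
    # Only attributes listed in the AsyncDataset reference are read here.
    return {
        "id": dataset.id,
        "status": dataset.status,
        "task_type": dataset.task_type,
        "rows": dataset.rows,
    }
```

Inside a coroutine this would be called as `summary = await describe_dataset(sdk.datasets, dataset_id)`, where `sdk` is an already configured client.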
async list(*, status=Undefined, name_pattern=Undefined, task_type=Undefined, timeout=60)
Fetch a list of datasets based on specified filters.
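A minimal sketch of filtering with list follows. The return type is not shown on this page, so the code assumes that list produces an asynchronous iterator of datasets that can be consumed with `async for`; if it returns a plain list in your SDK version, the loop needs adjusting. The helper name collect_datasets and its `limit` parameter are hypothetical.

```python
async def collect_datasets(datasets, *, name_pattern: str, limit: int = 10) -> list:
    """Collect up to `limit` datasets whose names match `name_pattern`."""
    found = []
    # ASSUMPTION: list() yields datasets asynchronously.
    async for dataset in datasets.list(name_pattern=name_pattern):
        found.append(dataset)
        if len(found) >= limit:
            break  # stop early instead of walking the full listing
    return found
```

The status and task_type filters can be passed the same way; filters that are omitted keep their Undefined default.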
async list_upload_formats(task_type, *, timeout=60)
Fetch available upload formats for a specified task type.
async list_upload_schemas(task_type, *, timeout=60)
Fetch available upload schemas for a specified task type.
completions
a helper for autocompletion text-to-text generation tasks
draft_from_path(path, *, task_type=Undefined, upload_format=Undefined, name=Undefined, description=Undefined, metadata=Undefined, labels=Undefined, allow_data_logging=Undefined)
Create a new dataset draft from a specified path.
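The following sketch shows one way to wrap draft_from_path so that a missing file is reported before any draft is created. The helper name make_draft is hypothetical, and the local file check is an addition for illustration, not something the SDK is documented to require. Valid values for task_type and upload_format are not listed on this page; list_upload_formats can be used to discover the formats for a task type.

```python
from pathlib import Path


def make_draft(datasets, path, *, task_type: str, upload_format: str, name: str):
    """Create a dataset draft from a local file, failing early if it is missing."""
    source = Path(path)
    if not source.is_file():
        raise FileNotFoundError(f"dataset file not found: {source}")
    # No server interaction is expected here: the draft only stores settings.
    return datasets.draft_from_path(
        source,
        task_type=task_type,
        upload_format=upload_format,
        name=name,
    )
```

The returned draft can still be edited (description, labels and so on) before it is uploaded.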
text_classifiers_binary
a helper for autocompletion binary text classification tasks
text_classifiers_multiclass
a helper for autocompletion multiclass text classification tasks
text_classifiers_multilabel
a helper for autocompletion multilabel text classification tasks
text_embeddings_pair
a helper for autocompletion pairwise text embeddings tasks
text_embeddings_triplet
a helper for autocompletion triplet text embeddings tasks
class yandex_cloud_ml_sdk._datasets.dataset.AsyncDataset
async update(*, name=Undefined, description=Undefined, labels=Undefined, timeout=60)
Updates the dataset with the provided parameters.
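A short sketch of a partial update is shown below. It passes only the fields that should change and leaves the rest at their Undefined default, which presumably means "leave unchanged" (an assumption, since the parameter table is not available here). The helper name rename_dataset is hypothetical.

```python
async def rename_dataset(dataset, new_name: str, *, labels=None):
    """Rename a dataset and optionally replace its labels."""
    changes = {"name": new_name}
    if labels is not None:
        changes["labels"] = labels
    # Fields not present in `changes` keep their Undefined default.
    return await dataset.update(**changes, timeout=30)
```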
async delete(*, timeout=60)
Deletes the dataset.
Parameters: timeout (float)
Return type: None
async list_upload_formats(*, timeout=60)
Retrieve a list of upload formats for the dataset.
Parameters: timeout (float)
async download(*, download_path, timeout=60, exist_ok=False, max_parallel_downloads=16)
Download a dataset to the specified path.
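The sketch below downloads a dataset into a directory that is created on demand. Whether download requires the target directory to exist beforehand is not stated on this page, so creating it first is a precaution rather than a documented requirement. The helper name download_dataset and its `parallel` parameter are hypothetical; the result of download is returned unchanged because its type is not shown here.

```python
from pathlib import Path


async def download_dataset(dataset, target_dir, *, parallel: int = 4):
    """Download a dataset into `target_dir`, creating the directory if needed."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # A lower max_parallel_downloads trades speed for fewer open connections.
    return await dataset.download(
        download_path=target,
        max_parallel_downloads=parallel,
    )
```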
read(*, timeout=60, batch_size=Undefined)
Reads the dataset from the backend and yields its records one by one.
This method lazily loads records in chunks, minimizing memory usage for large datasets. The iterator yields dictionaries where keys are field names and values are parsed data.
Note
This method creates temporary files in the system’s default temporary directory during operation. To control the location of temporary files, refer to Python’s tempfile.gettempdir().
Yields: a dictionary representing a single record with field-value pairs
Return type: AsyncIterator
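Because read returns an AsyncIterator, records are consumed with `async for`. The sketch below peeks at the first few records without loading the whole dataset; the helper name first_records is hypothetical.

```python
async def first_records(dataset, limit: int = 5) -> list:
    """Return up to `limit` records from the start of a dataset."""
    records = []
    if limit <= 0:
        return records
    # read() is called without `await`: it returns an async iterator directly.
    async for record in dataset.read():
        records.append(record)  # each record is a dict of field name -> value
        if len(records) >= limit:
            break  # stop early; remaining chunks are never consumed
    return records
```

Stopping the loop early fits the lazy, chunked loading described above, since later chunks are never requested by the loop.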
folder_id: str
the ID of the folder which contains the dataset
the name of the dataset
a description of the dataset
metadata associated with the dataset
created_by: str
the user who created the dataset
created_at: datetime
the timestamp when the dataset was created
updated_at: datetime
the timestamp when the dataset was last updated
a dictionary of labels associated with the dataset
allow_data_logging: bool
indicates if data logging is allowed for this dataset
status: DatasetStatus
the current status of the dataset
task_type: str
the type of task associated with the dataset
rows: int
the number of rows in the dataset
size_bytes: int
the size of the dataset in bytes
validation_errors: tuple
a tuple of validation errors associated with the dataset
id: str
class yandex_cloud_ml_sdk._datasets.draft.AsyncDatasetDraft
This class allows users to create a draft representation of a dataset without immediately interacting with the server. This draft serves as a structure for storing configuration settings, enabling users to edit the dataset’s properties before finalizing the upload.
async upload_deferred(*, timeout=60, upload_timeout=360, raise_on_validation_failure=True, chunk_size=104857600, parallelism=None)
Creates a dataset object on the server, uploads data to S3, triggers validation of the created dataset, and waits for its completion.
async upload(*, timeout=60, upload_timeout=360, raise_on_validation_failure=True, poll_timeout=21600, poll_interval=60, chunk_size=104857600, parallelism=None)
This method also performs the upload, but unlike upload_deferred, which returns an operation object, it waits for the operation to complete and returns its result directly.
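The difference between the two upload methods can be sketched as follows. The helper names are hypothetical. The way the operation object is awaited is an assumption: this page only says that upload_deferred returns an operation object, and the sketch assumes that object exposes an asynchronous wait() method.

```python
async def upload_and_wait(draft):
    """Upload a draft and block until the operation has finished."""
    # upload() polls internally and returns the result of the operation.
    return await draft.upload(poll_interval=30)


async def upload_in_background(draft):
    """Start an upload, leave room for other work, then collect the result."""
    operation = await draft.upload_deferred()
    # ... other work could run here while the operation is in progress ...
    # ASSUMPTION: the operation object provides an async wait() method.
    return await operation.wait()
```

The first form is simpler when nothing else needs to happen in the meantime; the second keeps the operation object available for the caller.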
allow_data_logging: bool
a flag indicating whether the dataset may be used to improve model quality. Defaults to false.
configure(**kwargs)
Parameters: kwargs (Any)
description: str
a description of the dataset
labels: dict
labels for categorizing the dataset
metadata associated with the dataset
the name of the dataset
the file path to the dataset
the type of task associated with the dataset
upload_format: str
the format in which the dataset will be uploaded
validate()
Return type: None
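A draft is typically adjusted and checked before it is uploaded. The sketch below chains configure and validate. It assumes that configure accepts the draft attributes listed above as keyword arguments and that validate signals a problem by raising an exception; neither behaviour is spelled out on this page. The helper name prepare_draft is hypothetical.

```python
def prepare_draft(draft, **settings):
    """Apply settings to a draft and validate it before upload."""
    # ASSUMPTION: keyword names match draft attributes such as `description`.
    draft.configure(**settings)
    # validate() returns None; a failure is assumed to surface as an exception.
    draft.validate()
    return draft
```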