Classifiers based on YandexGPT
The YandexGPT-based classifiers feature is at the Preview stage.
Yandex Foundation Models can classify text requests provided in prompts. Classification in YandexGPT-based models is implemented in the Foundation Models Text Classification API.
There are three types of classification available in Foundation Models:
- Binary classification puts a request into one of two possible classes. For example, spam or non-spam.
- Multi-class classification puts a request into one (and only one) of more than two classes. For example, a computer CPU can belong to one generation only.
- Multi-label classification allows putting a request into a number of different non-mutually exclusive classes at the same time. For example, several hashtags can belong to the same post on social media at the same time.
Classification models are only available in synchronous mode.
Foundation Models provides YandexGPT classifiers of these two types: prompt-based and trainable.
Prompt-based classifiers
Foundation Models prompt-based classifiers support binary and multi-class classification, require no model tuning, and are prompt-controlled. The fewShotClassify Text Classification API method provides two prompt-based classifiers: Zero-shot and Few-shot. You can submit from 2 to 20 classes to the fewShotClassify method.
Tip
Give meaningful names to label classes: this is essential for correct classification results. For example, use chemistry and physics rather than chm and phs for class names.
Zero-shot classifier
The Zero-shot classifier lets you perform binary and multi-class classification by providing only the model ID, task description, request text, and an array of class names in the request body.
Request body format for the Zero-shot classifier:
{
  "modelUri": "string",
  "taskDescription": "string",
  "labels": [
    "string",
    "string",
    ...
    "string"
  ],
  "text": "string"
}
Where:
- modelUri: ID of the model that will be used to classify the message. The parameter contains the Yandex Cloud folder ID.
- taskDescription: Text description of the task for the classifier.
- labels: Array of classes. Give meaningful names to label classes: this is essential for correct classification results. For example, use chemistry and physics rather than chm and phs for class names.
- text: Message text.
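The Zero-shot request body described above can be assembled as in the following minimal sketch. The helper name build_zero_shot_body, the <model_URI> placeholder, and the sample task are illustrative assumptions and are not part of the API; only the four field names and the 2 to 20 class limit come from this page. The sketch builds and prints the body and does not send it anywhere.

```python
import json


def build_zero_shot_body(model_uri, task_description, labels, text):
    """Assemble a Zero-shot request body from the four documented fields.

    Hypothetical helper for illustration only.
    """
    # fewShotClassify accepts from 2 to 20 classes, so check this locally.
    if not 2 <= len(labels) <= 20:
        raise ValueError("labels must contain from 2 to 20 class names")
    return {
        "modelUri": model_uri,
        "taskDescription": task_description,
        "labels": list(labels),
        "text": text,
    }


zero_shot_body = build_zero_shot_body(
    model_uri="<model_URI>",  # placeholder: substitute your own model ID
    task_description="Determine the school subject the question belongs to",
    labels=["chemistry", "physics"],  # meaningful names rather than chm, phs
    text="What is the boiling point of water at sea level?",
)

# Serialize to the JSON text that would go into the request body.
print(json.dumps(zero_shot_body, indent=2))
```

Note that no samples field is present: that field is what distinguishes a Few-shot request from a Zero-shot one.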
Few-shot classifier
The Few-shot classifier lets you perform binary and multi-class classification by giving the model an array of sample requests for the classes specified in the labels field. Sample requests are passed in the samples field of the request body and help improve the classifier output quality.
Request body format for the Few-shot classifier:
{
  "modelUri": "string",
  "taskDescription": "string",
  "labels": [
    "string",
    "string",
    ...
    "string"
  ],
  "text": "string",
  "samples": [
    {
      "text": "string",
      "label": "string"
    },
    {
      "text": "string",
      "label": "string"
    },
    ...
    {
      "text": "string",
      "label": "string"
    }
  ]
}
Where:
- modelUri: ID of the model that will be used to classify the message. The parameter contains the Yandex Cloud folder ID.
- taskDescription: Text description of the task for the classifier.
- labels: Array of classes. Give meaningful names to label classes: this is essential for correct classification results. For example, use chemistry and physics rather than chm and phs for class names.
- text: Message text.
- samples: Array of sample requests for the classes specified in the labels field. Sample requests are provided as objects, each one containing one text request sample and the class to which such a request should belong.
Warning
You can deliver multiple classification examples in a single request. All examples in the request must not exceed 6,000 tokens.
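As a minimal sketch of the Few-shot request body, the snippet below adds a samples array to the Zero-shot fields. The helper name build_few_shot_body, the <model_URI> placeholder, and the sample texts are illustrative assumptions. The check that every sample label appears in labels is a client-side convenience assumed here, not a documented API behavior, and the 6,000-token limit is not checked because that would require the model's tokenizer.

```python
import json


def build_few_shot_body(model_uri, task_description, labels, text, samples):
    """Assemble a Few-shot request body; samples is a list of (text, label).

    Hypothetical helper for illustration only.
    """
    if not 2 <= len(labels) <= 20:
        raise ValueError("labels must contain from 2 to 20 class names")
    body_samples = []
    for sample_text, sample_label in samples:
        # Assumed local check: each sample should use one of the listed classes.
        if sample_label not in labels:
            raise ValueError(f"sample label {sample_label!r} is not in labels")
        body_samples.append({"text": sample_text, "label": sample_label})
    return {
        "modelUri": model_uri,
        "taskDescription": task_description,
        "labels": list(labels),
        "text": text,
        "samples": body_samples,
    }


few_shot_body = build_few_shot_body(
    model_uri="<model_URI>",  # placeholder: substitute your own model ID
    task_description="Determine the school subject the question belongs to",
    labels=["chemistry", "physics"],
    text="Why does ice float on water?",
    samples=[
        ("What is the valence of carbon?", "chemistry"),
        ("How is acceleration related to force?", "physics"),
    ],
)

print(json.dumps(few_shot_body, indent=2))
```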
Trainable classifiers
If you are not satisfied with the output quality of the Zero-shot and Few-shot classifiers, tune your own classifier based on YandexGPT in Yandex DataSphere. Trainable classifiers can be trained for all supported classification types.
To run a request to the classifier of a model fine-tuned in DataSphere, use the classify Text Classification API method. If you do so, you only need to provide the model ID and the request text to the model. The names of the classes between which the model will be distributing requests must be specified during model tuning and are not provided in the request.
Request body format for the classifier of a model fine-tuned in DataSphere:
{
  "modelUri": "string",
  "text": "string"
}
Where:
- modelUri: ID of the model that will be used to classify the message. The parameter contains the Yandex Cloud folder ID and the ID of the model tuned in DataSphere.
- text: Message text. The total number of tokens per request must not exceed 8,000.
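For comparison with the prompt-based classifiers, a request body for the classify method can be sketched as follows. The helper name build_classify_body and the <tuned_model_URI> placeholder are illustrative assumptions, and the empty-text check is a local precaution assumed here rather than a documented rule. No class names are passed, since they are fixed during tuning.

```python
import json


def build_classify_body(model_uri, text):
    """Assemble a request body for a classifier tuned in DataSphere.

    Hypothetical helper for illustration only.
    """
    # Assumed local precaution: do not build a request with empty text.
    if not text:
        raise ValueError("text must not be empty")
    # Only two fields are needed: class names are set during model tuning.
    return {"modelUri": model_uri, "text": text}


classify_body = build_classify_body(
    model_uri="<tuned_model_URI>",  # placeholder: folder ID plus tuned model ID
    text="Why does ice float on water?",
)

print(json.dumps(classify_body, indent=2))
```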
Response format
All Foundation Models classifier types return the result in the following format:
{
  "predictions": [
    {
      "label": "string",
      "confidence": "number"
    },
    {
      "label": "string",
      "confidence": "number"
    },
    ...
    {
      "label": "string",
      "confidence": "number"
    }
  ],
  "modelVersion": "string"
}
Where:
- label: Class name.
- confidence: Probability of the request text belonging to this class. In multi-class classification, the sum of the probability (confidence) field values for all classes is always equal to 1. In multi-label classification, the probability (confidence) field value for each class is calculated independently, so the sum of values is not necessarily equal to 1.
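The difference between the two confidence semantics affects how a response is read. The sketch below is one possible way to interpret the predictions array, not a prescribed method. The function names, the 0.5 threshold, and both sample responses are illustrative assumptions with made-up values. For multi-class results it picks the single most probable class; for multi-label results it keeps every class at or above the threshold.

```python
def top_label(response):
    """Multi-class reading: return the class with the highest confidence."""
    best = max(response["predictions"], key=lambda p: float(p["confidence"]))
    return best["label"]


def labels_above(response, threshold=0.5):
    """Multi-label reading: return every class at or above the threshold.

    The default threshold of 0.5 is an assumption; choose one for your task.
    """
    return [
        p["label"]
        for p in response["predictions"]
        if float(p["confidence"]) >= threshold
    ]


# Made-up multi-class response: confidences sum to 1.
multi_class_response = {
    "predictions": [
        {"label": "chemistry", "confidence": 0.75},
        {"label": "physics", "confidence": 0.25},
    ],
    "modelVersion": "<version>",
}

# Made-up multi-label response: each confidence is independent.
multi_label_response = {
    "predictions": [
        {"label": "travel", "confidence": 0.9},
        {"label": "food", "confidence": 0.6},
        {"label": "sports", "confidence": 0.1},
    ],
    "modelVersion": "<version>",
}

print(top_label(multi_class_response))
print(labels_above(multi_label_response))
```

Converting each value with float() keeps the sketch tolerant of confidence values that arrive as numeric strings.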