Search Index, gRPC: SearchIndexService.Create
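Creates a new search index asynchronously. The call returns an operation.Operation; once the operation is done, its response field holds the created SearchIndex.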
gRPC request
rpc Create (CreateSearchIndexRequest) returns (operation.Operation)
CreateSearchIndexRequest
{
"folder_id": "string",
"file_ids": [
"string"
],
"name": "string",
"description": "string",
"expiration_config": {
"expiration_policy": "ExpirationPolicy",
"ttl_days": "int64"
},
"labels": "map<string, string>",
// Includes only one of the fields `text_search_index`, `vector_search_index`, `hybrid_search_index`
"text_search_index": {
"chunking_strategy": {
// Includes only one of the fields `static_strategy`
"static_strategy": {
"max_chunk_size_tokens": "int64",
"chunk_overlap_tokens": "int64"
}
// end of the list of possible fields
}
},
"vector_search_index": {
"doc_embedder_uri": "string",
"query_embedder_uri": "string",
"chunking_strategy": {
// Includes only one of the fields `static_strategy`
"static_strategy": {
"max_chunk_size_tokens": "int64",
"chunk_overlap_tokens": "int64"
}
// end of the list of possible fields
}
},
"hybrid_search_index": {
"text_search_index": {
"chunking_strategy": {
// Includes only one of the fields `static_strategy`
"static_strategy": {
"max_chunk_size_tokens": "int64",
"chunk_overlap_tokens": "int64"
}
// end of the list of possible fields
}
},
"vector_search_index": {
"doc_embedder_uri": "string",
"query_embedder_uri": "string",
"chunking_strategy": {
// Includes only one of the fields `static_strategy`
"static_strategy": {
"max_chunk_size_tokens": "int64",
"chunk_overlap_tokens": "int64"
}
// end of the list of possible fields
}
},
"chunking_strategy": {
// Includes only one of the fields `static_strategy`
"static_strategy": {
"max_chunk_size_tokens": "int64",
"chunk_overlap_tokens": "int64"
}
// end of the list of possible fields
},
"normalization_strategy": "NormalizationStrategy",
"combination_strategy": {
// Includes only one of the fields `mean_combination`, `rrf_combination`
"mean_combination": {
"mean_evaluation_technique": "MeanEvaluationTechnique",
"weights": [
"double"
]
},
"rrf_combination": {
"k": "google.protobuf.Int64Value"
}
// end of the list of possible fields
}
}
// end of the list of possible fields
}
Request to create a new search index.
Field | Type | Description
folder_id | string | Required field. ID of the folder that the search index will belong to.
file_ids[] | string | List of IDs of the files to be indexed.
name | string | Name of the search index.
description | string | Description of the search index.
expiration_config | ExpirationConfig | Expiration configuration for the search index.
labels | object (map<string, string>) | Set of key-value pairs to label the search index.
text_search_index | TextSearchIndex | Configuration for a traditional keyword-based text search index. Includes only one of the fields `text_search_index`, `vector_search_index`, `hybrid_search_index`.
vector_search_index | VectorSearchIndex | Configuration for a vector-based search index using embeddings. Includes only one of the fields `text_search_index`, `vector_search_index`, `hybrid_search_index`.
hybrid_search_index | HybridSearchIndex | Configuration for a hybrid (vector-based + keyword-based) search index. Includes only one of the fields `text_search_index`, `vector_search_index`, `hybrid_search_index`.
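Because the three index-type fields form a oneof, a request must set exactly one of them. The sketch below, a plain Python dict mirroring the request schema above (not an official client or SDK call), shows a keyword-only request and a small check for the oneof rule. The folder and file IDs are placeholders, and the chunk sizes (512 and 64) are illustrative values, not documented defaults.

```python
# The three mutually exclusive index-type fields of CreateSearchIndexRequest.
INDEX_TYPE_FIELDS = ("text_search_index", "vector_search_index", "hybrid_search_index")


def index_type_of(request: dict) -> str:
    """Return the single index-type field set in `request`.

    Raises ValueError when none, or more than one, of the fields is present.
    """
    present = [field for field in INDEX_TYPE_FIELDS if field in request]
    if len(present) != 1:
        raise ValueError(f"exactly one of {INDEX_TYPE_FIELDS} must be set, got {present}")
    return present[0]


# Placeholder IDs and illustrative chunk sizes; substitute your own values.
request = {
    "folder_id": "<folder_id>",
    "file_ids": ["<file_id_1>", "<file_id_2>"],
    "name": "product-docs",
    "description": "Keyword search over product documentation",
    "labels": {"team": "support"},
    "text_search_index": {
        "chunking_strategy": {
            "static_strategy": {
                "max_chunk_size_tokens": 512,
                "chunk_overlap_tokens": 64,
            }
        }
    },
}

print(index_type_of(request))  # text_search_index
```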
ExpirationConfig
Field | Type | Description
expiration_policy | enum ExpirationPolicy | Expiration policy of the search index.
ttl_days | int64 | Time to live, in days.
TextSearchIndex
Defines the configuration for a traditional keyword-based text search index.
Field | Type | Description
chunking_strategy | ChunkingStrategy | Chunking strategy used to split text into smaller chunks before indexing.
ChunkingStrategy
Defines a general strategy for chunking text into smaller segments.
Currently, only StaticChunkingStrategy is supported.
Field | Type | Description
static_strategy | StaticChunkingStrategy | Includes only one of the fields `static_strategy`.
StaticChunkingStrategy
Defines a chunking strategy where chunks are created with a fixed maximum chunk size and an overlap between consecutive chunks.
Field | Type | Description
max_chunk_size_tokens | int64 | Maximum number of tokens allowed in a single chunk.
chunk_overlap_tokens | int64 | Number of tokens that overlap between consecutive chunks.
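A static strategy amounts to a sliding window: each chunk holds at most max_chunk_size_tokens tokens, and each new chunk starts max_chunk_size_tokens - chunk_overlap_tokens tokens after the previous one. The following is a minimal sketch of that idea, not the service's implementation. It assumes whitespace splitting in place of the embedding model's real tokenizer, and it assumes the overlap must be smaller than the chunk size (a constraint of this sketch, not a documented rule).

```python
def static_chunks(tokens: list, max_chunk_size_tokens: int, chunk_overlap_tokens: int) -> list:
    """Split `tokens` into windows of at most `max_chunk_size_tokens` tokens,
    where consecutive windows share `chunk_overlap_tokens` tokens."""
    if max_chunk_size_tokens <= 0:
        raise ValueError("max_chunk_size_tokens must be positive")
    if not 0 <= chunk_overlap_tokens < max_chunk_size_tokens:
        raise ValueError("chunk_overlap_tokens must be >= 0 and < max_chunk_size_tokens")
    step = max_chunk_size_tokens - chunk_overlap_tokens  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_chunk_size_tokens])
        if start + max_chunk_size_tokens >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Assumption: whitespace splitting stands in for the model's real tokenizer.
tokens = "one two three four five six seven eight nine ten".split()
for chunk in static_chunks(tokens, max_chunk_size_tokens=4, chunk_overlap_tokens=1):
    print(" ".join(chunk))
```

With a chunk size of 4 and an overlap of 1, the ten tokens yield three chunks, and the last token of each chunk reappears as the first token of the next.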
VectorSearchIndex
Defines the configuration for a vector-based search index. This type uses embeddings to represent documents and queries.
Field | Type | Description
doc_embedder_uri | string | ID of the model used to obtain document text embeddings.
query_embedder_uri | string | ID of the model used to obtain query text embeddings.
chunking_strategy | ChunkingStrategy | Chunking strategy used to split text into smaller chunks before indexing.
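To illustrate why a vector index has separate document and query embedders: chunks are embedded at indexing time, the query is embedded at search time, and chunks are ranked by how close their vectors are to the query vector. The sketch below assumes cosine similarity as the closeness measure and uses toy 2-D vectors in place of real embedder output. This reference does not state which similarity metric the service actually uses.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_embedding: list, chunk_embeddings: dict) -> list:
    """Return (chunk_id, score) pairs, most similar chunk first."""
    scored = [(chunk_id, cosine_similarity(query_embedding, emb))
              for chunk_id, emb in chunk_embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-D vectors standing in for real embedder output.
query = [1.0, 0.0]  # would come from the query embedder
chunks = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [1.0, 1.0]}  # from the document embedder
for chunk_id, score in rank_chunks(query, chunks):
    print(chunk_id, round(score, 3))
```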
HybridSearchIndex
Defines the configuration for a hybrid (vector-based + keyword-based) search index. This type uses both embeddings and keyword-based search to represent documents and queries.
Field | Type | Description
text_search_index | TextSearchIndex | Configuration for a traditional keyword-based text search index.
vector_search_index | VectorSearchIndex | Configuration for a vector-based search index.
chunking_strategy | ChunkingStrategy | Common chunking strategy that applies to both the text and vector search indexes.
normalization_strategy | enum NormalizationStrategy | Normalization strategy for relevance scores from the different indexes. Default is MIN_MAX_STRATEGY.
combination_strategy | CombinationStrategy | Combination strategy for merging rankings from the different indexes. Default is the arithmetic mean.
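Normalization is needed because keyword scores and vector similarities live on different scales. Assuming MIN_MAX_STRATEGY refers to standard min-max scaling, which maps each index's scores to [0, 1] via (s - min) / (max - min), a minimal sketch looks like this. The handling of the all-equal case is this sketch's own choice, and the score values are hypothetical.

```python
def min_max_normalize(scores: dict) -> dict:
    """Rescale raw relevance scores to [0, 1]: (s - min) / (max - min)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: if every score is equal, treat every document as top-scoring.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


# Hypothetical raw scores: keyword scores are unbounded, vector scores are similarities.
keyword_scores = {"a": 12.0, "b": 7.0, "c": 2.0}
vector_scores = {"a": 0.25, "b": 0.75, "c": 0.5}
print(min_max_normalize(keyword_scores))  # {'a': 1.0, 'b': 0.5, 'c': 0.0}
print(min_max_normalize(vector_scores))   # {'a': 0.0, 'b': 1.0, 'c': 0.5}
```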
CombinationStrategy
Combination strategy for merging rankings from the different indexes.
Field | Type | Description
mean_combination | MeanCombinationStrategy | Includes only one of the fields `mean_combination`, `rrf_combination`.
rrf_combination | ReciprocalRankFusionCombinationStrategy | Includes only one of the fields `mean_combination`, `rrf_combination`.
MeanCombinationStrategy
Combines relevance scores from the different indexes by computing their mean.
Field | Type | Description
mean_evaluation_technique | enum MeanEvaluationTechnique | Technique for averaging relevance scores from the different indexes. Default is ARITHMETIC.
weights[] | double | Weights used to compute the weighted mean of relevance scores. The values must sum to 1.0.
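With the default ARITHMETIC technique, each document's combined score is the weighted sum of its normalized per-index scores, with weights summing to 1.0. The following minimal sketch implements only the arithmetic case. It assumes that weights[i] applies to the i-th index in a fixed order (text first, then vector, in the example) and that a document absent from one index scores 0.0 there. Neither point is specified in this reference.

```python
import math


def weighted_arithmetic_mean(scores_per_index: list, weights: list) -> dict:
    """Combine per-index normalized scores into one weighted arithmetic mean per document."""
    if len(scores_per_index) != len(weights):
        raise ValueError("expected exactly one weight per index")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1.0")
    all_docs = set().union(*scores_per_index)  # every doc ID returned by any index
    return {
        # Assumption: a document missing from an index contributes a score of 0.0 there.
        doc: sum(w * scores.get(doc, 0.0) for w, scores in zip(weights, scores_per_index))
        for doc in all_docs
    }


# Normalized scores from the text index and the vector index (hypothetical values).
text_scores = {"a": 1.0, "b": 0.5, "c": 0.0}
vector_scores = {"a": 0.0, "b": 1.0, "c": 0.5}
combined = weighted_arithmetic_mean([text_scores, vector_scores], weights=[0.3, 0.7])
for doc, score in sorted(combined.items(), key=lambda item: item[1], reverse=True):
    print(doc, round(score, 2))
```

Giving the vector index the larger weight (0.7) lets document b, which is mid-ranked by keywords but top-ranked by vectors, overtake document a.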
ReciprocalRankFusionCombinationStrategy
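Combines rankings from the different indexes using Reciprocal Rank Fusion (RRF), as described in https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf.
Field | Type | Description
k | google.protobuf.Int64Value | The parameter k in the RRF score formula. Default is 60.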
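In the RRF paper cited above, a document's fused score is RRFscore(d) = Σ 1 / (k + r(d)), summed over every ranking that contains d, where r(d) is the document's 1-based rank in that ranking. RRF uses only ranks, not raw scores, so it needs no score normalization. The following is a minimal sketch of that formula with the default k = 60. The two input rankings are hypothetical, and the service may differ in details such as tie-breaking.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists of doc IDs (best first): RRFscore(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_ranking = ["a", "b", "c"]  # hypothetical text-index results
vector_ranking = ["b", "c", "a"]   # hypothetical vector-index results
for doc, score in reciprocal_rank_fusion([keyword_ranking, vector_ranking]):
    print(doc, round(score, 5))
```

Document b ranks first because it places highly in both lists (ranks 2 and 1). A larger k flattens the gap between top and lower ranks.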
operation.Operation
{
"id": "string",
"description": "string",
"created_at": "google.protobuf.Timestamp",
"created_by": "string",
"modified_at": "google.protobuf.Timestamp",
"done": "bool",
"metadata": "google.protobuf.Any",
// Includes only one of the fields `error`, `response`
"error": "google.rpc.Status",
"response": {
"id": "string",
"folder_id": "string",
"name": "string",
"description": "string",
"created_by": "string",
"created_at": "google.protobuf.Timestamp",
"updated_by": "string",
"updated_at": "google.protobuf.Timestamp",
"expiration_config": {
"expiration_policy": "ExpirationPolicy",
"ttl_days": "int64"
},
"expires_at": "google.protobuf.Timestamp",
"labels": "map<string, string>",
// Includes only one of the fields `text_search_index`, `vector_search_index`, `hybrid_search_index`
"text_search_index": {
"chunking_strategy": {
// Includes only one of the fields `static_strategy`
"static_strategy": {
"max_chunk_size_tokens": "int64",
"chunk_overlap_tokens": "int64"
}
// end of the list of possible fields
}
},
"vector_search_index": {
"doc_embedder_uri": "string",
"query_embedder_uri": "string",
"chunking_strategy": {
// Includes only one of the fields `static_strategy`
"static_strategy": {
"max_chunk_size_tokens": "int64",
"chunk_overlap_tokens": "int64"
}
// end of the list of possible fields
}
},
"hybrid_search_index": {
"text_search_index": {
"chunking_strategy": {
// Includes only one of the fields `static_strategy`
"static_strategy": {
"max_chunk_size_tokens": "int64",
"chunk_overlap_tokens": "int64"
}
// end of the list of possible fields
}
},
"vector_search_index": {
"doc_embedder_uri": "string",
"query_embedder_uri": "string",
"chunking_strategy": {
// Includes only one of the fields `static_strategy`
"static_strategy": {
"max_chunk_size_tokens": "int64",
"chunk_overlap_tokens": "int64"
}
// end of the list of possible fields
}
},
"chunking_strategy": {
// Includes only one of the fields `static_strategy`
"static_strategy": {
"max_chunk_size_tokens": "int64",
"chunk_overlap_tokens": "int64"
}
// end of the list of possible fields
},
"normalization_strategy": "NormalizationStrategy",
"combination_strategy": {
// Includes only one of the fields `mean_combination`, `rrf_combination`
"mean_combination": {
"mean_evaluation_technique": "MeanEvaluationTechnique",
"weights": [
"double"
]
},
"rrf_combination": {
"k": "google.protobuf.Int64Value"
}
// end of the list of possible fields
}
}
// end of the list of possible fields
}
// end of the list of possible fields
}
An Operation resource. For more information, see Operation.
Field | Type | Description
id | string | ID of the operation.
description | string | Description of the operation. 0-256 characters long.
created_at | google.protobuf.Timestamp | Creation timestamp.
created_by | string | ID of the user or service account who initiated the operation.
modified_at | google.protobuf.Timestamp | The time when the Operation resource was last modified.
done | bool | If the value is `false`, the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
metadata | google.protobuf.Any | Service-specific metadata associated with the operation.
error | google.rpc.Status | The error result of the operation in case of failure or cancellation. Includes only one of the fields `error`, `response` (the operation result).
response | SearchIndex | The normal response of the operation in case of success. Includes only one of the fields `error`, `response` (the operation result).
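Since Create returns an Operation rather than the index itself, a client typically polls until done is true and then reads either error or response. The sketch below shows that loop in generic form. The fetch_operation callable is hypothetical: it stands for whatever "get operation by ID" call your client exposes and is assumed to return the operation as a dict shaped like the JSON above. The interval and timeout values are illustrative.

```python
import time
from typing import Callable


def wait_for_operation(fetch_operation: Callable[[str], dict], operation_id: str,
                       poll_interval_s: float = 1.0, timeout_s: float = 600.0) -> dict:
    """Poll until the operation is done; return `response` or raise on `error`."""
    deadline = time.monotonic() + timeout_s
    while True:
        op = fetch_operation(operation_id)  # hypothetical client call
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(f"operation {operation_id} failed: {op['error']}")
            return op["response"]  # the created SearchIndex
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} not done after {timeout_s} s")
        time.sleep(poll_interval_s)


# Fake operation states standing in for real server responses.
states = iter([
    {"id": "op1", "done": False},
    {"id": "op1", "done": True, "response": {"id": "idx1", "name": "product-docs"}},
])
index = wait_for_operation(lambda _op_id: next(states), "op1", poll_interval_s=0.0)
print(index["id"])
```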
SearchIndex
Represents a search index used to store and query data using keyword-based text search, vector-based search, or a hybrid of the two.
Field | Type | Description
id | string | Unique identifier of the search index.
folder_id | string | ID of the folder that the search index belongs to.
name | string | Name of the search index.
description | string | Description of the search index.
created_by | string | Identifier of the subject who created this search index.
created_at | google.protobuf.Timestamp | Timestamp representing when the search index was created.
updated_by | string | Identifier of the subject who last updated this search index.
updated_at | google.protobuf.Timestamp | Timestamp representing the last time this search index was updated.
expiration_config | ExpirationConfig | Configuration for the expiration of the search index, defining when and how the search index will expire.
expires_at | google.protobuf.Timestamp | Timestamp representing when the search index will expire.
labels | object (map<string, string>) | Set of key-value pairs that can be used to organize and categorize the search index.
text_search_index | TextSearchIndex | Keyword-based text search index configuration. Includes only one of the fields `text_search_index`, `vector_search_index`, `hybrid_search_index` (the type of the search index: keyword-based, vector-based, or hybrid).
vector_search_index | VectorSearchIndex | Vector-based search index configuration. Includes only one of the fields `text_search_index`, `vector_search_index`, `hybrid_search_index` (the type of the search index: keyword-based, vector-based, or hybrid).
hybrid_search_index | HybridSearchIndex | Hybrid (vector-based + keyword-based) search index configuration. Includes only one of the fields `text_search_index`, `vector_search_index`, `hybrid_search_index` (the type of the search index: keyword-based, vector-based, or hybrid).