Reading data from Object Storage using Query connections
When working with Yandex Object Storage, it is convenient to use connections for prototyping and initial setup of connections to data.
Sample query for reading data:
SELECT
*
FROM
object_storage.`*.tsv`
WITH
(
format=tsv_with_names,
SCHEMA
(
`timestamp` Uint32,
action String
)
);
Setting up a connection
To create a connection to Object Storage:
-
In the management console
, select the folder where you want to create a connection. -
In the list of services, select Yandex Query.
-
In the left-hand panel, go to the Connections tab.
-
Click
Create new. -
Specify the connection parameters:
-
Under General parameters:
- Name: Name of the connection to Object Storage.
- Type:
Object Storage
.
-
Under Connection type parameters:
-
Bucket auth: Select
Public
orPrivate
depending on the type of the bucket object read permissions.For a public bucket, enter a name in the Bucket field.
For a private bucket, select:-
Cloud and Folder where the data source is located.
-
Select a bucket or create a new one.
-
Select or create a service account with the
storage.viewer
role to be used to access the data.To use a service account, the
iam.serviceAccounts.user
role is required.
-
-
-
-
Click Create.
Data model
Object Storage stores data as binary files. To read data, use the following SQL statement:
SELECT
<expression>
FROM
<connection>.<path>
WITH(
FORMAT = "<data_format>",
COMPRESSION = "<compression_format>",
SCHEMA = (<schema_description>))
WHERE
<filter>;
Where:
<connection>
: Name of the storage connection.<path>
: Path to a file or files in the bucket. Wildcard characters (*
) are supported.<data_format>
: Data format in the files.<compression_format>
: File compression format.<schema_description>
: Data schema description in the files.
Data schema description
Data schema description includes the following fields:
- Field name
- Field type
- Attribute indicating a required field
For example, the below data schema describes a required schema field named Year
of the Int32
type:
Year Int32 NOT NULL
If a field is marked as required (NOT NULL
) but it is missing from the file being processed, this operation will fail with an error. If a field is marked as optional (NULL
), no error will occur if that field is missing from the file being processed but the field will take the NULL
value. The NULL
keyword in optional fields is optional.
Automatic output of a data schema
Automatic output of a schema is available for all data formats except raw
and json_as_string
. This is convenient when a schema contains a large number of fields. To avoid entering these fields manually, use the WITH_INFER
parameter:
SELECT
<expression>
FROM
<connection>.<path>
WITH(
FORMAT = "<data_format>",
COMPRESSION = "<compression_format>",
WITH_INFER="true")
WHERE
<filter>;
Where:
<connection>
: Name of the storage connection.<path>
: Path to a file or files in the bucket. Wildcard characters (*
) are supported.<data_format>
: Data format in the files.<compression_format>
: File compression format.
This request will automatically output field names and types.
Data path formats
Yandex Query supports the following paths to data:
Path format | Description | Example |
---|---|---|
Path that ends with / |
Folder path | Path /a locates the entire contents of a folder:/a/b/c/d/1.txt /a/b/2.csv |
Path that contains the * macro substitution character |
Any files nested in the path | Path /a/*.csv locates files in folders:/a/b/c/1.csv /a/2.csv /a/b/c/d/e/f/g/2.csv |
Path that neither ends with / nor contains macro substitution characters |
Path to an individual file | Path /a/b.csv locates a specific file: /a/b.csv |
Example of reading data using connections
Sample query for reading data from Object Storage:
SELECT
*
FROM
connection.`folder/filename.csv`
WITH(
format='csv_with_names',
SCHEMA
(
Year int,
Manufacturer String,
Model String,
Price Double
)
);
Where:
connection
: Name of the connection to Object Storage.folder/filename.csv
: Path to the file in the Object Storage bucket.SCHEMA
: Data schema description in the file.