Reading data from Object Storage using Query connections
When working with Yandex Object Storage, it is convenient to use connections for prototyping and initial setup of connections to data.
Sample query for reading data:
SELECT
*
FROM
object_storage.`*.tsv`
WITH
(
format=tsv_with_names,
SCHEMA
(
`timestamp` Uint32,
action String
)
);
Setting up a connection
To create a connection to Object Storage:
-
In the management console
, select the folder where you want to create a connection. -
In the list of services, select Yandex Query.
-
In the left-hand panel, go to the Connections tab.
-
Click
Create new. -
Specify the connection parameters:
-
Under General parameters:
- Name: Name of the connection to Object Storage.
- Type:
Object Storage
.
-
Under Connection type parameters:
-
Bucket auth: Select
Public
orPrivate
depending on the type of the bucket object read permissions.For a public bucket, enter a name in the Bucket field.
For a private bucket, select:-
Cloud and Folder where the data source is located.
-
Select a bucket or create a new one.
-
Select or create a service account with the
storage.viewer
role to be used to access the data.To use a service account, the
iam.serviceAccounts.user
role is required.
-
-
-
-
Click Create.
Data model
Object Storage stores data as binary files. To read data, use the following SQL statement:
SELECT
<expression>
FROM
<connection>.<path>
WITH(format=<data_format>, <compression_format> AS compression)
WHERE
<filter>;
Where:
<connection>
: Name of the storage connection.<path>
: Path to a file or files in the bucket.*
wildcards are supported.<data_format>
: File data format.<compression_format>
: File compression format.
If the data is stored in compressed format, make sure to unpack it for handling. After unpacking, parse the data based on the format used to store it within the files.
Data path formats
Yandex Query supports the following paths to data:
Path format | Description | Example |
---|---|---|
Path that ends with / |
Folder path | Path /a locates the entire contents of a folder:/a/b/c/d/1.txt /a/b/2.csv |
Path that contains the * macro substitution character |
Any files nested in the path | Path /a/*.csv locates files in folders:/a/b/c/1.csv /a/2.csv /a/b/c/d/e/f/g/2.csv |
Path that neither ends with / nor contains macro substitution characters |
Path to an individual file | Path /a/b.csv locates a specific file: /a/b.csv |
Example of reading data using connections
Sample query for reading data from Object Storage:
SELECT
*
FROM
connection.`folder/filename.csv`
WITH(
format='csv_with_names',
SCHEMA
(
Year int,
Manufacturer String,
Model String,
Price Double
)
);
Where:
connection
: Name of the connection to Object Storage.folder/filename.csv
: Path to the file in the Object Storage bucket.SCHEMA
: Data schema description in the file.