S3 Select query language syntax
Note
To be able to run S3 Select queries, contact support.
The only standard SQL operator used by S3 Select is SELECT
. It supports the following standard ANSI clauses:
- SELECT
- FROM
- WHERE
- LIMIT
Nested queries and connections are not supported.
SELECT
This operator determines the object data returned by a query, such as names of columns, functions, and expressions.
Syntax:
SELECT *
SELECT projection [ AS column_alias | column_alias ] [, ...]
The first form of the clause returns each line that satisfies the condition in the WHERE
clause as is. The second one returns a row with a custom projection of the output scalar expressions for each column.
FROM
FROM
clauses provide data source for SELECT
. As an argument, they take the name of an Object Storage object.
Syntax:
FROM S3Object
FROM S3Object alias
FROM S3Object AS alias
As in standard SQL, a FROM
clause creates rows filtered using a WHERE
clause and returned as a data list specified in SELECT
.
WHERE
This clause filters rows based on a condition, which you specify as a logical expression. The returned result only contains rows for which the expression equals to TRUE
.
Syntax:
WHERE condition
LIMIT
This clause limits the number of records returned by a query.
Syntax:
LIMIT number
Attribute access
SELECT
and WHERE
clauses can refer to a data record using file attributes.
CSV file attributes:
-
Column numbers.
In a query, you can refer to a specific column using the
_N
name, whereN
is the column sequence number in a file.Column numbering starts with 1. For example, if the first column's name is
_1
, the second one's name is_2
.A column name can be specified in both
_N
andalias._N
formats. For example, the_2
andmyAlias._2
names are both valid references to a column inSELECT
andWHERE
clauses. -
Column headers.
As in conventional SQL, expressions in
SELECT
andWHERE
clauses may reference columns by column header, as inalias.column_name
orcolumn_name
.
JSON file attributes:
-
Document.
You can access a JSON file's fields by field name, e.g.,
alias.name
.
Examples
Sample data in JSON format:
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name1","status":404,"request_time":16}
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name2","status":200,"request_time":12}
{"timestamp":"2021-02-26T01:27:20Z","object_key":"name3","status":200,"request_time":6}
Query:
SELECT "timestamp", object_key, request_time FROM S3Object WHERE status >= 400
Result:
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name1","request_time":16}
Query:
SELECT * FROM S3Object WHERE request_time >= 10
Result:
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name1","status":404,"request_time":16}
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name2","status":200,"request_time":12}
Case sensitivity of header and attribute names
To indicate that CSV file column headers or JSON file attributes are case-sensitive, use double quotes. Headers or object attributes without double quotes are not case-sensitive. Therefore, a query error may occur if a field name is ambiguous.
Examples
-
A queried object has a NAME header or attribute.
If there is no indication of case sensitivity, the query successfully returns the object data:
SELECT s.name FROM S3Object s
If you enclose the header or attribute in double quotes, the query will result in a 400 error (
MissingHeaderName
):SELECT s."name" FROM S3Object s
-
A queried object has one header or attribute called NAME and another header or attribute called name.
If there is no indication of case sensitivity, there is ambiguity as to which header or attribute to select. The query returns a 400 error (
AmbiguousFieldName
):SELECT s.name FROM S3Object s
If you enclose the header or attribute in double quotes, the request will successfully return the object data:
SELECT s."NAME" FROM S3Object s
Reserved keywords
S3 Select has a set of reserved keywords. These are required to run SQL expressions when querying object contents. Reserved keywords include function names, data type names, operators, etc.
In some cases, user terms may duplicate a reserved keyword. To avoid conflicts, use double quotes to indicate that you use a certain term intentionally. Otherwise, a 400 syntax error will occur.
Examples
A queried object has a header or attribute called CAST, which is a reserved keyword.
If you enclose a user-defined header or attribute in double quotes, the request will successfully return the object data:
SELECT s."CAST" FROM S3Object s
If you do not enclose a user-defined header or attribute in double quotes, there will be a conflict with the reserved keyword. In which case the query will return a 400 syntax error:
SELECT s.CAST FROM S3Object s
Scalar expressions
WHERE
and SELECT
clauses may contain SQL scalar expressions returning scalar values. These may appear as follows:
-
literal
. SQL literal. A literal is an explicit numeric, character, string, or boolean value (constant) not represented by an ID. -
column_reference
. A reference to a column incolumn_name
oralias.column_name
format used to access a column using column header.Example:
SELECT city.name FROM S3Object city
-
unary_op expression
. In this expression,unary_op
is a unary SQL operator. Unary operators perform operations on a single operand. They include, for instance, the unary minus, which changes the sign of a number.Example:
SELECT -5 FROM S3Object
-
expression binary_op expression
. In this expression,binary_op
is a binary SQL operator. Binary operators perform an operation on two operands. For instance, binary operators include arithmetic, logical, and comparison operators.Examples:
SELECT x FROM S3Object WHERE x=3
SELECT result FROM S3Object WHERE result>=1 AND result<=5
-
func_name
. In this expression,func_name
is the name of a callable scalar function.Example:
SELECT CAST(status AS INT) FROM S3Object
-
expression [ NOT ] BETWEEN expression AND expression
. Checks if a value belongs to a range.Example:
SELECT x FROM S3Object WHERE x BETWEEN -1 AND 1
Aggregate functions
In SELECT
clauses, you can use aggregate functions that are calculated using values of multiple or all rows and return a single resulting value.
The following functions are supported:
Function | Description | Input type | Output type |
---|---|---|---|
COUNT |
Number of rows | Any | INT |
MIN |
Minimum value within a certain set of values | INT or DECIMAL |
Same as input |
MAX |
Maximum value within a certain set of values | INT or DECIMAL |
Same as input |
SUM |
Sum of values | INT , FLOAT , or DECIMAL |
Same as input |
AVG |
Average value | INT , FLOAT , or DECIMAL |
DECIMAL if the input type is INT .Otherwise, same as input. |
Examples:
Sample data:
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name1","status":404,"request_time":16}
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name2","status":200,"request_time":12}
{"timestamp":"2021-02-26T01:27:20Z","object_key":"name3","status":200,"request_time":6}
Query using all aggregate functions:
SELECT
COUNT(*) AS "count",
MIN(request_time) AS "min",
MAX(request_time) AS "max",
SUM(request_time) AS "sum",
AVG(request_time) AS "avg"
FROM S3Object
WHERE status = 200
Result:
{"count": 2, "min": 6, "max": 12, "sum": 18, "avg": 9.0}
Sample data:
timestamp,object_key,status,request_time
2021-02-26T01:27:19Z,name1,404,16
2021-02-26T01:27:19Z,name2,200,12
2021-02-26T01:27:20Z,name3,200,6
Query using all aggregate functions:
SELECT
COUNT(*) AS "count",
MIN(CAST(request_time AS FLOAT)) AS "min",
MAX(CAST(request_time AS FLOAT)) AS "max",
SUM(CAST(request_time AS FLOAT)) AS "sum",
AVG(CAST(request_time AS FLOAT)) AS "avg"
FROM S3Object
WHERE status = '200'
Since all values in the input CSV files are treated as strings, you need to convert them to the appropriate types using the CAST
function.
Result:
count,min,max,sum,avg
2,6,12,18,9.0