S3 Select query language syntax
Note
To be able to make S3 Select requests, contact support
The only standard SQL operator used by S3 Select is SELECT. It supports the following standard ANSI clauses:
- SELECT
- FROM
- WHERE
- LIMIT
S3 Select does not support nested queries and joins.
SELECT
This operator determines the object data returned by a query, such as names of columns, functions, and expressions.
Syntax:
SELECT *
SELECT projection [ AS column_alias | column_alias ] [, ...]
The first form of the clause returns each row that satisfies the condition in the WHERE clause as is. The second one returns a row with a user-defined projection of the output scalar expressions for each column.
FROM
FROM clauses provide data for SELECT. As an argument, they take the name of an Object Storage object.
Syntax:
FROM S3Object
FROM S3Object alias
FROM S3Object AS alias
As in standard SQL, the FROM clause creates rows that are filtered in the WHERE clause and projected in the SELECT list.
WHERE
This clause filters rows based on a condition, which you specify as a logical expression. The returned result only contains rows for which the expression equals to TRUE.
Syntax:
WHERE condition
LIMIT
This clause limits the number of records returned by a query.
Syntax:
LIMIT number
Attribute access
SELECT and WHERE clauses can refer to record data by using file attributes.
CSV file attributes:
-
Column numbers.
In a query, you can refer to a specific column using the
_Nname, whereNis the column position in a file.The position count starts at 1. For example, if the first column's name is
_1, the second one's name is_2.You can refer to a column as
_Noralias._N. For example, the_2andmyAlias._2names are both valid references to a column in theSELECTandWHEREclauses. -
Column headers.
As in conventional SQL, expressions in
SELECTandWHEREclauses may refer to columns by the column header, i.e.,alias.column_nameorcolumn_name.
JSON file attributes:
-
Document.
You can access JSON file fields by the field name, i.e.,
alias.name.
Examples
Sample data in JSON format:
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name1","status":404,"request_time":16}
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name2","status":200,"request_time":12}
{"timestamp":"2021-02-26T01:27:20Z","object_key":"name3","status":200,"request_time":6}
Query:
SELECT "timestamp", object_key, request_time FROM S3Object WHERE status >= 400
Result:
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name1","request_time":16}
Query:
SELECT * FROM S3Object WHERE request_time >= 10
Result:
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name1","status":404,"request_time":16}
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name2","status":200,"request_time":12}
Case sensitivity of header and attribute names
To indicate that CSV file column headers or JSON file attributes are case-sensitive, use double quotes. Headers or object attributes without double quotes are not case-sensitive. Therefore, a query error may occur if a field name is ambiguous.
Examples
-
A queried object has the NAME header or attribute.
If there is no indication of case sensitivity, the query successfully returns the object data:
SELECT s.name FROM S3Object sIf you enclose the header or attribute in double quotes, the query will result in a 400 error (
MissingHeaderName):SELECT s."name" FROM S3Object s -
A queried object has one header or attribute called NAME and another header or attribute called name.
If there is no indication of case sensitivity, there is ambiguity as to which header or attribute to select. The query returns a 400 error (
AmbiguousFieldName):SELECT s.name FROM S3Object sIf you enclose the header or attribute in double quotes, the query will successfully return the object data:
SELECT s."NAME" FROM S3Object s
Reserved keywords
S3 Select has a set of reserved keywords that are required to run SQL expressions when querying object contents. Reserved keywords include function names, data type names, operators, etc.
In some cases, user-defined terms may clash with a reserved keyword. To avoid conflicts, use double quotes to indicate that you use a certain term intentionally. Otherwise, a 400 syntax error will occur.
Examples
A queried object has a header or attribute called CAST, which is a reserved keyword.
If you enclose a user-defined header or attribute in double quotes, the query will successfully return the object data:
SELECT s."CAST" FROM S3Object s
If you do not enclose a user-defined header or attribute in double quotes, there will be a conflict with the reserved keyword. In which case the query will return a 400 syntax error:
SELECT s.CAST FROM S3Object s
Scalar expressions
WHERE and SELECT clauses may contain SQL scalar expressions returning scalar values. These may appear as follows:
-
literal. SQL literal, which is an explicit numeric, character, string, or Boolean value (constant) not represented by an ID. -
column_reference. Reference to a column incolumn_nameoralias.column_nameformat used to access the column using the column header.Example:
SELECT city.name FROM S3Object city -
unary_op expression. In this expression,unary_opis a unary SQL operator. Unary operators perform operations on a single operand. They include, e.g., the unary minus that changes the sign of a number.Here is an example:
SELECT -5 FROM S3Object -
expression binary_op expression. In this expression,binary_opis a binary SQL operator. Binary operators perform an operation on two operands. Binary operators include, e.g., arithmetic, logical, and comparison operators.Examples:
SELECT x FROM S3Object WHERE x=3SELECT result FROM S3Object WHERE result>=1 AND result<=5 -
func_name. In this expression,func_nameis the name of a callable scalar function.Here is an example:
SELECT CAST(status AS INT) FROM S3Object -
expression [ NOT ] BETWEEN expression AND expression. Checks if a value belongs to a range.Here is an example:
SELECT x FROM S3Object WHERE x BETWEEN -1 AND 1
Aggregate functions
In SELECT clauses, you can use aggregate functions that are calculated using values of multiple or all rows and return a single resulting value.
The following functions are supported:
| Function | Description | Input type | Output type |
|---|---|---|---|
COUNT |
Number of rows | Any | INT |
MIN |
Minimum value within a certain set of values | INT or DECIMAL |
Same as input |
MAX |
Maximum value within a certain set of values | INT or DECIMAL |
Same as input |
SUM |
Sum of values | INT, FLOAT, or DECIMAL |
Same as input |
AVG |
Average value | INT, FLOAT, or DECIMAL |
DECIMAL if the input type is INT;otherwise, same as input |
Examples:
Sample data:
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name1","status":404,"request_time":16}
{"timestamp":"2021-02-26T01:27:19Z","object_key":"name2","status":200,"request_time":12}
{"timestamp":"2021-02-26T01:27:20Z","object_key":"name3","status":200,"request_time":6}
Query using all aggregate functions:
SELECT
COUNT(*) AS "count",
MIN(request_time) AS "min",
MAX(request_time) AS "max",
SUM(request_time) AS "sum",
AVG(request_time) AS "avg"
FROM S3Object
WHERE status = 200
Result:
{"count": 2, "min": 6, "max": 12, "sum": 18, "avg": 9.0}
Sample data:
timestamp,object_key,status,request_time
2021-02-26T01:27:19Z,name1,404,16
2021-02-26T01:27:19Z,name2,200,12
2021-02-26T01:27:20Z,name3,200,6
Query using all aggregate functions:
SELECT
COUNT(*) AS "count",
MIN(CAST(request_time AS FLOAT)) AS "min",
MAX(CAST(request_time AS FLOAT)) AS "max",
SUM(CAST(request_time AS FLOAT)) AS "sum",
AVG(CAST(request_time AS FLOAT)) AS "avg"
FROM S3Object
WHERE status = '200'
Since all values in the input CSV files are treated as strings, you need to convert them to the appropriate types using the CAST function.
Result:
count,min,max,sum,avg
2,6,12,18,9.0