The Trino CLI
Overview
The web-based shell service and the terminals of the Jupyter Notebook service are automatically connected to the Trino service and include a copy of the native Trino command-line interface (CLI) — $PATH
) to simplify execution from any directory.
To facilitate using Trino with the Iguazio Trino connector to query NoSQL tables in the platform's data containers, the environment path also contains a v3io
catalog, the Trino user's username and password (platform access key), and the Trino Java TrustStore file and password.
For detailed information about Trino and its CLI, refer to the Trino documentation.
You start the Trino CLI by running either
trino
You can stop the CLI, at any time, by running the
CLI Options
When starting the CLI, you can specify any supported native Trino CLI option.
Use the --help
option to see a full list.
The following options are especially relevant when using the CLI in the platform:
- --server
Sets the location of the Trino server. The
trino wrapper already preconfigures the server location for your platform cluster. However, when runningtrino-cli , you must set--server to the location of the Trino server in your cluster. The Trino server URL is the API URL of the predefined Trino service (Trino
), which you can copy from theServices page of the platform dashboard. The following command demonstrates setting the Trino server URL tohttps://trino-api-trino.default-tenant.app.mycluster.iguazio.com
:trino-cli --server https://trino-api-trino.default-tenant.app.mycluster.iguazio.com
- --catalog
Sets the default Trino-connector catalog. If you don't configure a default catalog, you need to specify the catalog in the
FROM
string of each Trino command; for table commands, the catalog is specified at the start of the table path. (You can override the default configuration by specifying another catalog in specific Trino commands.)To use the Iguazio Trino connector to query platform NoSQL tables, you need to use the
v3io
catalog (see The v3io Catalog). Thetrino wrapper already preconfigures the catalog tov3io
. When runningtrino-cli , you can optionally use the option to set the default Trino-connector catalog to--catalog v3io
. For example:trino-cli --catalog v3io --server https://trino-api-trino.default-tenant.app.mycluster.iguazio.com
- --schema
Sets the default Trino schema. In the Iguazio AI Platform, the Trino schema is the name of the data container that contains the queried tables. If you don't configure a default container, you need to include the container name as part of the table path in each Trino command. (You can override the default configuration by specifying another container name in specific Trino commands.) This is true for both
trino-cli andtrino . For example, the following command configures the CLI to query tables in the "projects" container usingtrino :trino --schema projects
And this is an example of a similar command using the native Trino CLI (
trino-cli ):trino-cli --schema projects --catalog v3io --server https://trino-api-trino.default-tenant.app.mycluster.iguazio.com
The v3io Catalog
To configure Trino to work with the Iguazio Trino connector for querying data in the platform's NoSQL store, you need to use the connector's custom v3io
Trino catalog.
The v3io
as the default catalog.
For example:
trino-cli --catalog v3io --server https://trino-api-trino.default-tenant.app.mycluster.iguazio.com
If you don't configure the v3io
catalog when starting the CLI (either by using FROM
string of each Trino command; for commands that reference platform NoSQL tables, the table path must begin with v3io
(see Table Paths for details).
The following example queries a NoSQL "mytable" table in a "mycontainer" data container:
SELECT * from v3io.mycontainer.mytable;
Supported Commands
After starting the Trino CLI, you can run supported commands for your selected catalog from the Trino command line.
Version 3.6.1 of the Iguazio Trino connector's v3io
catalog supports the Trino
-
SHOW TABLES returns only tables that reside in the container's root directory, provided the access key includes data-access permissions for this directory. -
To use the view commands (
CREATE VIEW ,DROP VIEW , andSHOW CREATE VIEW ), you first need to enable Hive for the Trino service. See Enabling Hive. You can then save views of platform NoSQL tables, as well as other supported file types, to the default schema of the Hive trino connector (hive.default
).
Table Paths
When using the Iguazio Trino connector, you can specify table paths in one of two ways:
-
Table name — this is the standard Trino syntax and is currently supported only for tables that reside directly in the root directory of the configured data container (Trino schema).
- When using built-in Trino commands, such as
SELECT , you specify the path asv3io.<container name>.<table name>
. For example,SELECT * FROM v3io.mycontainer.mytable;
. - When using the custom
v3io.schema.infer command, you pass the container and table names as separate parameters —v3io.schema.infer('<container name>', '<table name>');
. For example,call v3io.schema.infer ('mycontainer', 'mytable');
.
- When using built-in Trino commands, such as
-
File path — the relative path to the table within the configured data container (
/path/to/table
). Currently, nested tables in the platform's data containers must be referenced using this syntax.- When using built-in Trino commands, such as
SELECT , you specify the path asv3io.<container name>."/path/to/table"
. For example,SELECT * FROM v3io.mycontainer."/mytables/cars/vendors";
. Note that the table path must be embedded within double quotes. - When using the custom
v3io.schema.infer command, you pass the container name and table path as separate parameters —v3io.schema.infer('<container name>', '/path/to/table');
. For example,call v3io.schema.infer ('mycontainer', '/mytables/cards/vendors');
.
- When using built-in Trino commands, such as
-
For both syntax variations, in standard Trino commands you can optionally omit the catalog and container (schema) names if they're already preconfigured; see the CLI
--catalog and--schema options. Thetrino wrapper preconfigures thev3io
catalog. -
Tables in a data container's root directory can be accessed by using either the table-name or file-path syntax. The table-name syntax is simpler but slower. Therefore, it's recommended that you use the path syntax when you need to frequently repeat a specific query.
-
- The table-name syntax (which is supported for tables in the root container directory) ignores the letter case in the table path. Therefore, it also supports uppercase letters in the path; (note that the table names will appear in lowercase letters in query results).
- The file-path syntax doesn't currently support uppercase letters in the table path.
Defining the NoSQL Table Schema
Trino handles structured data.
Therefore, it needs to be aware of the schema of the data structure.
(Don't confuse this with native Trino schemas, which are used for organizing tables — as explained, for example, for the
The v3io.schema.infer Command
The Iguazio Trino connector exposes a <container name>
is the name of the data container (schema) that contains the table and <relative table path>
is the relative path to the table within the container (see Table Paths):
call v3io.schema.infer('<container name>', '<relative table path>');
When the table resides in the container's root directory, the relative path can be the table name. For example, the following command infers the schema of a "mytable" table in the root directory of a "mycontainer" data container:
call v3io.schema.infer('mycontainer', 'mytable');
For nested tables, you need to specify the table path as '/path/to/table'
.
For example, the following command infers the schema of a "mytable" table in a
call v3io.schema.infer('mycontainer', '/mydata/mytable');
The infer-schema command creates a JSON schema file (
Partitioned Tables
Table partitioning is a common technique for optimizing physical data layout and related queries. In a partitioned table, some item attributes (columns) are used to create partition directories within the root table directory using the format
The Iguazio Trino connector supports querying of partitioned NoSQL tables:
a partitioned table is queried like any other table, with the table path set to the root table directory and not to a specific partition directory.
When processing queries, the platform searches the root table directory that is specified in the read command for nested directories of the format SELECT * FROM v3io.mycontainer.mytable WHERE month = 12;
query will return only the items from the
Read Optimization
The Iguazio Trino connector supports the following optimized table queries (reads), which are more efficient compared to the standard full table scan:
For more information about these query types, see NoSQL read optimization.
Faster Item-Specific Queries
The fastest Trino NoSQL table queries are those that uniquely identify a specific item by its primary-key value. See NoSQL faster item-specific queries .
=
) or IN (IN
) operator to the sharding-key attribute and optionally also apply one of these operators the sorting-key attribute (in the case of a compound primary key).Faster Item-Specific Query Examples
The following commands all identify a specific item by its primary-key value and will be processed more quickly than table-scan processing; (it is assumed that v3io
is configured as the default catalog):
-
Retrieve an item with the simple primary-key value "345":
SELECT * FROM mycontainer.mytable" WHERE id = 345;
-
Retrieve an item with the compound primary-key value "myfile.txt":
SELECT * FROM mycontainer.mytable" WHERE basename = 'myfile' and suffix = 'txt';
Range Scans
A Trino NoSQL table query that uses supported sharding-key and optional sorting-key filters to retrieve items with the same sharding-key value, is processed by performing a range scan, which is more efficient than the standard full table scan. See NoSQL range scans.
=
) or IN (IN
) operator to the sharding-key attribute, and optionally also apply a comparison operator (=
/>
/>=
/<
/<=
/ BETWEEN
) to the sorting-key attribute.Range-Scan Query Examples
The following commands query a "rides" table in a <sharding key>.<sorting key>
primary key, a v3io
is configured as the default catalog):
-
Retrieve all items with a
driver_id sharding-key attribute value of1
(regardless of the sorting-key value):SELECT * FROM mycontainer."/mytaxis/rides/" WHERE driver_id = 1;
Output
date | avg_ride_km | avg_ride_passengers | driver_id | total_km | num_rides | total_passengers ----------+--------------------+---------------------+-----------+----------+-----------+------------------ 20180601 | 5.0 | 1.6 | 1 | 125.0 | 25 | 40 20180602 | 5.3 | 2.3 | 1 | 106.0 | 20 | 46 20180701 | 3.8000000000000003 | 1.5 | 1 | 106.4 | 28 | 42 (3 rows)
-
Retrieve all items with a
driver_id sharding-key attribute value of24
and adate sorting-key attribute value within the first six months of 2018:SELECT * FROM mycontainer."/mytaxis/rides/" WHERE driver_id = 24 AND date >= '20180101' AND date < '20180701';
Output
date | avg_ride_km | avg_ride_passengers | driver_id | total_km | num_rides | total_passengers ----------+-------------+---------------------+-----------+----------+-----------+------------------ 20180602 | 52.0 | 2.2 | 24 | 260.0 | 5 | 11 20180601 | 41.5 | 2.25 | 24 | 332.0 | 8 | 18 (2 rows)
-
Retrieve all items with a
driver_id sharding-key attribute value of1
,16
, or24
(regardless of the sorting-key value) and anavg_ride_passengers attribute value that is greater or equal to 3:SELECT * FROM mycontainer."/mytaxis/rides/" WHERE driver_id IN (1, 16, 24) AND avg_ride_passengers >= 3;
Output
date | avg_ride_km | avg_ride_passengers | driver_id | total_km | num_rides | total_passengers ----------+--------------------+---------------------+-----------+----------+-----------+------------------ 20180701 | 32.199999999999996 | 4.0 | 16 | 193.2 | 6 | 24 20180601 | 224.2 | 8.0 | 16 | 224.2 | 1 | 8 20180602 | 24.4 | 4.5 | 16 | 244.0 | 10 | 45 20180701 | 50.300000000000004 | 3.0 | 24 | 352.1 | 7 | 21 (4 rows)
v3io Query Examples
- The examples in this section assume that
v3io
has been configured as the default catalog. If this isn't the case, add "FROM v3io
" inSHOW SCHEMAS commands and "v3io.
" at the start of the table paths (before the name of the data container) inSHOW TABLES andSELECT commands. For example, replace "projects.mytable
" with "v3io.projects.mytable
"; see Table Paths for details. - See also the separate range-scan query examples.
The following command lists all the data containers (schemas) in the parent platform tenant whose names end in "data":
SHOW SCHEMAS LIKE '%data';
The following command lists all the tables in the root directory of the "projects" data container.
(Remember that the
SHOW TABLES IN projects;
The following command shows the contents of a "mytable" table in the "projects" data container:
SELECT * FROM projects.mytable;
The following command shows the contents of a nested "tests/nosql/table1" table in the "projects" container; (see the file-path syntax):
SELECT * FROM projects."/tests/nosql/table1";
If you configured the default Trino schema when starting the CLI to "projects", you can optionally run the commands from the previous examples without explicitly specifying the name of the data container:
SHOW TABLES;
SELECT * FROM mytable;
SELECT * FROM "/tests/nosql/table1";
v3io Query Examples Using Views
The following commands demonstrate how to create and use a query view for a NoSQL table.
Note that because the view is saved to the default
Hive schema, before you create the view you need to ensure that Hive is enabled for the platform's Trino service:
iguazio
in the following commands with your platform username.-
Create an
iguazio_stocks_tab_etc_view
view of aSELECT
query for all items with thesecuritytype attribute value"ETC"
in ausers/<username>/examples/stocks_tab table for user "iguazio":CREATE VIEW hive.default.iguazio_stocks_tab_etc_view AS SELECT * FROM users."/iguazio/examples/stocks_tab" WHERE securitytype = 'ETC';
-
Show the view's SQL statement:
SHOW CREATE VIEW hive.default.iguazio_stocks_tab_etc_view;
-
Use the view to return all items in the table:
SELECT * FROM hive.default.iguazio_stocks_tab_etc_view;
You can also optionally apply an additional filter to the view query. For example, the following query returns all items with the
securitytype attribute value"ETC"
(view query) and anumberoftrades attribute value that's greater than1
:SELECT * FROM hive.default.iguazio_stocks_tab_etc_view WHERE numberoftrades > 1;