The Trino Service (formerly Presto)
Trino is an open-source distributed SQL query engine for running interactive analytic queries. The platform has a pre-deployed tenant-wide Trino service that can be used to run SQL queries and perform high-performance low-latency interactive data analytics. You can ingest data into the platform using your preferred method — such as using Spark, the NoSQL Web API, a Nuclio function, or V3IO Frames — and use Trino to analyze the data interactively with the aid of your preferred visualization tool. Running Trino over the platform's data services allows you to filter data as close as possible to the source.
You can deploy multiple Trino services in the same cluster. This is useful for segregating workloads and avoiding resource contention. For example, you can run critical queries on one service and non-critical queries on another.
You can run SQL commands that use ANSI SQL syntax.
The Iguazio Trino connector enables you to use Trino to run queries on data in the platform's NoSQL store — including support for partitioning, predicate pushdown, and column pruning, which enable you to optimize your queries.
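As an illustrative sketch (the v3io catalog name, the mycontainer data container, and the table path are placeholder assumptions, not values from this document), a query that benefits from predicate pushdown and column pruning might look like this:

```sql
-- Hypothetical example: 'v3io' catalog, 'mycontainer' container, and
-- 'drivers/drivers_table' table path are placeholders for your setup.
-- Selecting only two columns (column pruning) and filtering on an
-- attribute (predicate pushdown) lets the connector do the filtering
-- as close as possible to the data.
SELECT driver_id, avg_speed
FROM v3io.mycontainer."drivers/drivers_table"
WHERE driver_id = 42;
```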
You can also use Trino's built-in Hive connector to query data of the supported file types, such as Parquet or ORC, or to save table-query views to the default Hive schema. Note that to use the Hive connector, you first need to create a Hive Metastore by enabling Hive for the platform's Trino service. For more information, see Using the Hive Connector in the Trino overview.
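To sketch what querying file data through the Hive connector could look like (the schema, table name, column definitions, and v3io path below are all illustrative assumptions, and Hive must already be enabled for the Trino service):

```sql
-- Hypothetical example: register an external table over Parquet files
-- stored at an illustrative v3io path, then query it.
CREATE TABLE hive.default.trip_data (
    trip_id BIGINT,
    fare DOUBLE
)
WITH (
    format = 'PARQUET',
    external_location = 'v3io://mycontainer/taxi/trips/'
);

SELECT trip_id, fare
FROM hive.default.trip_data
WHERE fare > 10;
```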
The platform also has a built-in process that uses Trino SQL to create a Hive view that monitors both real-time data in the platform's NoSQL store and historical data in Parquet or ORC tables.
For more information about using Trino in the platform, see the Trino Reference. See also the Trino and Hive restrictions in the Software Specifications and Restrictions documentation.
Configuring the Service
Node Selection
With node selection, Kubernetes only schedules the pod on nodes that have each of the labels you specify. See Node Selector.
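For instance, if your app nodes carry a hypothetical label such as node-type=trino-worker (the label key and value here are illustrative, not platform defaults), the key/value pair you enter in the service configuration translates to a Kubernetes pod-spec constraint like:

```yaml
# Illustrative nodeSelector fragment of a pod spec; the pod is scheduled
# only on nodes that have ALL of the listed labels.
nodeSelector:
  node-type: trino-worker
```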
Host Path Volumes
You can create host path volumes for use with Spill to Disk.
- In the Custom Parameters tab, under Create host path volumes, enter the following:
  - Host Path: an existing path on the app node with rwx permissions.
  - Container Path: the path of the designated volume in the Trino worker pod (the path that is mounted in the container). If used for Spill to Disk, this must be the parent folder of the spiller-spill-path.
- Repeat for additional volumes.
- Save the service.
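For example (both paths below are illustrative, not defaults), a volume intended for Spill to Disk might be defined as:

```text
Host Path:      /mnt/data/trino-spill   (existing path on the app node, rwx permissions)
Container Path: /var/trino              (parent folder of the spiller-spill-path)
```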
Spill to Disk
The platform supports the Trino Spill to Disk feature: during memory-intensive operations, Trino can offload intermediate operation results to disk. The goal is to enable execution of queries whose memory requirements exceed the per-query or per-node limits.
To configure Spill to Disk:
- Create a host path volume, if it doesn't already exist. The iguazio user must have rw access to the Host Path.
- In the Trino service configuration, on the Custom Parameters tab, press Workers and add these two parameters to the config.properties:
  - spiller-spill-path = the path, or paths, to the designated disks in the Trino worker pod. When using multiple spill paths, write a comma-separated list of paths.
  - spill-enabled = true

Note:
- Trino creates the leaf folder that is written in the spiller-spill-path property of the config.properties file.
- The container path of the host path volume must be the parent of the spiller-spill-path.
For more details on the Trino configuration, see Spilling properties.
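Putting the two worker parameters together, the config.properties additions could look like this (the path is illustrative; it must sit under the container path of a host path volume):

```properties
# Illustrative config.properties additions for Trino workers.
# Trino creates the leaf 'spill' folder itself; its parent, /var/trino,
# must be the container path of a host path volume.
spill-enabled=true
spiller-spill-path=/var/trino/spill
```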
Deploying Multiple Trino Services in the Same Cluster
Create the additional Trino service as usual.
Then, select the desired Trino service in the Jupyter service and the web-shell service configurations.
If you enable Hive for the service, the resulting Hive/MariaDB services are named: