Using Apache Trino

Apache Trino lets you run ad hoc and ETL queries across multiple big data sources. By default, Trino is installed on the first master node of the cluster, with the system, tpch, and hive connectors configured.
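In a standard Trino layout, the tpch and hive catalogs are each defined by a small properties file under etc/catalog. The system catalog is built into Trino and needs no file. The location of these files on your cluster may differ. The following is a minimal sketch, assuming that standard layout, that renders the two files. The metastore URI and host name are hypothetical placeholders, not values taken from your cluster.

```python
# Hypothetical sketch: render Trino catalog properties files for the
# tpch and hive connectors. The system catalog is built into Trino and
# needs no properties file. The metastore URI passed in is a placeholder.


def render_catalogs(metastore_uri: str) -> dict:
    """Return a mapping of catalog file name -> file contents."""
    catalogs = {
        "tpch.properties": {"connector.name": "tpch"},
        "hive.properties": {
            "connector.name": "hive",
            "hive.metastore.uri": metastore_uri,
        },
    }
    rendered = {}
    for filename, props in catalogs.items():
        # One key=value pair per line, as in a Java properties file.
        lines = [f"{key}={value}" for key, value in props.items()]
        rendered[filename] = "\n".join(lines) + "\n"
    return rendered


if __name__ == "__main__":
    # "master-0.example.internal" is a hypothetical host name.
    files = render_catalogs("thrift://master-0.example.internal:9083")
    for name, body in files.items():
        print(f"# etc/catalog/{name}")
        print(body)
```

Each catalog name (tpch, hive) comes from its file name, so queries then address tables as catalog.schema.table.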

To run CRUD operations through the Hive metastore, you must add an HDFS policy for the Trino user with the resource paths /tmp and /warehouse and Read, Write, and Execute (R/W/E) permissions.
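Such HDFS policies are commonly managed in Apache Ranger. The following is a minimal sketch, assuming Ranger's public v2 policy JSON format, of what this policy could look like. The service name, policy name, user name `trino`, and the choice to make the paths recursive are assumptions for illustration. Check them against your cluster before submitting the payload, for example to Ranger's `POST /service/public/v2/api/policy` endpoint.

```python
import json

# Hypothetical sketch: an HDFS policy in Apache Ranger's public v2 format
# that grants the Trino user read, write, and execute on /tmp and /warehouse.
# Service name, policy name, user name, and isRecursive are assumptions.


def trino_hdfs_policy(service: str, user: str = "trino") -> dict:
    return {
        "service": service,  # hypothetical Ranger HDFS service name
        "name": "trino-tmp-warehouse",  # hypothetical policy name
        "isEnabled": True,
        "resources": {
            "path": {
                "values": ["/tmp", "/warehouse"],
                # Assumption: also cover subdirectories such as table dirs.
                "isRecursive": True,
            }
        },
        "policyItems": [
            {
                "users": [user],
                # R/W/E maps to the read, write, and execute access types.
                "accesses": [
                    {"type": access, "isAllowed": True}
                    for access in ("read", "write", "execute")
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(trino_hdfs_policy("cluster_hadoop"), indent=2))
```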

Performance Tuning

Trino is a distributed SQL query engine designed to process large volumes of data across many data sources, including object storage in cloud environments. Its performance depends on several factors, such as the size and complexity of the data, the available network bandwidth, the cluster configuration, and the query patterns. Tune Trino for each customer-specific use case to get the best performance and scalability.