Much like how we use Google for everyday questions, techies use an internal data platform, queried through SQL, to discover the data that matters to them. At Zomato, different teams have different requirements. For example, the City Lead team might want to know the number of orders placed in NCR last week, Customer Support might want to know how many queries were resolved within a minute, the Product team might want to know the number of customers participating in the Zomato Premier League on the day of KKR vs CSK, the Social Media team might want to know the most-searched dishes on New Year’s Eve, and more. In a nutshell, our query types are remarkably broad. To support such variety, we need a reliable SQL (Structured Query Language) query engine that can serve relevant results quickly to one and all.
At Zomato, our Data Platform team is responsible for building and maintaining a reliable platform with the core aim of making data an essential part of Zomato’s DNA. To achieve this, our platform should be able to minimise the wait time, i.e., the period between raising a query and getting the results. At our scale, it is a challenge to maintain this without proper infrastructure and efficient data storage.
We currently use Trino as our query engine, enabling Zomans to build analytical systems that extract actionable insights from our data lake, run experiments, and build ML models. Trino is a huge part of our data stack thanks to its industry-standard SQL interface and its ability to join and aggregate data across heterogeneous data sources like Apache Hive, MySQL, Apache Druid, Apache Pinot, MongoDB, and more.
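To make the federation point concrete, here is a minimal sketch using the open-source trino Python client to join data across two catalogs in a single query. The host, catalog, schema, and table names are hypothetical placeholders, not our actual setup.

```python
# pip install trino
import trino

# Connect to a Trino coordinator (hostname and user are hypothetical).
conn = trino.dbapi.connect(
    host="trino.internal.example.com",
    port=8080,
    user="analyst",
)
cur = conn.cursor()

# A single SQL statement can join tables living in different catalogs:
# here, orders in the Hive data lake and restaurant metadata in MySQL
# (all catalog/schema/table names are illustrative).
cur.execute("""
    SELECT r.city, count(*) AS orders
    FROM hive.analytics.orders o
    JOIN mysql.catalog_db.restaurants r ON o.restaurant_id = r.id
    WHERE o.order_date >= date '2022-01-01'
    GROUP BY r.city
    ORDER BY orders DESC
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```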
Zomans consume data using query and visualisation tools such as Apache Superset, Redash, Jupyter notebooks, and our custom Python SDK – PyJumbo.
We observed that the query workloads can be broadly divided into the following categories:
Based on these usage patterns, we can segregate workloads onto separate clusters and tune each one individually for performance and robustness.
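Within a single cluster, one way to express this kind of segregation is Trino’s resource groups, configured via etc/resource-groups.json. The sketch below is purely illustrative, with made-up group names, limits, and selector patterns rather than our production configuration.

```json
{
  "rootGroups": [
    {
      "name": "adhoc",
      "softMemoryLimit": "40%",
      "hardConcurrencyLimit": 20,
      "maxQueued": 100
    },
    {
      "name": "scheduled",
      "softMemoryLimit": "60%",
      "hardConcurrencyLimit": 10,
      "maxQueued": 200
    }
  ],
  "selectors": [
    { "source": ".*superset.*", "group": "adhoc" },
    { "source": ".*airflow.*", "group": "scheduled" }
  ]
}
```

Selectors match on attributes of the incoming query (such as its source), so interactive dashboard traffic and scheduled pipelines land in groups with different concurrency and memory limits.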
We are currently running Trino on Amazon Elastic Container Service (Amazon ECS) using our custom Trino image, wherein we have integrated the following components:
This is incredibly useful for identifying any backward-incompatible changes present in a new version. For instance, Trino version 341 had a backward-incompatible change to the timestamp column type. This tooling helped us identify and isolate the change before rolling out the new version.
Since Trino executes queries from various sources by distributing work across different workers, it becomes difficult to identify bottlenecks and debug service issues. Using Trino JMX and Docker container metrics, we have built real-time monitoring dashboards that help us debug issues with our clusters. Some of the PagerDuty alerts we have:
Apart from real-time monitoring, we have also set alerts on our Trino query logs to
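As a concrete illustration of the JMX-based monitoring mentioned above, Trino ships a JMX connector that exposes MBeans as queryable tables. The sketch below uses the java.lang:type=OperatingSystem bean from Trino’s documentation as an example; the host is hypothetical, and the beans we actually alert on differ.

```python
import trino

conn = trino.dbapi.connect(
    host="trino.internal.example.com",  # hypothetical coordinator address
    port=8080,
    user="monitoring",
    catalog="jmx",
    schema="current",
)
cur = conn.cursor()

# Each JMX MBean appears as a table; this one reports per-node
# file descriptor usage, one row per node in the cluster.
cur.execute("""
    SELECT node, openfiledescriptorcount, maxfiledescriptorcount
    FROM "java.lang:type=operatingsystem"
""")
for row in cur.fetchall():
    print(row)
```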
As Zomato grew, the need for data-driven analysis also increased, and so did our cost for the Trino setup.
After deploying Trino, we soon understood that time-based autoscaling alone would not work, for two reasons:
We aimed to increase the cluster’s effective utilisation, all while offering a reliable customer experience.
While looking for an auto-scaling solution, it became clear that Trino cannot be scaled on CPU/memory metrics like any other service. We experimented with multiple scaling conditions and finally settled on scaling the cluster on the ratio of running queries to active workers. We also experimented with the queued-queries metric in the scaling equation, but soon removed it: it biased the system towards upscaling and did not play well with our resource groups.
Upscale if runningQueries/activeWorkers > THRESHOLD
Downscale if runningQueries/activeWorkers < THRESHOLD
We calculated the value of THRESHOLD based on the maximum number of nodes, maximum concurrent queries, and query max memory per node.
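A minimal sketch of such an auto-scaler loop is shown below. It assumes the coordinator’s REST endpoint /v1/cluster (which reports runningQueries and activeWorkers) and adjusts an ECS service’s desired count via boto3. The URL, service names, threshold, and step size are all hypothetical, and our production logic (cool-downs, separate up/down thresholds, graceful worker shutdown) is more involved.

```python
import time

import boto3
import requests

# Hypothetical endpoints and values; not our production configuration.
TRINO_STATS_URL = "http://trino.internal.example.com:8080/v1/cluster"
THRESHOLD = 3.0  # illustrative; derived from cluster limits as described above
STEP = 2         # workers added or removed per scaling action

ecs = boto3.client("ecs")


def desired_worker_count() -> int:
    """Read the current desired count of the (hypothetical) worker service."""
    service = ecs.describe_services(
        cluster="trino-cluster", services=["trino-worker"]
    )["services"][0]
    return service["desiredCount"]


while True:
    # Trino's coordinator reports cluster-wide stats at /v1/cluster,
    # including runningQueries and activeWorkers.
    stats = requests.get(TRINO_STATS_URL).json()
    ratio = stats["runningQueries"] / max(stats["activeWorkers"], 1)

    current = desired_worker_count()
    target = current
    if ratio > THRESHOLD:
        target = current + STEP          # overloaded: add workers
    elif ratio < THRESHOLD:
        target = max(current - STEP, 1)  # underutilised: remove workers

    if target != current:
        ecs.update_service(
            cluster="trino-cluster", service="trino-worker", desiredCount=target
        )
    time.sleep(60)  # evaluate once a minute; production adds cool-downs
```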
With this custom auto-scaler, we manage our workloads more efficiently. We also reduced the cluster cost by 30-40%, with zero downtime during downscaling.
As our scale increased, we faced frequent cluster slowdowns even with our custom auto-scaler. We realised that a single cluster would not scale indefinitely and that we needed a multi-cluster setup to manage our workloads. Hence, we decided to use presto-gateway, a load balancer/proxy/gateway for PrestoDB/Trino open-sourced by Lyft.
In Lyft’s own words – ‘Presto-Gateway is a stateful load-balancer, proxy and router for multiple presto clusters, it provides transparent access to underlying presto-backend without changing the protocol.’
Since then, we’ve added functionality on top of it. This service makes our visualisation tools/SDKs agnostic of specific Trino clusters and versions, and enables the following usages. Some of these features are in active development, and we are gradually moving all our visualisation tools/SDKs from specific clusters to Presto Gateway.
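For a flavour of how the gateway is operated, presto-gateway exposes a REST API for registering and listing the backend clusters it routes to. A minimal sketch with hypothetical URLs and names follows; the exact payload may vary across gateway versions.

```python
import requests

GATEWAY_URL = "http://presto-gateway.internal.example.com:8080"  # hypothetical

# Register a Trino cluster as a gateway backend. Queries tagged with
# the matching routing group get proxied to this cluster.
backend = {
    "name": "trino-adhoc-1",
    "proxyTo": "http://trino-adhoc-1.internal.example.com:8080",
    "routingGroup": "adhoc",
    "active": True,
}
resp = requests.post(
    f"{GATEWAY_URL}/entity?entityType=GATEWAY_BACKEND", json=backend
)
resp.raise_for_status()

# List the currently registered backends.
print(requests.get(f"{GATEWAY_URL}/entity/GATEWAY_BACKEND").json())
```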
Trino has become an integral part of our day-to-day life, be it analytics, reporting, data quality, or lineage tracking. With Trino’s architecture, we can empower Zomans to make data-driven decisions. In future blogs, we will talk about
This is a Data Platform article sharing how we scaled Trino to smoothly serve our growing query volume. If you are interested in solving similar problems, connect with Sumanshu Dwivedi on LinkedIn. We’re always looking for cool Data Platform Engineers at Zomato.
This blog was written by Ayush Chauhan and Palash Goel in collaboration with Ankit Atreja, Anmol Virmani, and Rajat Taya under the guidance of Sumanshu Dwivedi and Himanshu Rathore.
-x-
All images/ videos are designed in-house using excalidraw.com and figma.com.
All content provided in this blog is for informational and educational purposes only. It is not professional advice and should not be treated as such. The writer of this blog makes no representations as to the accuracy or completeness of any content or information contained here or found by following any link on this blog.