I have a presto cluster configured with 12 workers that is being queried by Java applications. The cluster is capable of performing 30 concurrent requests (if there are more, they are queued).
The applications might send around 80-100 distinct queries, which I expect to be handled by cluster.
Problem: When queries are performed sequentially they complete significantly faster than when they are performed in parallel.
For instance, if I run 100 queries sequentially each of them takes 1-12 seconds to complete and they all are completed in around 2 minutes. But if I start all of them in parallel it takes around 8-12 minutes to complete them all. At corner cases it takes up to 30 minutes.
If I look on the presto console I see that most of the queries are blocked and only 1-3 are in fact in Running state.
Unfortunately I can't post any of the queries. They usually access different schemas (up to 6 in one query), they are full of joins and nested queries. At the same time most of them are written following presto best practices.
Question: How can I improve performance? At least what areas should I investigate to find out the root cause?
Here are some metrics for one of the slowest queries (may be the numbers will say something to you).
Resource Utilization Summary
CPU Time 8.42m
Scheduled Time 26.04m
Blocked Time 4.77d
Input Rows 298M
Input Data 9.94GB
Raw Input Rows 323M
Raw Input Data 4.34GB
Peak Memory 10.18GB
Memory Pool reserved
Cumulative Memory 181G seconds
Timeline
Parallelism 477
Scheduled Time/s 1.47K
Input Rows/s 281K
Input Bytes/s 9.60MB
Memory Utilization 0B
Presto – extremely fast (168–177 BPM)
Efficiency and speed are important for query performance so Presto has a number of design features to maximize speed such as in-memory pipelined execution, a distributed scale-out architecture, and massively parallel processing (MPP) design.
It seems like I figured out the issue myself.
Presto is a distributed SQL query engine. And the key word here is distributed. It guarantees that if you run a query it is efficiently distributed among workers and performed with high speed.
Performing parallel queries and expecting that Presto will figure out how to efficiently parallel them is most likely a misuse. It is more like relational database approach which unfortunately doesn't work in Presto.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With