Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Apache SuperSet is very slow

Any recommendation on how to make superset faster?

Cache seems to load full data from the cache, I thought it load only old data from the cache, and real-time data from the database, isn't it like this?

What about some parallel processing?

like image 399
Brachi Avatar asked Jul 16 '19 08:07

Brachi


1 Answers

This answer is valid as of Superset 0.37.0.

At the moment, dashboard performance is affected by a few different factors. I'll enumerate them below along with methods to improve performance:

  1. Database concurrency limits can have an impact on dashboard performance. Dashboards load their information in parallel via concurrent web requests. Make sure that the database user provided allows enough concurrency that queries aren't being queued at the database layer.
  2. Cache performance your caching layer should be able to return multiple results, if not in parallel, extremely quickly. We've had success leveraging S3 for our cache.
  3. Cache hit percentage Superset will hit the cache only for queries that exactly match one that has been run recently. Otherwise the full query will fall through to the underlying analytical DB (Druid in this case). You can reduce the query load on Druid by using a less granular resolution on your dashboard - if it's possible to have it update less frequently, say a couple of times a day rather than in real-time, this can hit cache for all requests other than the first request in the new period under consideration.
  4. Python Web Process Concurrency Limits make sure that your web application server can handle enough parallel requests. The browser will request multiple charts' data at the same time, and the system will need to be able to handle these requests in parallel.
  5. Chart Query Performance As data is frequently requested, especially for real-time data from a database like Druid, optimizing the queries run by the charts can be very useful. I'd take a look at any virtual datasources that are being leveraged to see if they can be materialized or made more efficient.
  6. Web browser concurrent request limits By default most web browsers limit the number of concurrent requests that can be made to the same FQDN. If you have more than 6 charts on the same dashboard, it can be helpful to balance requests across multiple FQDNs running Superset to get around this browser limitation. There's more information on the approach to that in the issue history on Github, but Superset does support this type of configuration.

The community is very interested in improving performance over time, and as such there have been recommendations to move all analytical queries to Celery as well as making other architectural changes to improve performance. I hope this description helps and that something in here will help you track down the issue!

like image 72
Will Barrett Avatar answered Oct 29 '22 15:10

Will Barrett