Why is BigQuery so slow on non-large data sizes?

We have found BigQuery to work great on data sets larger than 100M rows, where the 'initialization time' doesn't really come into effect (or is negligible compared to the rest of the query).

However, on anything under that, performance is noticeably slow, which makes it (1) ill-suited to working in an interactive BI tool; and (2) inferior to other products, such as Redshift or even ElasticSearch, when the data size is under 100M rows. In fact, an engineer at our organization who was evaluating technologies for querying data sizes between 1M and 100M rows, for an analytics product with about 1,000 users, reported that he could not believe how slow BigQuery was.

Without asking for a defense of the BigQuery product, I was wondering:

  1. Are there any plans to improve the speed of BigQuery, especially its initialization time, on queries of non-massive data sets?
  2. Will BigQuery ever be able to deliver sub-second response times on 'regular' queries (such as a simple aggregation with a GROUP BY) on data sets under a certain size?
David542 asked Feb 24 '17 01:02



2 Answers

The time is spent on metadata and initialization; the actual execution time is very small. We have work in progress that will address this, but some of the changes are complicated and will take a while.
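To get a rough sense of how much of a small query's wall-clock time goes to start-up rather than execution, one option is to compare the timestamps BigQuery records on the job. The sketch below assumes the `google-cloud-bigquery` Python client and its `created`, `started`, and `ended` job properties. The helper names `latency_breakdown` and `breakdown_for_query` are made up for illustration. Treat the split as a proxy only: some initialization cost may also fall inside the started-to-ended interval, and client-side round trips are not counted at all.

```python
from datetime import datetime, timedelta, timezone


def latency_breakdown(created, started, ended):
    """Split a job's server-side wall-clock time into start-up and run time."""
    startup = started - created      # time before the job began running
    execution = ended - started      # time the job spent running
    total = ended - created
    share = startup / total if total else 0.0
    return {"startup": startup, "execution": execution, "startup_share": share}


def breakdown_for_query(sql):
    """Run `sql` on BigQuery and return its breakdown (requires credentials)."""
    # Imported lazily so latency_breakdown() has no third-party dependency.
    from google.cloud import bigquery

    client = bigquery.Client()
    job = client.query(sql)
    job.result()  # block until the query finishes
    return latency_breakdown(job.created, job.started, job.ended)


# Illustrative timestamps only, not measurements from a real job.
t0 = datetime(2017, 2, 24, 1, 2, 0, tzinfo=timezone.utc)
example = latency_breakdown(
    t0, t0 + timedelta(seconds=1.5), t0 + timedelta(seconds=2)
)
```

With real jobs, running `breakdown_for_query` over a handful of small queries and comparing `startup_share` across them would show whether the fixed overhead dominates, as described above.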

You can imagine that in its infancy, BigQuery could have central systems for managing jobs, metadata, etc. in a manner that performed very well for all N0 entities using the service. Once you get to N1 entities, however, it may be necessary to rearchitect some things to make them have as little latency as possible. For notification about new features--which is also where we would announce API improvements related to start-up latency--keep an eye on our release notes, which you can also subscribe to as an RSS feed.

Elliott Brossard answered Sep 28 '22 11:09


Exactly four years after this question was asked, we have great news for BigQuery users! As stated in this BI Engine release note from 2021-02-25:

The BI Engine SQL interface expands BI Engine to integrate with other business intelligence (BI) tools such as Looker, Looqbox, Tableau, Power BI, and custom applications to accelerate data exploration and analysis. This page provides an overview of the BI Engine SQL interface, and the expanded capabilities that it brings to this preview version of BI Engine.

I believe this can solve the query latency issue raised in David542's question.

Murta answered Sep 28 '22 10:09