 

How to load a large table into Tableau for data visualization?

I am able to connect Tableau to my database, but the table here is really large. Every time I try to load the table into Tableau, it crashes, and I cannot find any workaround. The table size varies from 10 million to 400 million rows. How should I approach this issue? Any suggestions?

asked Jul 01 '15 by python


People also ask

How do you load big data in Tableau?

Use a custom SQL query in Tableau to filter the data source down to a smaller working set (filtering at the backend is an added advantage). Hide unwanted fields in the data source pane. Publish your data set to Tableau Server, then connect to the server-hosted extract from Tableau.

Can Tableau handle 4 million rows?

Tableau Desktop and Tableau Server do not enforce any row or column limit on the amount of data that can be imported.

Can Tableau handle 5 million rows?

Tableau can easily process millions of rows of data. I recommend using an extract.


3 Answers

You don't "load" data into Tableau, you point Tableau at an external data source. Then Tableau sends a query to the external data source requesting only the summary info (aka query results) needed to create the visualization you designed.

So, for an extreme example, if you place CNT(Number of Records) on the Columns shelf, Tableau will send a simple short query to the external database asking it to report the number of records. Something along the lines of "select count(*) from xxx".
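For illustration, here is a sketch of the kind of SQL Tableau might generate for that view. The exact statement Tableau emits varies by database and driver, and the table name "orders" is hypothetical:

    -- A sketch of the query behind CNT(Number of Records);
    -- "orders" stands in for your actual table.
    SELECT COUNT(*) AS cnt_number_of_records
    FROM orders;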

So even if there are billions of rows in the external database, Tableau will send a small amount of information to the database (a query) and receive back a small amount of information (the query results) to display. This allows Tableau to be very fast on its end, and performance depends on how fast the external database can respond to the query. Tuning your database depends on all kinds of factors: type and amount of memory and disk, how indices are set up, etc.
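For example, if your dashboards usually filter or group on a date column, a plain index on that column often helps the database answer Tableau's queries quickly. A sketch, with hypothetical table and column names:

    -- Hypothetical: speeds up queries that filter or group on order_date.
    CREATE INDEX idx_orders_order_date ON orders (order_date);

    -- Refresh the planner's statistics after bulk loads (PostgreSQL syntax).
    ANALYZE orders;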

So the first step is to make sure that the database can perform as needed, regardless of Tableau.

That's the purist response. Now for a few messy details. It is possible to design a very complex visualization in Tableau that sends a complex query asking for a very large result set. For instance, you can design a dashboard that draws a dot on the map for every row in the database and then refreshes a large volume of data every time you wave the mouse over the marks on the map.

If you have millions or billions of data rows and you want high performance, then don't do that. No user can read 60 million dots anyway, and they certainly don't want to wait for them to be sent over the wire. Instead, first plot aggregate values (min, max, sum, avg, etc.) and then drill down into more detail on demand.
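As a sketch, the aggregate-first approach corresponds to a query like the one below rather than a select-every-row query. Table and column names are hypothetical, and DATE_TRUNC is PostgreSQL syntax:

    -- Summarize tens of millions of rows down to a few hundred before plotting.
    SELECT region,
           DATE_TRUNC('month', order_date) AS order_month,
           COUNT(*)    AS row_count,
           MIN(amount) AS min_amount,
           MAX(amount) AS max_amount,
           AVG(amount) AS avg_amount,
           SUM(amount) AS total_amount
    FROM orders
    GROUP BY region, DATE_TRUNC('month', order_date);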

As others suggest, you can use a Tableau extract to offload workload and cache data in a form Tableau can use quickly. An extract is similar to an optimized materialized view stored in Tableau. Extracts are very helpful for speeding up Tableau, but if you want high performance, filter and aggregate your extracts to contain only the data and level of detail needed to support your views. If you blindly make an extract of your entire database, you are simply copying all your data from one form of database to another.
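For instance, instead of extracting the whole table, you might base the extract on a Custom SQL connection that filters and pre-aggregates to the level of detail your views actually need. A sketch with hypothetical names:

    -- Extract source: only two years of data, rolled up to one row
    -- per region per day instead of one row per transaction.
    SELECT region,
           CAST(order_date AS DATE) AS order_day,
           COUNT(*)    AS transactions,
           SUM(amount) AS total_amount
    FROM orders
    WHERE order_date >= '2013-07-01'
    GROUP BY region, CAST(order_date AS DATE);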

answered Oct 13 '22 by Alex Blakemore


I found a simple solution for optimising Tableau to work with very large datasets (1 billion+ rows): Google BigQuery, which is essentially a managed data warehouse.

  1. Upload your data to BigQuery (you can append multiple files into a single table; see the sketch below).
  2. Link that table to Tableau as an external data source.
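For step 1, one option today is BigQuery's LOAD DATA statement (the bq command-line tool and the client libraries work as well). A sketch; the dataset, table, and bucket names are hypothetical, and running it repeatedly appends more files to the same table:

    -- Hypothetical names; LOAD DATA ... INTO appends to an existing table.
    LOAD DATA INTO my_dataset.call_records
    FROM FILES (
      format = 'CSV',
      uris = ['gs://my-bucket/call_records_*.csv']
    );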

Tableau then sends SQL commands to BigQuery whenever a new 'view' is requested. BigQuery processes the queries quickly on Google's infrastructure and sends a small result set back to Tableau.
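A sketch of the kind of query Tableau might issue against BigQuery as you build a view (the project, dataset, and column names are hypothetical):

    SELECT caller_region,
           COUNT(*) AS call_count,
           AVG(duration_seconds) AS avg_duration
    FROM `my-project.my_dataset.call_records`
    GROUP BY caller_region;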

This method allowed me to visualise a 100 GB mobile call record dataset with ~1 billion rows on a MacBook.

answered Oct 13 '22 by Matt Doyle


There are two ways to interpret this question:

  1. The data source (which might be a single table, a view, etc.) has 10M to 400M rows, and Tableau is crashing at some point in the load process. In that case, I suggest you contact Tableau tech support. They really like to hear about situations like that and will help people through them.

  2. You are trying to create a visualization (such as a text table or crosstab) that results in 10M to 400M displayed rows. In that case, you're in territory that Tableau isn't designed for. A text table with 10M rows is not going to be useful for much of anything other than exporting to something else, and for that there are better tools than Tableau (such as the export/import utilities built into most databases; see the sketch below).
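For example, PostgreSQL's COPY can stream a query result straight to a file far more efficiently than rendering it as a Tableau text table. A sketch with hypothetical names (server-side COPY needs file-write privileges on the database host; psql's \copy is the client-side equivalent):

    -- PostgreSQL: export a query result directly to CSV.
    COPY (SELECT * FROM orders WHERE order_date >= '2015-01-01')
    TO '/tmp/orders_2015.csv'
    WITH (FORMAT csv, HEADER);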

answered Oct 13 '22 by Jonathan