Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculations in R, visualisations in Tableau - how to properly integrate these two?

I've got a bunch of R scripts that read in a data from various sources, perform all the analyses needed and makes some visualisations. Now I need to pass the visualisation part over to Tableau and prepare the whole dashboard in the latter.

From what I know I could save all the data.frames to .Rdata files and load them in Tableau, but it seems to be cumbersone in the long run, while the whole program should be automated as far as possible.

The second option I got to know is to read data in Tableau, connect to R via Rserve and call simple R scripts from Tableau. It would be an option, but:

  • R scripts can't be fully automated - they require some interaction with the user, it is the user who has to control the analysis and it should be left as that
  • my scripts do so much data cleaning and wrangling that I think it's better to leave this job separately in R

So my question is what is the best way to integrate R and Tableau so that after I perform all the analysis in R I could only update the data in prepared earlier Tableau dashboard with few plots?

like image 697
jakes Avatar asked Nov 08 '18 12:11

jakes


People also ask

Can we integrate R with Tableau?

Tableau can now connect to R through calculated fields and take advantage of R functions, libraries, packages and even saved models. These calculations dynamically invoke the R engine and pass values to R via the Rserve package, and are returned back to Tableau.

What is R integration in Tableau?

The R integration is a powerful feature that enables Tableau users to harness the powers of statistical modeling from inside Tableau, but it has a few limitations. Only the 4 scripting functions can be used to interact with the R server and visualizations built-in R cannot be brought in via calculated fields.


1 Answers

There are two major approaches to using Tableau and R together that fit different use cases. The first approach is to use R upstream to prepare and analyze data that is then made available to Tableau to use for producing and sharing visualizations. The second approach is to have Tableau interactively connect to R via RServe. In that case, Tableau is using R to effectively call functions and return values which Tableau can then incorporate into visualizations. A good example would be having Tableau send data values to R which can then be scored by a previously trained classifier model. R provides the model assessment; Tableau provides the interactive visualization.

Your case seems to fall in the first category - using R upstream to produce data that is fed to Tableau. In that case, you can save the data in any format (tabular) format that Tableau can connect to: csv, Rdata, database tables, xlsx, shp. Your choice.

You can also look into converting the generated data into Tableau data extract format (was .tde, now .hyper) This will speed analysis and can simplify publishing the data. Tableau has APIs to allow Python (and Java, C etc) to produce extracts, and also has a command line utility for Windows to produce extracts.

If your goal is to automate the ETL process, you should look into using the Tableau Prep tool - which can convert your Rdata file to an extract without requiring you to write a script. The final topic to consider is making your data available for people and workbooks using Tableau Server. When you're ready for that step, you have several options about how to publish and refresh your data on the Tableau Server. Either by having the Server initiate a refresh on a schedule, or by initiating a refresh after your R script executes. There is another API that is useful in that last case, known as the Tableau REST API, and Tableau has also published an open source library to github that makes using the REST API from Python easy. Tableau has announced plans to have Tableau Server execute Table Prep flows in 2019.

The interactive use case with RServe is great in many cases, but doesn't seem to be what you are looking for. One note: that feature is designed for interactive use. If you have a long running computation in R, say to train a complex ML model, you would usually be better off doing that as a separate step and bringing Tableau into the mix when you want to build and share interactive visualizations.

like image 197
Alex Blakemore Avatar answered Oct 05 '22 08:10

Alex Blakemore