Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the best practices to make shiny application run faster?

Tags:

Data:

I have a shiny dashboard application and my dataset is around 600 MB in size. It swells by 100 MB every month. My data resides locally in MySQL.

MenuItems:

I have 6 - 7 sidebar menuItems on my dashboard and each of them has 10 - 12 different outputs - charts and tables. Each of these tabs has 3 - 6 inputs such as selectizeInput, slider, date range, etc. to filter the data.

Data subsets:

Since I cannot load all the data into the memory, for every menu item I create a subset of data based on the date range by keeping the date range to just 2 - 3 days from the system date.

For example:

df1 <- reactive({df[df$date >- dateinput[1] & df$date <- dateinput[2], ]})

The above gets the data for my first menu item and depending on the selectInput or other inputs, I am further filtering the data. For example, If I have a selectInput for Gender (male and female) then I further subset df1 to:

df2 <- reactive({        if(is.null(input$Gender)){            df1         } else if(input$Gender == "Male")            {df1[df1$Gender == "Male",]}         )} 

If I have more than 1 input, I subset this df1 further and pass on the values to df2. df2 becomes the reactive dataset for all the charts and tables in that MenuItem.

The more the number of menuItem I create more subsets to suit the filters and analysis.

I face two problems:

  1. On older machines, the app is not loading. and
  2. On newer machines, the app loads very slowly sometimes 5 - 6 minutes

After the first set of data load, the charts and tables gets rendered faster on reactive changes.

To counter this, I have tried moving all common and repetitive parameters and libraries to global.R.

I have two questions:

1.are there any basic hygiene factors that one needs to keep in mind when mining data in R especially through shiny (Mining in R is extremely fast).

2.I have read about parallel processing, but almost always all the examples talk about distributing a single heavier calculation. Can we distribute through parallel processing, subsetting the data or distributing charts / tables preparation.

Please note, I am a researcher and not a programmer, but have learnt to use shiny and host applications on the cloud or locally recently.

Guidance on this will be very helpful for many novice users of R like me.

like image 842
Apricot Avatar asked Jun 29 '16 06:06

Apricot


People also ask

Is dash faster than Shiny?

Winner: R Shiny. However, for more advanced applications, Dash requires a lot more boilerplate code than Shiny.


Video Answer


1 Answers

This is a very interesting question and deserves more proper responses rather than comments. I would like to relate my experience and thoughts. I built a commercial R+shiny application with Shiny Server Pro, using database(s) and loads of other tricks.

Delayed UI loading time
My app takes over 30s to load, i.e. to give control back to the user.

The issue

Shiny is a single page application. Therefore a complex app, with loads of tabs, data loaded to populate some of the menus & selectors etc. is affected and this starts from the initial loading time.

UI possible mitigations

  • Use dynamic UI components (wisely) to add complexity after start up. For example a particular menu may start very simply with few elements, then add more elements at a later stage.
  • Joe Cheng proposed insertUI and removeUI when my app was almost finished, so I didn't get around to use them, but they also could contribute to a simpler page for start up.

Use of database

My app used MonetDB and later PostgreSQL. The performance of MonetDB was good, but I had a multiple user conflict (complex issue that I cannot detail here) and this forced me to move to PostgreSQL as an alternative. PostgreSQL was fine, but it took a dramatic time to start due to the cache warming up issue. The design required to load at start up loads of data into the DB: bad design.

RDBMS delays possible mitigations

I think I tried most tricks with varying success.

  • Limit RDBMS usage. As I decided from the start to use data.table to speed up data manipulations without been constraints by copying, I was also using fread for any type of csv reading. At the time fwrite (still from data.table) wasn't even on the horizon, otherwise it would merit serious considerations.
  • App re-design. the app architecture has a lot to do with the degree of intensity that RDBMS are used. I'm convinced that time can be saved by a design that could take into account R+shiny (mainly R) limitations.
  • Now MonetDB has R functions embedded into the code, so it should be even faster than before. It certainly deserves a good look. On the other hand the multi-user features should be thoroughly tested: most of R database code does not take into account to be used in a multi-user environment as offered by shiny. Maybe RStudio should be doing something more about this. Honestly they have already started, with the experimental introduction of connection pools and that is great.

Excessive use of reactivity

I think it is great to play with an advanced framework like shiny, and reactivity is a lot of fun to learn. On the other hand over a wide and complex application things can easily get out of hand.

Excessive reactivity possible mitigations

  • Debugging each function gives a precise idea of how many time a particular shiny function is called, and any reactive function is called usually more than once. Of course all this burns cpu time, and needs at least to keep under control.
  • Constructs like observeEvent now have parameters like ignoreInit: a wise use to these parameters can save at least a void cycle at initialisation time.

In my experience we have only scratched the surface of what it is possible to do with shiny. On the other hand there is a limit due to the single process nature of R. With Shiny Server Pro it is possible to envisage to use load balancers and spread multiple users across different servers. On the other hand to get into these territories we would need some kind of messaging system across the various instances. Already know I see the need for that in complex Shiny Server Pro applications (e.g. when there is the need to manage different classes of users, but at the same time to communicate between them). But this is out of scope to this SO question.

like image 71
Enzo Avatar answered Sep 20 '22 07:09

Enzo