Data:
I have a shiny dashboard application and my dataset is around 600 MB in size. It swells by 100 MB every month. My data resides locally in MySQL.
MenuItems:
I have 6 - 7 sidebar menuItems on my dashboard and each of them has 10 - 12 different outputs - charts and tables. Each of these tabs has 3 - 6 inputs such as selectizeInput, slider, date range, etc. to filter the data.
Data subsets:
Since I cannot load all the data into memory, for every menu item I create a subset of the data based on a date range, keeping it to just 2 - 3 days from the system date.
For example:
df1 <- reactive({ df[df$date >= input$dateinput[1] & df$date <= input$dateinput[2], ] })
The above gets the data for my first menu item, and depending on the selectInput or other inputs, I filter the data further. For example, if I have a selectInput for Gender (male and female), then I further subset df1 to:
df2 <- reactive({ if (is.null(input$Gender)) { df1() } else if (input$Gender == "Male") { df1()[df1()$Gender == "Male", ] } })
If I have more than one input, I subset df1 further and pass the values on to df2. df2 becomes the reactive dataset for all the charts and tables in that menuItem.
The more menuItems I create, the more subsets I need to suit the filters and analysis.
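Since the data resides in MySQL, one common pattern is to push the date filter into the database itself with dplyr/dbplyr, so that only the 2 - 3 days of rows ever reach R's memory. This is a hedged sketch, not the asker's actual code; the table name `sales`, the column names, and the connection details are all assumptions:

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Connection details are placeholders -- adjust to your MySQL setup.
con <- dbConnect(RMySQL::MySQL(), dbname = "mydb", host = "localhost",
                 user = "user", password = "password")

# A lazy reference to the table: no data is pulled yet,
# dbplyr only builds SQL behind the scenes.
sales_db <- tbl(con, "sales")

server <- function(input, output, session) {
  df1 <- reactive({
    sales_db %>%
      filter(date >= !!input$dateinput[1],
             date <= !!input$dateinput[2]) %>%
      collect()   # only the filtered rows are brought into memory
  })
}
```

This way MySQL does the heavy subsetting, and R holds only the slice the dashboard actually displays.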
I face two problems: the initial data load is slow, although after the first load the charts and tables render faster on reactive changes.
To counter this, I have tried moving all common and repetitive parameters and libraries to global.R.
I have two questions:
1. Are there any basic hygiene factors that one needs to keep in mind when mining data in R, especially through shiny? (Mining in R itself is extremely fast.)
2. I have read about parallel processing, but almost all the examples talk about distributing a single heavy calculation. Can parallel processing be used to distribute the data subsetting, or the preparation of charts and tables?
Please note, I am a researcher and not a programmer, but I have recently learnt to use shiny and to host applications on the cloud or locally.
Guidance on this will be very helpful for many novice R users like me.
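On question 2: individual outputs (rather than one heavy calculation) can indeed be prepared off the main process, for example with the future and promises packages. A hedged sketch, assuming a df2-like reactive and a hypothetical expensive function `heavy_summary`:

```r
library(shiny)
library(future)
library(promises)

plan(multisession)  # spawn background R processes for parallel work

server <- function(input, output, session) {
  # 'heavy_summary' is a made-up expensive aggregation
  output$table1 <- renderTable({
    d <- df2()                        # read the reactive on the main process
    future_promise(heavy_summary(d))  # compute the table in a worker process
  })
}
```

Shiny's render functions accept promises, so several such outputs can be computed concurrently instead of blocking one another.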
This is a very interesting question and deserves proper responses rather than comments. I would like to relate my experience and thoughts. I built a commercial R+shiny application with Shiny Server Pro, using database(s) and loads of other tricks.
Delayed UI loading time
My app takes over 30s to load, i.e. to give control back to the user.
The issue
Shiny is a single-page application. Therefore a complex app, with loads of tabs and data loaded to populate some of the menus, selectors, etc., is affected, and this starts from the initial loading time.
UI possible mitigations
insertUI and removeUI were introduced when my app was almost finished, so I didn't get around to using them, but they too could contribute to a simpler page at start-up.
Use of database
My app used MonetDB and later PostgreSQL. The performance of MonetDB was good, but I had a multiple-user conflict (a complex issue that I cannot detail here) and this forced me to move to PostgreSQL as an alternative. PostgreSQL was fine, but it took a dramatically long time to start due to cache warm-up. The design required loading lots of data into the DB at start-up: bad design.
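One way to tame multi-user database access from shiny (an assumption on my side, not something this app necessarily used) is the pool package, which shares a set of connections across sessions. Credentials and the `sales` table are placeholders:

```r
library(shiny)
library(pool)
library(DBI)

# Connection details are placeholders
pool <- dbPool(RPostgres::Postgres(), dbname = "mydb",
               host = "localhost", user = "user", password = "password")

# Each query checks a connection out of the pool and returns it afterwards,
# so concurrent shiny sessions don't trample each other.
get_recent <- function(days = 3) {
  dbGetQuery(pool, sprintf(
    "SELECT * FROM sales WHERE date >= CURRENT_DATE - %d", days))
}

onStop(function() poolClose(pool))  # release all connections when the app stops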
RDBMS delays possible mitigations
I think I tried most tricks with varying success.
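One of those tricks, reading and filtering with data.table, might look like this (file and column names are made up for illustration):

```r
library(data.table)

# fread is much faster than read.csv and returns a data.table
dt <- fread("big_file.csv")

# data.table subsets and aggregates efficiently, avoiding unnecessary copies
recent <- dt[date >= Sys.Date() - 3]                  # keep the last 3 days
by_gender <- recent[, .(avg = mean(value)), by = Gender]

# fwrite writes CSVs far faster than write.csv
fwrite(by_gender, "summary.csv")
```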
- Use data.table to speed up data manipulations without being constrained by copying; I was also using fread for any kind of CSV reading. At the time fwrite (still from data.table) wasn't even on the horizon, otherwise it would have merited serious consideration.
- Work around R+shiny (mainly R) limitations. MonetDB has R functions embedded into the code, so it should be even faster than before; it certainly deserves a good look. On the other hand, the multi-user features should be thoroughly tested: most R database code does not take into account being used in a multi-user environment like the one shiny offers. Maybe RStudio should be doing something more about this. Honestly, they have already started, with the experimental introduction of connection pools, and that is great.

Excessive use of reactivity
I think it is great to play with an advanced framework like shiny, and reactivity is a lot of fun to learn. On the other hand, in a wide and complex application things can easily get out of hand.
Excessive reactivity possible mitigations
At start-up, when the shiny server function is called, any reactive function is usually called more than once. Of course all this burns CPU time, and it needs at least to be kept under control. observeEvent and eventReactive now have parameters like ignoreInit: wise use of these parameters can save at least a void cycle at initialisation time.
In my experience we have only scratched the surface of what is possible to do with shiny. On the other hand, there is a limit due to the single-process nature of R. With Shiny Server Pro it is possible to envisage using load balancers to spread multiple users across different servers. On the other hand, to get into these territories we would need some kind of messaging system across the various instances. Already now I see the need for that in complex Shiny Server Pro applications (e.g. when there is a need to manage different classes of users, but at the same time to communicate between them). But this is out of scope for this SO question.
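The ignoreInit parameter mentioned above can be used like this (a minimal sketch; the Gender input mirrors the question's example):

```r
library(shiny)

server <- function(input, output, session) {
  # Without ignoreInit, this observer would also fire once at start-up,
  # wasting a reactive cycle before the user has touched anything.
  observeEvent(input$Gender, {
    message("Gender filter changed to: ", input$Gender)
  }, ignoreInit = TRUE)
}
```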