R Shiny - cache big dataframe

Tags: caching, r, shiny

I'm quite new to Shiny, so my apologies if this is an easy question. I searched Google and Stack Overflow but couldn't find a simple, helpful answer.

My goal/issue: I'm coding a Shiny page that displays a table with hundreds of thousands of rows. The data is sourced from different databases, manipulated, cleaned, and displayed to all users on request.

Problem 1: loading all the data takes the script almost 5 minutes.

Problem 2: if user1 requests this data at 8:00am and user2 requests the same data at 8:05am, two separate queries are launched and two separate copies are held in memory to show exactly the same data to two different users.

So the question is: should I use a caching system to improve this process? If not, what else should I use? I found a lot of official Shiny documentation on caching plots but nothing on caching data (which I found quite surprising).

Other useful information: the cached data should be deleted every evening around 10pm, since new data will be available the next day / early morning.

Code:

library(shiny)
library(shinydashboard)

ui <- dashboardPage(  # https://rstudio.github.io/shinydashboard/structure.html
    title = "Dashboard",  
    dashboardHeader(title = "Angelo's Board"),
    dashboardSidebar(   # inside here everything that is displayed on the left hand side
      includeCSS("www/styles.css"),    

      sidebarMenu(      

        menuItem('menu 1', tabName = "menu1", icon = icon("th"),
                 menuItem('Data 1', tabName = 'tab_data1'))

      )),


    dashboardBody( 

      tabItems(

        tabItem(tabName = 'tab_data1')),
      h3("Page with big table"),
      fluidRow(dataTableOutput("main_table"))
    ))


  server <- function(input, output, session) {

    # The output ID must match dataTableOutput("main_table") in the UI
    output$main_table <- renderDataTable({ 
      df <- data.frame(names = c("Mark","George","Mary"), age = c(30,40,35))
    })

  }

  cat("\nLaunching   'shinyApp' ....")
  shinyApp(ui, server)

Resources I checked for a potential solution:

  • "How to cache data in shiny server?", but apparently I cannot use Jason Bryer's package
  • https://shiny.rstudio.com/reference/shiny/1.2.0/memoryCache.html, but I have no idea how to apply this code to my example
  • https://shiny.rstudio.com/articles/plot-caching.html, which is mainly focused on plot caching

Any help would be much appreciated. Thanks

Angelo asked Nov 19 '25 12:11


1 Answer

I would break out the bulk of your ETL processes into a separate R script and schedule that script with cron. You can then have this script write the processed dataframe(s) to a .feather file, and have your Shiny app load the feather file(s). Feather is optimized for reading, so loading should be fast.

For example, take the necessary libraries and code out of your server.R (or app.R) file and create a new R script called query.R. That script performs all the ETL operations and finally writes your data to a .feather file (this requires the feather package). Then create a crontab entry to run that script as often as needed.
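A minimal sketch of what such a query.R could look like. The run_etl() function, the output file name main_table.feather, and the cron schedule are hypothetical placeholders; the real script would hold the database queries and cleaning steps taken out of the app. Writing to a temporary file and then renaming it is an extra precaution assumed here (not part of the answer) so the app is less likely to read a half-written file.

```r
# query.R -- hypothetical sketch of the scheduled ETL script
library(feather)

# Placeholder for the real ETL: query the databases, clean, and combine.
# Here it just returns a small data frame so the script runs end to end.
run_etl <- function() {
  data.frame(
    names = c("Mark", "George", "Mary"),
    age = c(30, 40, 35),
    stringsAsFactors = FALSE
  )
}

# Write to a temporary file first, then rename it over the target path.
write_snapshot <- function(df, path) {
  tmp <- paste0(path, ".tmp")
  write_feather(df, tmp)
  file.rename(tmp, path)
  invisible(path)
}

# Assumed output location: adjust to wherever the Shiny app can read it.
out_path <- "main_table.feather"
write_snapshot(run_etl(), out_path)

# Hypothetical crontab entry to run the script every day at 05:00:
# 0 5 * * * Rscript /path/to/query.R
```

Because each run overwrites the file, there is no separate 10pm cache-deletion step in this sketch: the previous day's data is simply replaced when the next run finishes.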

Your server.R script then just needs to read in that feather file when the app loads, and you should see a significant performance improvement. In addition, you can have the query.R script run during off hours so that performance on the Linux box isn't negatively impacted.
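On the app side, a sketch under the same assumptions (a hypothetical main_table.feather written by the scheduled script). Creating the reader outside the server function means the file is read once per R process and shared by every session in that process, rather than once per user. Using reactiveFileReader() with session = NULL is one possible way to pick up a new file without restarting the app; it goes beyond what the answer describes. The UI is simplified to a plain fluidPage to keep the example short.

```r
# app.R -- sketch of the serving side; the file name is an assumption
library(shiny)
library(feather)

data_path <- "main_table.feather"

# Defined at the top level with session = NULL, so this reader is shared by
# all sessions in the R process. It checks the file's modification time every
# 60 seconds and re-reads the file only when it has changed.
shared_data <- reactiveFileReader(
  intervalMillis = 60 * 1000,
  session = NULL,
  filePath = data_path,
  readFunc = read_feather
)

ui <- fluidPage(
  h3("Page with big table"),
  dataTableOutput("main_table")
)

server <- function(input, output, session) {
  output$main_table <- renderDataTable({
    shared_data()
  })
}

# shinyApp() only builds the app object; printing it or passing it to
# runApp() starts the server.
app <- shinyApp(ui, server)
```

If the app is hosted with several R worker processes, each process would hold its own copy of the data, so the sharing applies per process rather than across the whole server.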

Alex Dometrius answered Nov 22 '25 03:11


