
R Shiny in Memory Application or noSQL

I am running a small web app built with R's Shiny framework. The tool doesn't do much: it just filters data frames using parameters given in the UI. My problem is the following. When a user accesses the app via HTTP, it takes a long time to start, because the data I load in global.R is pretty big (~5 GB). After the initial start the app runs smoothly, also when it is re-accessed within a certain time (the app seems to stay completely in memory for some minutes).

Since I have enough memory available and my data is not changed by user interaction, I am wondering whether I could keep the complete app in memory. Is it possible to force this? My server is running CentOS 6. The problem isn't the file system, hard disk, etc.: I created a RAM disk to load the data from, but the performance gain is marginal. So the bottleneck seems to be R itself when it processes the data.

I have two ideas that might overcome the problem:

  • As mentioned above, is it possible to keep the complete app in memory?
  • Don't store the data as R objects; instead use a fast in-memory NoSQL database, e.g. Redis.

Maybe one of you has some experience with loading bigger data. I would be thankful if we could get a discussion going. If possible, I would like to avoid external software like Redis, to keep everything as simple as possible.

With all the best,

Mario

mariodeng asked Aug 27 '14 11:08



2 Answers

I have no experience with noSQL databases, but here is how I am combining shiny with an Oracle database to speed up my apps:

User inputs are passed into an SQL query, which is sent to the extremely fast database, and only the result of this query is read into R. In many cases (especially if the SQL involves a GROUP BY clause) this reduces the number of observations to be read from several million to a few hundred. Hence, data loading becomes very fast.

In the example below, users first select questionnaires and a date range. This generates an SQL statement that filters the relevant observations and counts the frequencies of answers per question and questionnaire. These frequencies are read into R and displayed as a datatable in the Shiny app.

library(shiny)
library(ROracle)
library(DT)

# Open one database connection when the app starts.
drv <- dbDriver("Oracle")
con <- dbConnect(drv, username = "...", password = "...", dbname = "...")

# Fill the questionnaire selector from the database.
query <- 'select distinct questionnaire from ... order by questionnaire'
questionnaire.list <- dbGetQuery(con, query)$questionnaire


ui <- fluidPage(
      selectInput('questionnaire_inp', 'Questionnaire',
                  choices = questionnaire.list, selected = questionnaire.list,
                  multiple = TRUE),
      dateRangeInput("daterange_inp", "Date range",
                     start = '2016-01-01', end = Sys.Date()),
      DT::dataTableOutput('tbl')
)

server <- function(input, output) {

  output$tbl <- DT::renderDataTable({
    # Build the SQL from the current inputs; the database does the filtering
    # and counting, so only the aggregated rows are read into R.
    # Note: shQuote() only wraps values in quotes, it does not escape SQL.
    query <- paste0(
        "select questionnaire, question, answer, count(*) from ... 
        where title in (", paste0(shQuote(input$questionnaire_inp), collapse=","), ")
        and date between to_date('", input$daterange_inp[1] ,"','YYYY-MM-DD') 
        and to_date ('", input$daterange_inp[2] ,"','YYYY-MM-DD') 
        group by questionnaire, question, answer")
    dt <- dbGetQuery(con, query)
    datatable(dt)

  })

}

shinyApp(ui = ui, server = server)
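
The query in the example above is assembled by pasting user input straight into the SQL text. As a minimal sketch of how that step could be made a little more defensive, the helper below escapes quotes in the selected values and normalises the dates before building the statement. The names `quote_sql`, `build_query`, `survey_answers` and `answer_date` are hypothetical and only stand in for the table and columns elided in the original answer.

```r
# Hypothetical helper: wrap values in single quotes and double any embedded
# single quote, which is the standard SQL way to escape string literals.
quote_sql <- function(x) {
  paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
}

# Hypothetical helper: build the aggregation query from the two inputs.
# `survey_answers` and `answer_date` are placeholder names, not taken
# from the original answer.
build_query <- function(questionnaires, date_range) {
  stopifnot(length(questionnaires) > 0, length(date_range) == 2)
  # Parse and re-format the dates so that only clean YYYY-MM-DD text
  # ends up in the SQL statement; invalid input raises an error.
  dates <- format(as.Date(date_range), "%Y-%m-%d")
  stopifnot(!anyNA(dates))
  paste0(
    "select questionnaire, question, answer, count(*) as n ",
    "from survey_answers ",
    "where questionnaire in (",
    paste(quote_sql(questionnaires), collapse = ","), ") ",
    "and answer_date between to_date('", dates[1], "','YYYY-MM-DD') ",
    "and to_date('", dates[2], "','YYYY-MM-DD') ",
    "group by questionnaire, question, answer"
  )
}

example_query <- build_query(c("A", "O'Brien"), c("2016-01-01", "2016-02-01"))
```

Inside the render function this would be called as `dbGetQuery(con, build_query(input$questionnaire_inp, input$daterange_inp))`. Where the database driver supports bind variables, those are usually the safer choice than building the SQL text by hand.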
Till answered Sep 25 '22 22:09


You can set the timeout to a longer value, so that the R process is kept alive for longer after the last user disconnects. I'm not sure whether an infinite (or sufficiently long) value is possible.
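
The answer does not say which timeout is meant. Assuming the app is served by Shiny Server (an assumption; the question only says it is accessed via HTTP), the relevant settings would be the `app_idle_timeout` and `app_init_timeout` directives in the server configuration file. The fragment below is only a sketch: the port, the directories and the timeout values are placeholders, not recommendations.

```
# Sketch of a shiny-server.conf fragment (placeholder paths and values)
server {
  listen 3838;

  location / {
    site_dir /srv/shiny-server;
    log_dir /var/log/shiny-server;

    # Seconds to wait for the app to finish starting (loading the data)
    app_init_timeout 300;

    # Seconds an R process without connections is kept alive
    app_idle_timeout 86400;
  }
}
```

With a long idle timeout, only the first visitor after a server restart would pay the loading cost, at the price of the R process holding the data in memory the whole time.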

Other ways not involving a database may be:

  1. Use fread from data.table if you read from CSV. This can be several times faster than read.csv. Specifying the column classes can further improve speed.

  2. Or use the binary .RDS format, which should be fast and smaller in size, and thus quicker to read.

If you are already using .RDS or .Rdata, there is not much more to gain in this respect.
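
A minimal sketch of the two suggestions combined: parse the CSV once (with `fread` if data.table is installed, otherwise `read.csv`), cache the result as an .RDS file, and let global.R load only the cached file. The file paths, column names and example data are made up for illustration.

```r
# Placeholder paths; a real app would use fixed locations next to global.R.
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# Small stand-in for the real data set (illustrative values only).
example <- data.frame(id = 1:5,
                      group = c("a", "b", "a", "c", "b"),
                      value = c(0.1, 0.2, 0.3, 0.4, 0.5),
                      stringsAsFactors = FALSE)
write.csv(example, csv_path, row.names = FALSE)

# Declaring the column classes up front spares the reader from guessing them.
col_classes <- c(id = "integer", group = "character", value = "numeric")

# One-time conversion step, run outside the app.
if (requireNamespace("data.table", quietly = TRUE)) {
  parsed <- data.table::fread(csv_path, colClasses = col_classes,
                              data.table = FALSE)
} else {
  parsed <- read.csv(csv_path, colClasses = col_classes,
                     stringsAsFactors = FALSE)
}
saveRDS(parsed, rds_path)

# In global.R: read the cached binary file instead of re-parsing the CSV.
app_data <- readRDS(rds_path)
```

`saveRDS` compresses by default; passing `compress = FALSE` trades a larger file for less decompression work on load, which may or may not help depending on the data and the disk.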

dracodoc answered Sep 24 '22 22:09