 

Using R's Plumber - create GET endpoint to host CSV formatted data rather than JSON

Tags: json, r, csv, plumber

I think this is a good quick demo of R's plumber library in general, but mainly I'm struggling to serve data in CSV format.

I am working with R's plumber package to host an API endpoint for some sports data of mine. Currently I have some data that grabs win totals for MLB baseball teams that I'm trying to serve. Using plumber, I have the following 2 scripts set up:

setupAPI.R: sets up my API with two GET endpoints:

library(plumber)
library(jsonlite)

# load in some test sports data to host
mydata = structure(list(Team = structure(c(8L, 20L, 7L, 28L, 2L, 30L, 
23L, 1L, 6L, 19L), .Label = c("Angels", "Astros", "Athletics", 
"Blue Jays", "Braves", "Brewers", "Cardinals", "Cubs", "Diamondbacks", 
"Dodgers", "Giants", "Indians", "Mariners", "Marlins", "Mets", 
"Nationals", "Orioles", "Padres", "Phillies", "Pirates", "Rangers", 
"Rays", "Red Sox", "Reds", "Rockies", "Royals", "Tigers", "Twins", 
"White Sox", "Yankees"), class = "factor"), GamesPlayed = c(162L, 
162L, 162L, 162L, 162L, 162L, 162L, 162L, 162L, 162L), CurrentWins = c(92L, 
75L, 83L, 85L, 101L, 91L, 93L, 80L, 86L, 66L)), .Names = c("Team", 
"GamesPlayed", "CurrentWins"), row.names = c(NA, 10L), class = "data.frame")

# create a GET request for shareprices (in JSON format)
#* @get /shareprices_json
getSPs <- function(){ 
  return(toJSON(mydata))
}

# create a GET request for MLB shareprices (in CSV format)
#* @get /shareprices_csv
csvSPs <- function(){
  return(mydata)
}

# run both functions (i think needed for the endpoints to work)   
getSPs()
csvSPs()

RunAPI.R: plumbs setupAPI.R and hosts the endpoints locally:

library(plumber)
r <- plumb("setupAPI.R") 
r$run(port=8000)

. . .

After running the RunAPI.R code in my console, when I visit the endpoints, my http://127.0.0.1:8000/shareprices_csv endpoint clearly returns a JSON object, and my http://127.0.0.1:8000/shareprices_json endpoint oddly returns a JSON array of length 1 whose sole element is the JSON as a string.

In short, I can see now that I should simply return the data frame, not toJSON(the data frame), to have the endpoint serve JSON-formatted data. However, I still don't know how to serve this data in CSV format. Is this possible in plumber? What should the return statement look like in the functions in setupAPI.R? Any help is appreciated!!

Canovice asked Jan 29 '23 13:01


2 Answers

There are two tricks you need here:

  1. You can bypass serialization on an endpoint by returning the response object directly. More docs here
  2. You can specify the body of the response by mutating res$body.

You can combine these two ideas to create an endpoint like:

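#' @get /data.csv
function(res) {
  # local = TRUE creates `val` inside this function, not in the global env
  con <- textConnection("val", "w", local = TRUE)
  write.csv(iris, con)
  close(con)

  # Returning res directly bypasses serialization, so set the type yourself
  res$setHeader("Content-Type", "text/csv")
  res$body <- paste(val, collapse = "\n")
  res
}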

Note that plumber does some nice things for you for free like setting the appropriate HTTP headers for your JSON responses. If you're sending a response yourself, you're on your own for all that, so you'll need to make sure that you set the appropriate headers to teach your API clients how they should interpret this response.
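If several endpoints need this, the string-building step can be pulled into a small helper. The sketch below is base R only; `df_to_csv_body` is a hypothetical name, not part of plumber, and it defaults to `row.names = FALSE` (unlike the endpoint above, which keeps row names). Passing `NULL` to `textConnection()` gives an anonymous connection whose contents are read back with `textConnectionValue()`.

```r
# Hypothetical helper (not part of plumber): serialize a data frame to a
# single CSV string suitable for res$body. Uses base R only.
df_to_csv_body <- function(df, row.names = FALSE) {
  # object = NULL gives an anonymous output connection, so no named
  # variable is created anywhere.
  con <- textConnection(NULL, open = "w")
  on.exit(close(con))
  write.csv(df, con, row.names = row.names)
  # textConnectionValue() returns the lines written to the connection.
  paste(textConnectionValue(con), collapse = "\n")
}

# Small stand-in for the question's mydata.
teams <- data.frame(Team = c("Cubs", "Pirates"),
                    CurrentWins = c(92L, 75L))
body <- df_to_csv_body(teams)
writeLines(body)
```

Inside an endpoint, `res$body <- df_to_csv_body(mydata)` would replace the connection lines, while still setting a `text/csv` Content-Type header as the note above advises.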

Jeff Allen answered Jan 31 '23 07:01


Just posting this answer in case it helps anyone!

Jeff's answer works perfectly, but it becomes very slow when you have to return a big CSV file. Requests got stuck for me with a 22 MB file.

A faster solution, if you have already written the CSV to disk, is to use the include_file function (docs here):

As an example:

#* @get /iris_csv
getIrisCsv <- function(req, res) {
    filename <- file.path(tempdir(), "iris.csv")
    write.csv(iris, filename, row.names = FALSE)
    include_file(filename, res, "text/csv")
}

So, it depends on your use case:

  • If you're returning a small CSV and you don't want to write it to disk: use Jeff's solution
  • If your CSV is medium-sized or big (> 2 MB) or you already have it on disk: use the include_file solution
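When the data is static, as with the question's mydata, the file could also be written once when plumb() evaluates the API file, instead of on every request as in the example above. A minimal sketch under that assumption; `csv_path` and `serveCsv` are hypothetical names, and the data frame is a small stand-in for mydata:

```r
# Small stand-in for the question's mydata.
mydata <- data.frame(Team = c("Cubs", "Pirates"),
                     GamesPlayed = c(162L, 162L),
                     CurrentWins = c(92L, 75L))

# Top-level code runs once, when plumb() evaluates this file, so the CSV
# is written a single time rather than on every request.
csv_path <- file.path(tempdir(), "mydata.csv")
write.csv(mydata, csv_path, row.names = FALSE)

#* @get /shareprices_csv
serveCsv <- function(res) {
  # Send the pre-written file with an explicit text/csv content type.
  plumber::include_file(csv_path, res, "text/csv")
}
```

For the same reason, the getSPs() and csvSPs() calls at the end of setupAPI.R are not needed: plumber calls the annotated functions itself on each request.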

Hope it helps!

koldLight answered Jan 31 '23 08:01