
Reading multiple JSON files in a directory into one Data Frame

Tags:

r

library(rjson)
filenames <- list.files(pattern="*.json") # gives a character vector, with each file name represented by an entry

Now I want to import all the JSON files into R as a single data frame. How do I do that?

I first tried

myJSON <- lapply(filenames, function(x) fromJSON(file=x)) # should return a list in which each element is one of the JSON files

but the above code takes a long time to finish, since I have 15,000 files, and I know it won't return a single data frame. Is there a faster way to do this?

Sample JSON file:

 {"Reviews": [{"Ratings": {"Service": "4", "Cleanliness": "5"}, "AuthorLocation": "Boston", "Title": "\u201cExcellent Hotel & Location\u201d", "Author": "gowharr32", "ReviewID": "UR126946257", "Content": "We enjoyed the Best Western Pioneer Square....", "Date": "March 29, 2012"}, {"Ratings": {"Overall": "5"},"AuthorLocation": "Chicago",....},{...},....}]}
Asked Feb 16 '16 by Rakesh Adhikesavan


2 Answers

For anyone coming here looking for a purrr / tidyverse solution:

library(purrr)
library(tidyverse)
library(jsonlite)

path <- "./your_path"
files <- dir(path, pattern = "*.json")   # file names only; file.path() below rebuilds the full paths

data <- files %>%
  map_df(~ fromJSON(file.path(path, .), flatten = TRUE))  # parse each file and row-bind the results
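
Note that map_df() has been superseded in more recent purrr releases. A minimal sketch of the same idea using map() followed by dplyr::bind_rows(), assuming each file flattens to something data-frame-like:

# Same path and files objects as above; bind_rows() stacks the per-file results
data <- files %>%
  map(~ fromJSON(file.path(path, .), flatten = TRUE)) %>%
  bind_rows()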
Answered Sep 20 '22 by Monduiz


Go parallel via:

library(parallel)

cl <- makeCluster(detectCores() - 1)   # leave one core free
json_files <- list.files(path = "your/json/path", pattern = "*.json", full.names = TRUE)
json_list <- parLapply(cl, json_files, function(x) rjson::fromJSON(file = x, method = "R"))
stopCluster(cl)
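
parLapply() returns a list of parsed objects, not a single data frame. A minimal sketch of one way to collapse it, assuming each file parses to a flat named list of scalar fields (nested structures like the "Reviews" array in the sample would need unnesting first):

library(data.table)

# Bind the per-file records into one data frame;
# fill = TRUE pads fields missing from some files with NA.
json_df <- rbindlist(json_list, fill = TRUE)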
Answered Sep 21 '22 by amonk