I am trying to read a very large JSON file in R using the rjson package, with this command: json_data <- fromJSON(paste(readLines("myfile.json"), collapse=""))
The problem is that I am getting this error message:
Error in paste(readLines("myfile.json"), collapse = "") :
could not allocate memory (2383 Mb) in C function 'R_AllocStringBuffer'
Can anyone help me with this issue?
Just sharing my experience with reading JSON files: I was trying to read JSON files of 52.8 MB, 19.7 MB, 1.3 GB, 93.9 MB and 158.5 MB, and after 30 minutes the R session automatically restarted. After that I tried to apply parallel computing so I could at least watch the progress, but that failed as well.
https://github.com/hadley/plyr/issues/265
Then I added the parameter pagesize = 10000, and it worked and was far more efficient than before. We only need to read the data once and can then save it in RData/Rda/Rds format with saveRDS (see the sketch after the console output below).
> suppressPackageStartupMessages(library('BBmisc'))
> suppressAll(library('jsonlite'))
> suppressAll(library('plyr'))
> suppressAll(library('dplyr'))
> suppressAll(library('stringr'))
> suppressAll(library('doParallel'))
>
> registerDoParallel(cores=16)
>
> ## https://www.kaggle.com/c/yelp-recsys-2013/forums/t/4465/reading-json-files-with-r-how-to
> ## https://class.coursera.org/dsscapstone-005/forum/thread?thread_id=12
> fnames <- c('business','checkin','review','tip','user')
> jfile <- paste0(getwd(),'/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_',fnames,'.json')
> dat <- llply(as.list(jfile), function(x) stream_in(file(x),pagesize = 10000),.parallel=TRUE)
> dat
list()
> jfile
[1] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json"
[2] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_checkin.json"
[3] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_review.json"
[4] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_tip.json"
[5] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_user.json"
> dat <- llply(as.list(jfile), function(x) stream_in(file(x),pagesize = 10000),.progress='=')
opening file input connection.
Imported 61184 records. Simplifying into dataframe...
closing file input connection.
opening file input connection.
Imported 45166 records. Simplifying into dataframe...
closing file input connection.
opening file input connection.
Found 470000 records...
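As mentioned above, once stream_in has finished you can cache the parsed data frame so the raw JSON never has to be parsed again. A minimal sketch along those lines (the review file name is just the Yelp example from above; adjust the paths to your own files):

library(jsonlite)

## stream the JSON in once, in pages of 10000 lines
reviews <- stream_in(file("yelp_academic_dataset_review.json"), pagesize = 10000)

## cache the parsed data frame as a serialized R object ...
saveRDS(reviews, "yelp_academic_dataset_review.rds")

## ... and in later sessions reload it instantly instead of re-reading the JSON
reviews <- readRDS("yelp_academic_dataset_review.rds")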
I got the same problem while working with huge datasets in R. I used the jsonlite package to read the JSON, with the following code:
library(jsonlite)
get_tweets <- stream_in(file("tweets.json"), pagesize = 10000)
Here tweets.json is my file name (including its location), and pagesize sets how many lines are read in each iteration. Hope it helps.
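If even the resulting data frame is too big to hold in memory, stream_in also accepts a handler function that is called on each page, so you can filter or aggregate as you go instead of keeping every record. A rough sketch under that assumption (the per-page counting here is only for illustration):

library(jsonlite)

## process tweets.json page by page; only per-page summaries are kept in memory
page_counts <- list()
stream_in(file("tweets.json"), pagesize = 10000, handler = function(page) {
  ## 'page' is a data frame holding up to 10000 records; keep only what you need
  page_counts[[length(page_counts) + 1]] <<- nrow(page)
})
total_records <- sum(unlist(page_counts))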
For some reason the above solutions all caused R to terminate or worse.
This solution worked for me, with the same data set:
library(jsonlite)
file_name <- 'C:/Users/Downloads/yelp_dataset/yelp_dataset~/dataset/business.JSON'
business <- jsonlite::stream_in(textConnection(readLines(file_name, n = 100000)), verbose = FALSE)
It took about 15 minutes.
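Note that readLines(file_name, n = 100000) only pulls in the first 100,000 lines, which is fine for a quick look but silently drops the rest of the file. If you need everything without reading it all at once, one option (a sketch, not tested on this exact data set) is to keep the connection open and feed stream_in one batch of lines at a time:

library(jsonlite)

con <- file(file_name, open = "r")
chunks <- list()
repeat {
  lines <- readLines(con, n = 100000)
  if (length(lines) == 0) break                    # end of file reached
  chunks[[length(chunks) + 1]] <- stream_in(textConnection(lines), verbose = FALSE)
}
close(con)
business <- rbind_pages(chunks)                    # jsonlite helper to bind the page data frames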