R: reading a huge CSV

Tags: windows, r, csv, ram

I have a huge CSV file. Its size is around 9 GB, and I have 16 GB of RAM. I followed the advice from the page and implemented it as below.

If you get the error that R cannot allocate a vector of length x, close out of R and add the following line to the "Target" field: --max-vsize=500M

I am still getting the error and warnings below. How can I read this 9 GB file into R? I have 64-bit R 3.3.1 and am running the command below in RStudio 0.99.903, on Windows Server 2012 R2 Standard (64-bit OS).

> memory.limit()
[1] 16383
> answer=read.csv("C:/Users/a-vs/results_20160291.csv")
Error: cannot allocate vector of size 500.0 Mb
In addition: There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote,  ... :
  Reached total allocation of 16383Mb: see help(memory.size)
(warnings 2 through 12 are identical)

------------------- Update 1

My first try, based on the suggested answer:

> thefile=fread("C:/Users/a-vs/results_20160291.csv", header = T)
Read 44099243 rows and 36 (of 36) columns from 9.399 GB file in 00:13:34
Warning messages:
1: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv",  :
  Reached total allocation of 16383Mb: see help(memory.size)
2: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv",  :
  Reached total allocation of 16383Mb: see help(memory.size)

------------------- Update 2

My second try, based on the suggested answer, is below:

thefile2 <- read.csv.ffdf(file="C:/Users/a-vs/results_20160291.csv", header=TRUE, VERBOSE=TRUE,
+                    first.rows=-1, next.rows=50000, colClasses=NA)
read.table.ffdf 1..
Error: cannot allocate vector of size 125.0 Mb
In addition: There were 14 warnings (use warnings() to see them)

How can I read this file into a single object so that I can analyze the entire dataset in one go?

------------------- Update 3

We bought an expensive machine with 10 cores and 256 GB of RAM. It is not the most efficient solution, but it will work, at least for the near future. I looked at the answers below, and I don't think they solve my problem :( I appreciate these answers, but I want to perform market basket analysis, and I don't think there is any way around keeping my data in RAM.

asked Jul 22 '16 by user2543622



1 Answer

Make sure you're using 64-bit R, not just 64-bit Windows, so that you can increase your RAM allocation to all 16 GB.
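You can confirm both from within R; a quick sketch (note that memory.limit() is Windows-only and reports the cap in MB):

R.version$arch   # should report "x86_64" for 64-bit R
memory.limit()   # Windows-only: current memory allocation cap in MB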

In addition, you can read in the file in chunks:

file_in    <- file("in.csv","r")
chunk_size <- 100000 # choose the best size for you
x          <- readLines(file_in, n=chunk_size)
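Note that the snippet above reads only the first chunk. A minimal sketch of a full chunked pass might look like the following, assuming you only keep a per-chunk summary rather than all the raw rows (the aggregation step is a placeholder):

file_in    <- file("in.csv", "r")
chunk_size <- 100000
header     <- readLines(file_in, n = 1)        # keep the header row separately
repeat {
  lines <- readLines(file_in, n = chunk_size)
  if (length(lines) == 0) break                # end of file reached
  chunk <- read.csv(text = c(header, lines))   # parse only this chunk
  # ... aggregate or filter `chunk` here, keeping only the result ...
}
close(file_in)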

You can use data.table to handle reading and manipulating large files more efficiently:

require(data.table)
x <- fread("in.csv", header = TRUE)
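If the analysis only needs some of the columns, fread can also skip the rest at parse time, which substantially lowers peak memory. A small sketch (the column names here are illustrative, not from your file):

require(data.table)
x <- fread("in.csv", select = c("order_id", "item"))  # parse only the listed columns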

If needed, you can keep the data on disk rather than in RAM with the ff package:

library("ff") x <- read.csv.ffdf(file="file.csv", header=TRUE, VERBOSE=TRUE,                     first.rows=10000, next.rows=50000, colClasses=NA) 
answered Sep 19 '22 by Hack-R