I have a huge CSV file. Its size is around 9 GB, and I have 16 GB of RAM. I followed the advice from the page and implemented it below.
If you get the error that R cannot allocate a vector of length x, close out of R and add the following line to the "Target" field: --max-vsize=500M
Still, I am getting the errors and warnings below. How should I read this 9 GB file into R? I am using 64-bit R 3.3.1 and running the command below in RStudio 0.99.903 on Windows Server 2012 R2 Standard (64-bit).
> memory.limit()
[1] 16383
> answer=read.csv("C:/Users/a-vs/results_20160291.csv")
Error: cannot allocate vector of size 500.0 Mb
In addition: There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
2: In scan(file = file, what = what, sep = sep, quote = quote, ... :
  Reached total allocation of 16383Mb: see help(memory.size)
(warnings 3 through 12 repeat the same message)
My first try, based on a suggested answer:
> thefile=fread("C:/Users/a-vs/results_20160291.csv", header = T)
Read 44099243 rows and 36 (of 36) columns from 9.399 GB file in 00:13:34
Warning messages:
1: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv", :
  Reached total allocation of 16383Mb: see help(memory.size)
2: In fread("C:/Users/a-vsingh/results_tendo_20160201_20160215.csv", :
  Reached total allocation of 16383Mb: see help(memory.size)
My second try, based on another suggested answer, is below:
thefile2 <- read.csv.ffdf(file="C:/Users/a-vs/results_20160291.csv", header=TRUE, VERBOSE=TRUE,
+   first.rows=-1, next.rows=50000, colClasses=NA)
read.table.ffdf 1..
Error: cannot allocate vector of size 125.0 Mb
In addition: There were 14 warnings (use warnings() to see them)
How can I read this file into a single object so that I can analyze the entire dataset in one go?
We bought an expensive machine with 10 cores and 256 GB of RAM. It is not the most efficient solution, but it will at least work for the near future. I looked at the answers below and I don't think they solve my problem :( I appreciate the answers, but I want to perform market basket analysis, and I don't think there is any way around keeping my data in RAM.
Make sure you're using 64-bit R, not just 64-bit Windows, so that you can increase your RAM allocation to all 16 GB.
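A quick way to confirm that 64-bit R actually sees the full 16 GB is the Windows-only memory.limit() helper (values are in MB; the 16000 below is just an illustrative number):

# Windows-only; reports and (optionally) raises the per-session memory cap in MB
memory.limit()                # should report roughly 16383 with 64-bit R on this machine
# memory.limit(size = 16000)  # only needed if the reported cap is lower than expected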
In addition, you can read in the file in chunks:
file_in <- file("in.csv", "r")
chunk_size <- 100000 # choose the best size for you
x <- readLines(file_in, n = chunk_size)
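Reading one chunk like this only pulls in the first 100,000 lines; to cover the whole 9 GB file you would normally loop over chunks and keep only aggregated results in memory. A minimal sketch, where process_chunk() is a hypothetical placeholder for whatever per-chunk work you need:

# Stream the file chunk by chunk so only one chunk is in RAM at a time
file_in <- file("in.csv", "r")
chunk_size <- 100000
header <- readLines(file_in, n = 1)             # keep the header line separately
repeat {
  lines <- readLines(file_in, n = chunk_size)
  if (length(lines) == 0) break                 # end of file reached
  chunk <- read.csv(text = c(header, lines), stringsAsFactors = FALSE)
  # process_chunk(chunk)   # hypothetical: aggregate/filter, then discard the chunk
}
close(file_in)

Note that line-based chunking assumes no quoted fields contain embedded newlines.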
You can use data.table to handle reading and manipulating large files more efficiently:
require(data.table)
fread("in.csv", header = T)
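As the output in the question shows, fread() did read all 44 million rows but still ran into the 16 GB ceiling; if the analysis only needs some of the 36 columns, reading just those can shrink the object considerably. A sketch, with made-up column names:

require(data.table)
# select= keeps only the listed columns; nrows= is handy for a quick dry run
thefile <- fread("in.csv", header = TRUE,
                 select = c("transaction_id", "item_id"),  # hypothetical column names
                 nrows  = 100000)                          # drop nrows for the full read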
If needed, you can leverage storage memory with ff:
library("ff") x <- read.csv.ffdf(file="file.csv", header=TRUE, VERBOSE=TRUE, first.rows=10000, next.rows=50000, colClasses=NA)