 

Reading in large text files in R

Tags:

r

I want to read in a large .ido file that has just under 110,000,000 rows and 8 columns. The columns are made up of 2 integer columns and 6 logical columns, and the file uses "|" as the delimiter. I tried using read.big.matrix and it took forever. I also tried dumpDf and it ran out of RAM. I tried ff, which I heard was a good package, but I am struggling with errors. I would like to do some analysis on this table if I can read it in somehow. If anyone has any suggestions, that would be great. Kind Regards, Lorcan

asked Aug 02 '12 by Lorcan Treanor


2 Answers

Thank you for all your suggestions. I managed to figure out why I kept getting errors when reading the file. I'll share the answer and what went wrong so no one makes the same mistake.

First of all, the data that was given to me contained some errors, so I was doomed to fail from the start. I was unaware of this until a colleague came across the problem in another piece of software. A column that was supposed to contain only integers had some letters in it, so when read.table.ffdf from the ff package tried to read in the data set it produced errors. In any case, I was then given another sample of data, 16,000,000 rows and 8 columns with correct entries, and it worked perfectly. The code that I ran is as follows and took about 30 seconds to read the file:

setwd("D:/data test")
library(ff)

# read the pipe-delimited file into an ffdf (file-backed data frame)
ffdf1 <- read.table.ffdf(file = "test.ido", header = TRUE, sep = "|")
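
A quick way to check for this kind of problem before importing is to verify that every line has the expected number of fields and that the integer columns really contain only digits. The following is a rough sketch of such a check, assuming the same pipe-delimited layout with a header row as above (it is an illustration, not the exact check that was used):

# Every line of the pipe-delimited file should have exactly 8 fields;
# any other count points to a malformed row.
table(count.fields("test.ido", sep = "|"))

# Read a small sample as character and look for non-numeric entries in a
# column that should contain only integers (here the first column).
sample_rows <- read.table("test.ido", sep = "|", header = TRUE,
                          nrows = 10000, colClasses = "character")
sum(!grepl("^-?[0-9]+$", sample_rows[[1]]))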

Thank you all for your time, and if you have any questions about the answer, feel free to ask and I will do my best to help.

answered Oct 16 '22 by Lorcan Treanor


Do you really need all the data for your analysis? Maybe you could aggregate your dataset (say, from minute values to daily averages). This aggregation only needs to be done once, and can hopefully be done in chunks. That way you do not need to load all your data into memory at once.

Reading in chunks can be done using scan; the important arguments are skip and n. Alternatively, put your data into a database and extract the chunks from there. You could even use the functions from the plyr package to process the chunks in parallel; see this blog post of mine for an example.
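
As a rough sketch of the chunked approach (the file name, the 2-integer plus 6-logical column layout, and the assumption that the logical columns are stored as TRUE/FALSE are all taken or inferred from the question), each chunk can be read with scan and aggregated before the next one is read. Here nlines is used to cap the number of lines per chunk, alongside skip:

path       <- "test.ido"
chunk_rows <- 1e6                              # lines to read per chunk
cols       <- list(integer(), integer(), logical(), logical(),
                   logical(), logical(), logical(), logical())

total <- 0; n_total <- 0
skip  <- 1                                     # skip the header line
repeat {
  chunk <- scan(path, what = cols, sep = "|",
                skip = skip, nlines = chunk_rows, quiet = TRUE)
  n_read <- length(chunk[[1]])
  if (n_read == 0) break                       # no lines left
  # aggregate the chunk, e.g. keep a running sum of the first column
  # (coerced to numeric to avoid integer overflow)
  total   <- total + sum(as.numeric(chunk[[1]]))
  n_total <- n_total + n_read
  skip    <- skip + n_read
}
total / n_total                                # overall mean of column 1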

answered Oct 16 '22 by Paul Hiemstra