How to load big csv file with mixed-type columns using the bigmemory package

Question

Is there a way to combine the use of scan() and read.big.matrix() from the bigmemory package to read in a 200 MB .csv file with mixed-type columns so that the result is a dataframe with integer, character, and numeric columns?

mdsumner · Accepted Answer

Try the ff package for this.

library(ff)
help(read.table.ffdf)

Function ‘read.table.ffdf’ reads separated flat files into ‘ffdf’ objects, very much like (and using) ‘read.table’. It can also work with any convenience wrappers like ‘read.csv’ and provides its own convenience wrapper (e.g. ‘read.csv.ffdf’) for R's usual wrappers.

For a 200 MB file it should be as simple as this:

 x <- read.csv.ffdf(file = csvfile)  # csvfile: path to the .csv file

(Much bigger files will likely require investigating some of the configuration options, depending on your machine and OS.)
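As a rough sketch of what those configuration options can look like, the call below sets the column types and chunk sizes explicitly. The demo file, its column names, and the chunk sizes are made up for illustration. Note that ff has no character vectors, so text columns are read as factors.

```r
library(ff)

# Hypothetical demo file standing in for the real 200 MB csv
csvfile <- tempfile(fileext = ".csv")
demo <- data.frame(id    = 1:1000,
                   label = sample(c("a", "b", "c"), 1000, replace = TRUE),
                   value = rnorm(1000))
write.csv(demo, csvfile, row.names = FALSE)

# ff cannot store character vectors, so the text column is read as a factor
x <- read.csv.ffdf(file       = csvfile,
                   colClasses = c("integer", "factor", "numeric"),
                   first.rows = 200,   # rows read in the first chunk
                   next.rows  = 400)   # rows read in each later chunk

dim(x)                  # dimensions of the on-disk ffdf
first_rows <- x[1:5, ]  # row indexing pulls data into an ordinary data.frame
sapply(first_rows, class)
```

Smaller chunks keep peak memory use down at the cost of more read passes, so the right values depend on the machine.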

Iterator · Answer

Ah, there are some things that are impossible in this life, and there are some that are misunderstood and lead to unpleasant situations. @Roman is right: a matrix must be of one atomic type. It's not a dataframe.

Since a matrix must be of one type, attempting to snooker bigmemory to handle multiple types is, in itself, a bad thing. Could it be done? I'm not going there. Why? Because everything else will assume that it's getting a matrix, not a dataframe. That will lead to more questions and more sorrow.
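A quick base R illustration of the one-type rule, with made-up values: putting an integer column and a character column into a matrix coerces everything to character, while a data frame keeps each column's own type.

```r
ids    <- c(1L, 2L, 3L)
labels <- c("a", "b", "c")

# A matrix holds a single atomic type, so the integers become strings
m <- cbind(ids, labels)
typeof(m)   # "character"

# A data frame is a list of columns, each with its own type
df <- data.frame(ids = ids, labels = labels, stringsAsFactors = FALSE)
sapply(df, class)
```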

Now, what you can do is identify the type of each column and generate a set of distinct bigmemory files, each containing the columns of one particular type, e.g. charBM = character big matrix, intBM = integer big matrix, and so on. Then you may be able to develop a wrapper that produces a data frame out of all of this. Still, I don't recommend that: treat the different items as what they are, or coerce homogeneity if you can, rather than try to produce a big dataframe griffin.
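A minimal sketch of that split, assuming the column types can be guessed from the first rows of the file. The demo file, the helper read_type(), the sample size, and the variable names are hypothetical. The text column is kept as a plain character vector here, on the assumption that bigmemory's matrix types are numeric only.

```r
library(bigmemory)

# Hypothetical demo file standing in for the real csv
csvfile <- tempfile(fileext = ".csv")
demo <- data.frame(id    = 1:100,
                   count = 101:200,
                   label = rep(c("a", "b"), 50),
                   value = seq(0.5, 50, by = 0.5))
write.csv(demo, csvfile, row.names = FALSE)

# Guess each column's type from a small sample of rows
classes <- sapply(read.csv(csvfile, nrows = 20, stringsAsFactors = FALSE), class)

# Read only the columns of one type; colClasses = "NULL" skips a column
read_type <- function(type) {
  read.csv(csvfile, colClasses = ifelse(classes == type, type, "NULL"))
}

intBM  <- as.big.matrix(as.matrix(read_type("integer")), type = "integer")
numBM  <- as.big.matrix(as.matrix(read_type("numeric")), type = "double")
labels <- read_type("character")$label   # kept outside bigmemory

dim(intBM)
dim(numBM)
```

This demo still passes each group of columns through an ordinary data frame, so it only helps when one type at a time fits in RAM. For larger files, each group could instead be written to its own file and loaded with read.big.matrix().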

@mdsumner is correct in suggesting ff. Another storage option is HDF5, which you can access through ncdf4 in R (netCDF-4 files use HDF5 as their storage format). Unfortunately, these other packages are not as pleasant as bigmemory.

Tags:

dataframe

r

csv

large-files

import-from-csv

Lourdes
