Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ways to read only select columns from a file into R? (A happy medium between `read.table` and `scan`?) [duplicate]

I have some very big delimited data files and I want to process only certain columns in R without taking the time and memory to create a data.frame for the whole file.

The only options I know of are read.table which is very wasteful when I only want a couple of columns or scan which seems too low level for what I want.

Is there a better option, either with pure R or perhaps calling out to some other shell script to do the column extraction and then using scan or read.table on it's output? (Which leads to the question how to call a shell script and capture its output in R?).

like image 293
Alex Stoddard Avatar asked Feb 03 '10 17:02

Alex Stoddard


People also ask

How do I make some columns read only in R?

Method 1: Using read. table() function. In this method of only importing the selected columns of the CSV file data, the user needs to call the read. table() function, which is an in-built function of R programming language, and then passes the selected column in its arguments to import particular columns from the data.

How do I select all columns in R?

To pick out single or multiple columns use the select() function. The select() function expects a dataframe as it's first input ('argument', in R language), followed by the names of the columns you want to extract with a comma between each name.


1 Answers

Sometimes I do something like this when I have the data in a tab-delimited file:

df <- read.table(pipe("cut -f1,5,28 myFile.txt")) 

That lets cut do the data selection, which it can do without using much memory at all.

See Only read limited number of columns for pure R version, using "NULL" in the colClasses argument to read.table.

like image 166
Ken Williams Avatar answered Sep 18 '22 08:09

Ken Williams