I have some very big delimited data files and I want to process only certain columns in R without taking the time and memory to create a data.frame for the whole file.
The only options I know of are read.table, which is very wasteful when I only want a couple of columns, or scan, which seems too low-level for what I want.
Is there a better option, either with pure R or perhaps by calling out to some other shell script to do the column extraction and then using scan or read.table on its output? (Which leads to the question: how do you call a shell script and capture its output in R?)
Method 1: Using the read.table() function. read.table() is a built-in function of the R programming language; to import only particular columns from the data, mark the unwanted columns as "NULL" in its colClasses argument so they are skipped during the read.
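A minimal sketch of that approach, assuming a five-column comma-separated file named "myFile.csv" (the file name and the column types here are illustrative, not from the original question):

# "NULL" entries in colClasses tell read.table to skip those columns
# entirely, so they are never parsed or stored in memory.
df <- read.table("myFile.csv", header = TRUE, sep = ",",
                 colClasses = c("numeric", "NULL", "character", "NULL", "NULL"))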
To pick out single or multiple columns, use the select() function from the dplyr package. select() expects a data frame as its first input ('argument', in R terms), followed by the names of the columns you want to extract, separated by commas. Note that this subsets a data frame that is already in memory, so on its own it does not avoid loading the whole file.
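For example, with a toy data frame standing in for an already-imported file (the column names here are hypothetical):

library(dplyr)
# Small example data frame; in practice this would come from read.table() etc.
df <- data.frame(id = 1:3, name = c("a", "b", "c"), score = c(10, 20, 30))
df_subset <- select(df, id, score)  # keep only the id and score columns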
Sometimes I do something like this when I have the data in a tab-delimited file:
df <- read.table(pipe("cut -f1,5,28 myFile.txt"))
That lets cut do the data selection, which it can do without using much memory at all.
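The same idea should work for comma-separated files on a Unix-like system: cut defaults to tab as its delimiter, so pass -d',' and tell read.table the separator (the file name here is illustrative):

# cut extracts fields 1, 5, and 28 from a comma-delimited file;
# read.table parses the already-reduced stream.
df <- read.table(pipe("cut -d',' -f1,5,28 myFile.csv"), sep = ",", header = TRUE)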
See Only read limited number of columns for a pure R version, using "NULL" in the colClasses argument to read.table (the approach sketched under Method 1 above).