Read CSV in R and filter columns by name

Question

Let's say I have a CSV with dozens or hundreds of columns and I want to pull in just about 2 or 3 columns. I know about the colClasses solution as described here but the code gets very unreadable.

I want something like usecols from pandas' read_csv.

Loading everything and just selecting afterwards is not a solution (the file is super big, it doesn't fit in memory).

Rui Barradas · Accepted Answer

One way is to use package sqldf. If you know SQL, it is possible to read in large files filtering only the parts you want.

I will use built-in dataset iris to make the example reproducible, saving it to disk first.

write.csv(iris, "iris.csv", row.names = FALSE)

Now the problem.
Argument row.names is like in the write.csv instruction.
Note the backticks around Sepal.Length. This is due to the dot character in the column name.

library(sqldf)

sql <- "select `Sepal.Length`, Species from file"
sub_iris <- read.csv.sql("iris.csv", sql = sql, row.names = FALSE)

head(sub_iris)
#  Sepal.Length  Species
#1          5.1 "setosa"
#2          4.9 "setosa"
#3          4.7 "setosa"
#4          4.6 "setosa"
#5          5.0 "setosa"
#6          5.4 "setosa"

And final clean up.

unlink("iris.csv")

LocoGris · Answer

I will use package data.table and then with fread() specify columns to keep/drop by arguments selector drop. From ?fread

select Vector of column names or numbers to keep, drop the rest.

drop Vector of column names or numbers to drop, keep the rest.

Best!

Read CSV in R and filter columns by name

Tags:

r

csv

readr

gsmafra

2 Answers

Rui Barradas

LocoGris

Recent Activity

Donate For Us

Read CSV in R and filter columns by name

Tags:

r

csv

readr

gsmafra

2 Answers

Rui Barradas

LocoGris

Related questions

Recent Activity

Donate For Us