
fread (data.table) select columns, throw error if column not found

Tags:

r

data.table

I'm loading a CSV file into R with data.table's fread function. It has a bunch of columns that I don't need, so the select parameter comes in handy. I've noticed, however, that if one of the columns specified in select does not exist in the CSV file, fread will silently continue. Is it possible to make R throw an error if one of the selected columns doesn't exist in the file?

#csvfile has "col1" "col2" "col3" "col4" etc

colsToKeep <- c("col1", "col2", "missing")

data <- fread(csvfile, header=TRUE, select=colsToKeep, verbose=TRUE)

In the above example, data will have two columns: col1, col2. The remaining columns will be dropped as expected, but missing is silently skipped. It would certainly be nice to know that fread is skipping that column because it did not find it.

stephentgrammer asked Oct 29 '14 22:10



1 Answer

I'd suggest parsing the first row pre-emptively, then throwing your own error. You could do:

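library(data.table)

read_cols <- function(file_name, colsToKeep) {
    # Read only the first row (the header line) as a row of data
    header <- fread(file_name, nrows = 1, header = FALSE)
    # as.character() guards against numeric-looking column names
    all_in_header <- all(colsToKeep %chin% as.character(unlist(header)))
    # Stop with an error unless every requested column is in the header
    stopifnot(all_in_header)

    fread(file_name, header=TRUE, select=colsToKeep, verbose=TRUE)
}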

my_data <- read_cols(csvfile, c("col1", "col2", "missing"))
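
If you'd also like the error to say which columns are missing, a variation along these lines might help. This is only a sketch: read_cols_strict is a made-up helper name (not part of data.table), and it assumes the file's first line is the header, so that fread with nrows = 0 returns an empty table carrying just the column names.

```r
library(data.table)

# Hypothetical helper (not part of data.table): check the header first,
# then stop with the names of any requested columns that are absent.
read_cols_strict <- function(file_name, cols_to_keep) {
  # nrows = 0 reads the column names only, no data rows
  available <- names(fread(file_name, nrows = 0, header = TRUE))
  missing_cols <- setdiff(cols_to_keep, available)
  if (length(missing_cols) > 0) {
    stop("Column(s) not found in ", file_name, ": ",
         paste(missing_cols, collapse = ", "))
  }
  fread(file_name, header = TRUE, select = cols_to_keep)
}

# Small temporary file, used only to demonstrate the helper
tmp <- tempfile(fileext = ".csv")
writeLines(c("col1,col2,col3", "1,2,3", "4,5,6"), tmp)

# All requested columns exist, so this reads normally
ok <- read_cols_strict(tmp, c("col1", "col2"))

# One requested column is absent, so this raises an error;
# tryCatch captures the message instead of halting the script
err <- tryCatch(read_cols_strict(tmp, c("col1", "missing")),
                error = function(e) conditionMessage(e))
```

Using setdiff rather than all(... %chin% ...) keeps the names of the offending columns around, so the message can list them instead of just reporting that the check failed.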
shadowtalker answered Sep 23 '22 07:09