Is there a way for fread
to mimic the behaviour of read.table
whereby the class
of the variable is set by the data that is read in.
I have numeric data with a few comments underneath the main data. When i use fread
to read in the data, the columns are converted to character. However, by setting the nrow
in read.table` i can stop this behaviour. Is this possible in fread. (I would prefer not to alter the raw data or make an amended copy). Thanks
An example
d <- data.frame(x=c(1:100, NA, NA, "fff"), y=c(1:100, NA,NA,NA))
write.csv(d, "test.csv", row.names=F)
in_d <- read.csv("test.csv", nrow=100, header=T)
in_dt <- data.table::fread("test.csv", nrow=100)
Which produces
> str(in_d)
'data.frame': 100 obs. of 2 variables:
$ x: int 1 2 3 4 5 6 7 8 9 10 ...
$ y: int 1 2 3 4 5 6 7 8 9 10 ...
> str(in_dt)
Classes ‘data.table’ and 'data.frame': 100 obs. of 2 variables:
$ x: chr "1" "2" "3" "4" ...
$ y: int 1 2 3 4 5 6 7 8 9 10 ...
- attr(*, ".internal.selfref")=<externalptr>
As a workaround I thought i would be able to use read.table
to read in one line, get the class and set the colClasses
, but i am misunderstanding.
cl <- read.csv("test.csv", nrow=1, header=T)
cols <- unname(sapply(cl, class))
in_dt <- data.table::fread("test.csv", nrow=100, colClasses=cols)
str(in_dt)
Using Windows8.1 R version 3.1.2 (2014-10-31) Platform: x86_64-w64-mingw32/x64 (64-bit)
Option 1: Using a system command
fread()
allows the use of a system command in its first argument. We can use it to remove the quotes in the first column of the file.
indt <- data.table::fread("cat test.csv | tr -d '\"'", nrows = 100)
str(indt)
# Classes ‘data.table’ and 'data.frame': 100 obs. of 2 variables:
# $ x: int 1 2 3 4 5 6 7 8 9 10 ...
# $ y: int 1 2 3 4 5 6 7 8 9 10 ...
# - attr(*, ".internal.selfref")=<externalptr>
The system command cat test.csv | tr -d '\"'
explained:
cat test.csv
reads the file to standard output|
is a pipe, using the output of the previous command as input for the next command tr -d '\"'
deletes (-d
) all occurrences of double quotes ('\"'
) from the current inputOption 2: Coercion after reading
Since option 1 doesn't seem to be working on your system, another possibility is to read the file as you did, but convert the x
column with type.convert()
.
library(data.table)
indt2 <- fread("test.csv", nrows = 100)[, x := type.convert(x)]
str(indt2)
# Classes ‘data.table’ and 'data.frame': 100 obs. of 2 variables:
# $ x: int 1 2 3 4 5 6 7 8 9 10 ...
# $ y: int 1 2 3 4 5 6 7 8 9 10 ...
# - attr(*, ".internal.selfref")=<externalptr>
Side note: I usually prefer to use type.convert()
over as.numeric()
to avoid the "NAs introduced by coercion" warning triggered in some cases. For example,
x <- c("1", "4", "NA", "6")
as.numeric(x)
# [1] 1 4 NA 6
# Warning message:
# NAs introduced by coercion
type.convert(x)
# [1] 1 4 NA 6
But of course you can use as.numeric()
as well.
Note: This answer assumes data.table dev v1.9.5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With