I would like to be able to skip a column that is read into R via data.table
's fread
function in v1.8.9. But the csv I am reading in, has no column headers…which appears to be a problem for fread... is there a way to just specify that I don't want specific columns?
Would it be better to just pre-allocate a column name and then let it read it in so that it can be skipped?
To give an example, I downloaded the data from the following URL
http://www.truefx.com/dev/data/2013/MAY-2013/AUDUSD-2013-05.zip
unzipped it…
and read the csv into R using fread and it has pretty much the same file name just with the csv extension.
system.time(pp <- fread("AUDUSD-2013-05.csv",sep=","))
user system elapsed
16.427 0.257 16.682
head(pp)
V1 V2 V3 V4
1: AUD/USD 20130501 00:00:04.728 1.03693 1.03721
2: AUD/USD 20130501 00:00:21.540 1.03695 1.03721
3: AUD/USD 20130501 00:00:33.789 1.03694 1.03721
4: AUD/USD 20130501 00:00:37.499 1.03692 1.03724
5: AUD/USD 20130501 00:00:37.524 1.03697 1.03719
6: AUD/USD 20130501 00:00:39.789 1.03697 1.03717
str(pp)
Classes ‘data.table’ and 'data.frame': 4060762 obs. of 4 variables:
$ V1: chr "AUD/USD" "AUD/USD" "AUD/USD" "AUD/USD" ...
$ V2: chr "20130501 00:00:04.728" "20130501 00:00:21.540" "20130501 00:00:33.789" "20130501 00:00:37.499" ...
$ V3: num 1.04 1.04 1.04 1.04 1.04 ...
$ V4: num 1.04 1.04 1.04 1.04 1.04 ...
- attr(*, ".internal.selfref")=<externalptr>
I tried using the new(ish) colClasses or skip arguments to ignore the fact that the first column is all the same…and is unnecessary.
but doing:
pp1 <- fread("AUDUSD-2013-05.csv",sep=",",skip=1)
doesn't omit the reading in of the first column
and using colClasses leads to the following error
pp1 <- fread("AUDUSD-2013-05.csv",sep=",",colClasses=list(NULL,"character","numeric","numeric"))
Error in fread("AUDUSD-2013-05.csv", sep = ",", colClasses = list(NULL, :
colClasses is type list but has no names
other attempts incude
pp1 <- fread("AUDUSD-2013-06.csv",sep=",", colClasses=c(V1=NULL,V2="character",V3="numeric",V4="numeric"))
str(pp1)
Classes ‘data.table’ and 'data.frame': 5524877 obs. of 4 variables:
$ V1: chr "AUD/USD" "AUD/USD" "AUD/USD" "AUD/USD" ...
$ V2: chr "20130603 00:00:00.290" "20130603 00:00:00.291" "20130603 00:00:00.292" "20130603 00:00:03.014" ...
$ V3: num 0.962 0.962 0.962 0.962 0.962 ...
$ V4: num 0.962 0.962 0.962 0.962 0.962 ...
- attr(*, ".internal.selfref")=<externalptr>
i.e pretty much exactly the same as if I had not used colClasses...
Are there any suggestions to be able to speed up the reading in of data by omitting the first column?
Also perhaps a bit much to ask, but is it possible to directly read a zip file rather than unzipping it first and then reading in the csv?
Oh and if it wasn't clear I'm using data.table v1.8.9
Not only was fread() almost 2.5 times faster than readr's functionality in reading and binding the data, but perhaps even more importantly, the maximum used memory was only 15.25 GB, compared to readr's 27 GB. Interestingly, even though very slow, base R also spent less memory than the tidyverse suite.
The benefit of reading the csv file with fread function is that there will be a variable added to the original csv file which contains the id as integers starting from 1 to the length of column values.
Its fread() function is meant to import data from regular delimited files directly into R, without any detours or nonsense. Note that “regular” in this case means that every row of your data needs to have the same number of columns.
I think the argument you're looking for is drop
. Try:
require(data.table) # 1.9.2+
pp <- fread("AUDUSD-2013-05.csv", drop = 1)
Note that you can drop
by name or position.
fread("AUDUSD-2013-05.csv", drop = c("columThree","anotherColumnName"))
fread("AUDUSD-2013-05.csv", drop = 10:15) # read all columns other than 10:15
And you can select
by name or position, too.
fread("AUDUSD-2013-05.csv", select = 10:15) # read only columns 10:15
fread("AUDUSD-2013-05.csv", select = c("columnA","columnName2"))
These arguments were added to v1.9.2 (released to CRAN in Feb 2014) and are documented in ?fread
. You'll need to upgrade to use them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With