I would like to read a file with fread
from data.table
that has a column with "YYYY-MM-DD" format dates. By default, fread
reads the column as chr
. However, I would like to have the column as Date
, the same way I would obtain when applying as.Date
.
I have tried to use
dt[,starttime.date := as.Date(starttime.date)]
but it takes very long to run (I have approx. 43 million rows).
For files beyond 100 MB in size fread() and read_csv() can be expected to be around 5 times faster than read. csv() .
You can use the as. Date( ) function to convert character data to dates. The format is as. Date(x, "format"), where x is the character data and format gives the appropriate format.
The fread() function returns the number of full items successfully read, which can be less than count if an error occurs, or if the end-of-file is met before reaching count. If size or count is 0, the fread() function returns zero, and the contents of the array and the state of the stream remain unchanged.
Data. table is an extension of data. frame package in R. It is widely used for fast aggregation of large datasets, low latency add/update/remove of columns, quicker ordered joins, and a fast file reader.
Using the fasttime
package, as suggested in the fread documentation, is approximately 100x faster than as.Date
or as.IDate
:
library(data.table)
library(fasttime)
dt[,starttime.date := fastPOSIXct(starttime.date)]
Benchmark results:
library(microbenchmark)
library(fasttime)
DT <- data.table(start_date = paste(sample(1900:2018, 100000, replace = T),
sample(1:12, 100000, replace = T),
sample(1:28, 100000, replace = T),
sep = "-"))
microbenchmark(
as.Date(DT$start_date),
as.IDate(DT$start_date),
fastPOSIXct(DT$start_date)
)
> Unit: milliseconds
> expr mean
> as.Date(DT$start_date) 383.89
> as.IDate(DT$start_date) 405.89
> fastPOSIXct(DT$start_date) 4.59
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With