I have a CSV that includes about 2 million rows of date strings in the format:
2012/11/13 21:10:00
Let's call that column csv$Date.and.Time.
I want to convert these dates (and their accompanying data) to xts as fast as possible.
I have written a script which performs the conversion just fine (see below), but it's terribly slow and I'd like to speed this up as much as possible.
Here is my current methodology. Does anyone have any suggestions on how to make this faster?
dt <- as.POSIXct(csv$Date.and.Time,tz="UTC")
idx <- format(dt,tz=z,usetz=TRUE)
So the script converts these date strings to POSIXct. It then does a time zone conversion using format (z is a variable representing the TZ to which I am converting). I then do a regular xts call to make this an xts series with the rest of the data in the CSV.
This works 100%. It's just very, very slow. I've tried running this in parallel (it doesn't do anything; if anything it makes it worse). What do I mean by 'slow'?
user system elapsed
155.246 16.430 171.650
That's on a 3 GHz, 16 GB RAM 2012 MacBook Pro. I can get about half that on a similar processor with 32 GB RAM on a Win7 machine.
I'm sure someone has a better idea. I'm open to suggestions via Rcpp, etc. However, ideally the solution works with the CSV rather than some other method, like setting up a database. Having said that, I'm up for doing this via whatever method is going to give the fastest conversion.
I'd be super appreciative of any help at all. Thanks in advance.
You want the small and simple fasttime package by Simon, which does this in the fastest possible way: by not calling time parsing functions but just using C-level string functions.
It does not support as many formats as strptime. In fact, it doesn't even have a format string. But well-formed ISO format variants, that is yyyy-mm-dd hh:mm:ss.fff, will work, and your / separator may just work too.
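As a rough sketch of how that could look (not a benchmarked solution): the snippet below uses fasttime::fastPOSIXct when the package is installed and otherwise falls back to base as.POSIXct with an explicit format string. The sample vector stamps, the target zone in z, and the helper name parse_utc are made up for illustration; in the question they would correspond to csv$Date.and.Time and the asker's own z. It assumes the strings are in UTC, as in the question, and that the / separator is accepted.

```r
# Hypothetical sample standing in for csv$Date.and.Time
stamps <- c("2012/11/13 21:10:00", "2012/11/13 21:15:00")
z <- "America/New_York"  # assumed target time zone, stands in for the asker's z

# parse_utc is an illustrative helper, not part of any package
parse_utc <- function(x, tz) {
  if (requireNamespace("fasttime", quietly = TRUE)) {
    # fastPOSIXct always reads the input as UTC; tz only sets the display zone
    fasttime::fastPOSIXct(x, tz = tz)
  } else {
    # Base R fallback: an explicit format spares as.POSIXct from guessing it
    dt <- as.POSIXct(x, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
    # Changing the tzone attribute re-labels the zone without touching the instants
    attr(dt, "tzone") <- tz
    dt
  }
}

idx <- parse_utc(stamps, z)
```

Either way idx stays a POSIXct vector, so it can presumably be passed straight to xts as order.by, rather than going through format(), which turns the index back into character strings.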