
convert character to date *quickly* in R [duplicate]

Possible Duplicate:
Why is as.Date slow on a character vector?

I have a large data.frame (roughly 60 million observations) that I read from a database using RMySQL. The dates come in as character strings (there doesn't seem to be a way to change this), so I use as.Date to convert them to dates. However, this takes an extremely long time with so many observations. Is there anything one can do to make this faster?

Alex asked Oct 15 '12 14:10


1 Answer

Simon Urbanek's fasttime library is very fast for a subset of parseable datetimes:

R> now <- Sys.time()
R> now
[1] "2012-10-15 10:07:28.981 CDT"
R> fasttime::fastPOSIXct(format(now))
[1] "2012-10-15 05:07:28.980 CDT"
R> as.Date(fasttime::fastPOSIXct(format(now)))
[1] "2012-10-15"
R> 

However, it only parses ISO formats and assumes UTC as the timezone.
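
For a whole character column rather than a single timestamp, a minimal sketch might look like the following. The vector `dates_chr` is made-up sample data standing in for what RMySQL returns, and the sketch assumes the strings are plain ISO dates (`YYYY-MM-DD`) from 1970 onward. The fasttime step is guarded so the snippet still runs where the package is not installed.

```r
# Hypothetical sample data: ISO-formatted date strings as a character vector
dates_chr <- rep(c("2012-10-15", "2012-10-16", "2011-01-01"), times = 1e5)

# Baseline: base R conversion with an explicit format
d_base <- as.Date(dates_chr, format = "%Y-%m-%d")

if (requireNamespace("fasttime", quietly = TRUE)) {
  # fastPOSIXct() reads the strings as UTC date-times; as.Date() on a
  # POSIXct also defaults to UTC, so the calendar date is preserved
  d_fast <- as.Date(fasttime::fastPOSIXct(dates_chr, tz = "UTC"))

  # Both routes should agree element by element
  stopifnot(all(d_base == d_fast))
}
```

Wrapping either call in `system.time()` on your own data is the easiest way to see whether the difference matters at 60 million rows.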

Edit after 3 1/2 years: Some commenters appear to think that the fasttime package is difficult to install. I beg to differ. Here I (once again) use install.r, which is just a simple wrapper using littler (and is also shipped as an example with it):

edd@max:~$ install.r fasttime
trying URL 'https://cran.rstudio.com/src/contrib/fasttime_1.0-1.tar.gz'
Content type 'application/x-gzip' length 2646 bytes
==================================================
downloaded 2646 bytes

* installing *source* package ‘fasttime’ ...
** package ‘fasttime’ successfully unpacked and MD5 sums checked
** libs
ccache gcc -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -pedantic -std=gnu99  -c tparse.c -o tparse.o
ccache gcc -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o fasttime.so tparse.o -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/fasttime/libs
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (fasttime)

The downloaded source packages are in
        ‘/tmp/downloaded_packages’
edd@max:~$ 

As you can see, the package has zero external dependencies, a single source file, and builds without the slightest hitch. We can also see that fasttime is now on CRAN, which was not the case when the answer was written. With that, Windows and OS X binaries now exist on CRAN as well, so installation will be as easy as it was for me even if you do not install from source.

Dirk Eddelbuettel answered Oct 29 '22 11:10