Trying to create a date from a string in Racket - find-seconds VERY slow, week-day year-day required?

I'm trying to parse dates from a large csv file in Racket.

The most straightforward way to do this would be to create a new date struct. But it requires the week-day and year-day parameters. Of course I don't have these, and this seems like a real weakness of the date module that I don't understand.

So, as an alternative, I decided to use find-seconds to convert the raw date values into seconds and then pass the result to seconds->date. This works, but is brutally slow.

(require racket/date) ; find-seconds is provided by racket/date

(time
 (let loop ([n 10000])
   (apply find-seconds '(0 0 12 1 1 2012)) ; this takes 3 seconds for 10000
   ;(date 0 0 12 1 1 2012 0 0 #f 0) ; this is instant
   (if (zero? n)
       'done
       (loop (sub1 n)))))

find-seconds takes 3 seconds to do 10000 values, and I have several million. Creating the date struct is of course instant, but I don't have the week-day, year-day values.

My questions are:

1.) Why is week-day/year-day required for creating date structs?

2.) Is find-seconds supposed to be this slow (i.e., is this a bug)? Or am I doing something wrong?

3.) Are there any alternatives for parsing dates quickly? I know srfi/19 has a string->date function, but I'd then have to change everything to use that module's struct instead of Racket's built-in one. And it may suffer the same performance hit as find-seconds; I'm not sure.

Scott Klarenbach asked Aug 19 '12 02:08


1 Answer

Although not documented as such, it appears that week-day and year-day are "no-ops" when using the date struct with date->seconds. If I set them both to 0, date->seconds doesn't complain. I suspect it ignores them:

#lang racket

(require racket/date)

(define d (date 1    ;second
                2    ;minute
                3    ;hour
                20   ;day
                8    ;month
                2012 ;year
                0    ;week-day <<< placeholder
                0    ;year-day <<< placeholder
                #f   ;dst?
                0    ;time-zone-offset
                ))

(displayln (seconds->date (date->seconds d)))

;; =>
#(struct:date* 1 2 3 20 8 2012 1 232 #t -14400 0 EDT)
                               ^ ^^^

My guess is that the date struct was defined for use with seconds->date, where week-day and year-day are interesting information to return. For date->seconds, the same struct was reused rather than defining a second struct without those fields (they're "redundant" for determining the date, which is why you're understandably annoyed :)).
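
If the placeholder zeros are a concern (say, other code later reads date-week-day or date-year-day), both fields can be derived from day, month and year with plain arithmetic, without calling find-seconds at all. The following is only a sketch: leap-year?, year-day, week-day and make-full-date are hypothetical helper names, not part of racket/date, and the #f / 0 values for dst? and time-zone-offset assume the CSV timestamps are UTC.

```racket
#lang racket

;; Hypothetical helpers (not library functions): fill in the two
;; "redundant" fields of the date struct from day/month/year.

(define (leap-year? y)
  (and (zero? (remainder y 4))
       (or (not (zero? (remainder y 100)))
           (zero? (remainder y 400)))))

;; Number of days before the first of each month in a non-leap year.
(define days-before-month
  (vector 0 31 59 90 120 151 181 212 243 273 304 334))

;; 0-based day of the year, the convention the year-day field uses.
(define (year-day day month year)
  (+ (vector-ref days-before-month (sub1 month))
     (if (and (> month 2) (leap-year? year)) 1 0)
     (sub1 day)))

;; Day of the week with 0 = Sunday (Sakamoto's method, Gregorian calendar).
(define month-offsets (vector 0 3 2 5 0 3 5 1 4 6 2 4))

(define (week-day day month year)
  (let ([y (if (< month 3) (sub1 year) year)])
    (modulo (+ y
               (quotient y 4)
               (- (quotient y 100))
               (quotient y 400)
               (vector-ref month-offsets (sub1 month))
               day)
            7)))

;; Assumption: no DST and a zero time-zone offset (i.e. UTC data).
(define (make-full-date second minute hour day month year)
  (date second minute hour day month year
        (week-day day month year)
        (year-day day month year)
        #f
        0))

(year-day 20 8 2012) ; => 232
(week-day 20 8 2012) ; => 1 (Monday)
```

For 20 August 2012 these agree with the week-day and year-day values in the seconds->date output above (1 and 232).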

Does that help? It's not clear to me from your question what you're trying to do with the date information from the CSV. If you want to convert it to an integer seconds value, I think the above should work for you. If you have something else in mind, perhaps you could explain.
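
For the CSV case, a minimal sketch of going from a string field to a date struct and then to seconds might look like this. The "YYYY-MM-DD HH:MM:SS" layout is an assumption (the question does not show the file format), and parse-date is a hypothetical helper, not a library function.

```racket
#lang racket

(require racket/date)

;; Assumed field layout: "YYYY-MM-DD HH:MM:SS". Adjust the pattern to
;; match whatever the real CSV column contains.
(define date-rx
  #px"^(\\d{4})-(\\d{2})-(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2})$")

;; Hypothetical helper: returns a date struct, or #f if the field
;; does not match the assumed layout.
(define (parse-date str)
  (match (regexp-match date-rx str)
    [(list _ y mo d h mi s)
     (date (string->number s)
           (string->number mi)
           (string->number h)
           (string->number d)
           (string->number mo)
           (string->number y)
           0    ; week-day placeholder, as in the example above
           0    ; year-day placeholder
           #f   ; dst?
           0)]  ; time-zone-offset
    [_ #f]))

(define parsed (parse-date "2012-08-20 03:02:01"))

;; Passing #f as the second argument interprets the fields as UTC
;; rather than local time.
(define secs (date->seconds parsed #f))

;; Round-tripping fills in the real week-day and year-day.
(seconds->date secs #f)
```

Whether date->seconds avoids the find-seconds cost is not established here; wrapping the call in (time ...) over a sample of rows, as in the question, would show it.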

Greg Hendershott answered Sep 23 '22 12:09