Why does "23 Dogs" get parsed to 23 November 2015 in pry, but "3 Dogs" gives a parser error?

I found the following code snippet on Twitter (check the post history for the source).

[5] pry(main)> Date.parse('3 Dogs')
ArgumentError: invalid date
[6] pry(main)> Date.parse('23 Dogs')
=> Mon, 23 Nov 2015

Is this just an easter egg in pry? If so, why this particular date and result? If it's not an easter egg, why does "23 Dogs" parse to a date, but "3 Dogs" doesn't?

Nzall asked Nov 23 '15 20:11


1 Answer

So it has nothing to do with pry. I can reproduce your report in Ruby 2.2.2, in plain Ruby code that does not load pry at all.
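A minimal script that reproduces it without pry, using only the standard library's `Date`. It also hints at why the result was 23 Nov 2015 in particular: as far as I know, `Date.parse` fills in any year and month that are missing from the string using today's date, and the question was asked on Nov 23 '15. The variable names are just for illustration.

```ruby
require 'date'

# No pry here: only the standard library's Date class.
twenty_three = Date.parse('23 Dogs')
puts twenty_three

begin
  Date.parse('3 Dogs')
rescue ArgumentError => e # newer versions raise Date::Error, a subclass of ArgumentError
  puts "3 Dogs: #{e.class}: #{e.message}"
end

# Only the day of the month came from the string; the year and month
# were filled in from today's date.
today = Date.today
puts twenty_three == Date.new(today.year, today.month, 23)
```

Run on any day, the first line printed should be the 23rd of the current month.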

So why the heck is Date.parse willing to parse "23 dogs" and come up with something? I have no idea. I'd say it's some idiosyncrasy or even a bug in Date's parsing; it attempts to parse all manner of things, and that leads to some odd edge cases.

For more predictable parsing of dates in known, fixed formats, use Date.strptime instead. For more sophisticated parsing of natural-language dates in unpredictable formats, use the chronic gem.
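A short sketch of the strptime approach. The format string `'%d %b %Y'` (day, abbreviated month name, four-digit year) is only an example; you would use whatever fixed format your input actually has. Input that does not match the format exactly is rejected rather than guessed at.

```ruby
require 'date'

# Example format only: day, abbreviated month name, four-digit year.
FORMAT = '%d %b %Y'

puts Date.strptime('23 Nov 2015', FORMAT) # prints 2015-11-23

['23 Dogs', '3 Dogs'].each do |input|
  begin
    Date.strptime(input, FORMAT)
    puts "#{input}: parsed"
  rescue ArgumentError # "Dogs" is not a month name, so both are rejected
    puts "#{input}: rejected"
  end
end
```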

Personally, I never use plain Date.parse, because it's somewhat unpredictable; I use Date.strptime or chronic instead (or format-specific parsing methods like Date.iso8601).
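One way to combine those format-specific methods, sketched under assumptions: the helper name `parse_known_date` and the list of accepted formats are hypothetical, so adjust them to the formats your data really uses. Each parser in the list is strict, and input that none of them accepts yields `nil` instead of a guessed date.

```ruby
require 'date'

# Hypothetical whitelist of strict parsers, tried in order.
PARSERS = [
  ->(s) { Date.iso8601(s) },              # e.g. "2015-11-23"
  ->(s) { Date.strptime(s, '%d %b %Y') }  # e.g. "23 Nov 2015"
].freeze

# Returns a Date, or nil when no accepted format matches.
def parse_known_date(input)
  PARSERS.each do |parser|
    begin
      return parser.call(input)
    rescue ArgumentError
      # This format didn't match; try the next one.
    end
  end
  nil
end

p parse_known_date('23 Nov 2015')
p parse_known_date('23 Dogs') # nil
```

This still tries formats in sequence, but unlike Date.parse every step only accepts input that fully matches one known format.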

I tried to look at the MRI code for Date.parse because I was curious whether I could figure out what it was doing, but I quickly got lost in C code I wasn't competent to understand or follow, and had to give up.

Interestingly, this DOES reproduce in JRuby 1.7.10 too (I haven't installed JRuby 9.x yet): "23 dogs" parses to the same thing, and "3 dogs" raises. Perhaps the JRuby code will be more intelligible to some of us than MRI's C code, but I haven't had time to work through or debug exactly what Date.parse in JRuby is doing. The meat of it may begin here, although I might not have found the correct place for the current implementation.

You can see that it tries to parse the date according to a number of different formats in sequence, stopping when it successfully parses according to one of them. We can guess there's SOME odd format in that list that successfully parses "23 dogs" but not "3 dogs". It's probably not an easter egg or intentional at all; it's just a weird side effect of trying to guess which format the input is in by trying various formats in sequence, which is not a very sophisticated algorithm.
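To make the "try formats in sequence" idea concrete, here is a deliberately tiny, hypothetical parser. The rule names, the patterns, and `toy_parse` are made up for illustration; this is not the Date implementation in MRI or JRuby. It tries each rule in order, the first match wins, and its last rule is a permissive catch-all that treats any standalone two-digit number as a day of the month.

```ruby
MONTHS = %w[jan feb mar apr may jun jul aug sep oct nov dec].freeze

# Hypothetical rules: [name, pattern, builder turning a MatchData into fields].
RULES = [
  # "3 May", "23 nov": a day followed by a month name
  [:day_month, /\b(\d{1,2})\s+(#{MONTHS.join('|')})/i,
   ->(m) { { mday: m[1].to_i, mon: MONTHS.index(m[2].downcase) + 1 } }],
  # "2015-11-23": ISO-style year-month-day
  [:iso, /\b(\d{4})-(\d{2})-(\d{2})\b/,
   ->(m) { { year: m[1].to_i, mon: m[2].to_i, mday: m[3].to_i } }],
  # Catch-all: a standalone two-digit number is assumed to be a day
  [:two_digits, /\b(\d{2})\b/,
   ->(m) { { mday: m[1].to_i } }]
].freeze

# Returns [rule_name, fields] for the first rule that matches, or nil.
def toy_parse(str)
  RULES.each do |name, pattern, build|
    m = pattern.match(str)
    return [name, build.call(m)] if m
  end
  nil
end

p toy_parse('3 May')
p toy_parse('23 Dogs')
p toy_parse('3 Dogs')
```

Real dates are claimed by the earlier, more specific rules, but garbage that happens to contain two digits falls through to the catch-all and "succeeds", which is the kind of side effect described above.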

Update: okay, in at least the JRuby code I was looking at (which might not be the current implementation, but is some implementation):

  • Eventually, after the other candidate parses fail, it tries Date._parse_ddd for both inputs.

  • Date._parse_ddd("23 dogs", e) returns true, and fills the Date::Parse::Bag with an mday component, but Date._parse_ddd("3 dogs", e) returns false and does not fill the Bag. So everything else follows from here.

  • If we look at the Date._parse_ddd implementation, there are some monster regexes and weird logic, probably copied from MRI or otherwise written to be consistent with MRI's behavior.

  • I don't feel like debugging further; you can if you want. The JRuby implementation, as you can see, is actually written in Ruby, not even in Java.
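On MRI you can see the same split without a debugger through Date._parse, which returns the hash of fragments the heuristics extracted before Date.parse completes them into a Date. The second half of the snippet is an assumption on my part: as I read the source, the _parse_ddd pattern starts with a run of 2 to 14 digits, and the regex below is only a simplified stand-in for that part, not the real pattern.

```ruby
require 'date'

# Date._parse returns the raw fragments the heuristics extracted,
# before Date.parse fills in missing fields and builds a Date.
['23 dogs', '3 dogs'].each do |input|
  fragments = Date._parse(input)
  puts "#{input.inspect} -> mday: #{fragments[:mday].inspect}"
end

# ASSUMPTION: simplified stand-in for the start of the _parse_ddd pattern.
ASSUMED_DDD_DIGITS = /\d{2,14}/
puts ASSUMED_DDD_DIGITS.match?('23 dogs') # true
puts ASSUMED_DDD_DIGITS.match?('3 dogs')  # false
```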

You or I or someone could try to debug further (perhaps even with an interactive debugger on the JRuby stdlib implementation) to figure out exactly what's going on. But I'm confident the answer is basically: "it's a weird side effect of Date.parse not really knowing what format its input is in and just trying a bunch of things with a not very sophisticated algorithm; sometimes weird things happen."

More update: note that Date.parse("03 dogs") does parse instead of raising. So with two digits it decides the input is parseable, and with one it does not. But of course Date.parse("3 May") works fine. It's not that Date.parse requires two-digit days; it's just that it tries a whole bunch of different ways of parsing the input. A genuine date will be caught correctly, but a bad date might be caught by one of the rules that thought it looked good enough and, in this case, was wrong.
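A quick check of the two cases mentioned above, using only the standard library. The leading zero is enough for the two-digit rule to accept "03 dogs" as the 3rd of the current month, while "3 May" is presumably matched by a separate day-plus-month-name rule, so its single digit is no obstacle. Variable names are illustrative.

```ruby
require 'date'

# Two digits: accepted, read as the 3rd of the current month.
zero_padded = Date.parse('03 dogs')
# A real day-month date: caught by a different rule, so one digit is fine.
with_month = Date.parse('3 May')

puts zero_padded.mday                          # 3
puts [with_month.mon, with_month.mday].inspect # [5, 3]
```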

More thoughts: so it's not intentional that it parses like that. It's a byproduct of heuristic rules meant to catch other dates. Since the code isn't commented, we can't say exactly which kinds of dates each part was meant to catch. It's a bunch of cobbled-together rules that try to catch dates in a variety of formats, including international formats.
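One plausible example of the "other dates" such a rule is meant for is a compact all-digit date like 20151123 (YYYYMMDD). That this goes through the same _parse_ddd digit rule is my assumption from skimming the source; the snippet only checks the observable results. The point is that a rule that makes sense for long digit strings also fires on a bare "23".

```ruby
require 'date'

# A compact all-digit date is a legitimate input for a digits-based rule...
compact = Date.parse('20151123')
puts compact # 2015-11-23

# ...and a digits-based rule given just two digits treats them as a day.
puts Date.parse('23 dogs').mday # 23
```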

You could look at the tests to see all the kinds of dates it's meant to catch. Or you could go through the code to understand exactly which lines result in the behavior you are seeing. The code is confusing to most of us, especially the C code in MRI; the pure Ruby code in JRuby is of course more readable to us Rubyists. Since it's confusing and time-consuming to go through the code, with little benefit (who cares?), you probably aren't going to get anyone else to do this for you.

jrochkind answered Sep 29 '22 08:09