 

Is there any workaround for the broken V8 date parser?


V8 Date parser is broken:

    > new Date('asd qw 101')
    Sat Jan 01 101 00:00:00 GMT+0100 (CET)

I can use a regular expression like this:

    \d{1,2} (jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec) \d{1,4}

but it is too fragile. I cannot rely on new Date (an issue in V8), and moment can't help me either, because moment is getting rid of date detection (GitHub issue thread).

Is there any workaround for the broken V8 date parser?

To be clear: we have Gecko and V8, and both have Date. V8 has a broken Date, Gecko has a working one. I need the Date from Gecko (Firefox).

Update: I was sure it was a broken parser (https://code.google.com/p/v8/issues/detail?id=2602), but no: the issue has Status: WorkingAsIntended.

Vladimir Starkov asked Jun 21 '15 13:06




2 Answers

Date objects are based on a time value that is the number of milliseconds since 1 January, 1970 UTC and have the following constructors:

    new Date();
    new Date(value);
    new Date(dateString);
    new Date(year, month[, day[, hour[, minutes[, seconds[, milliseconds]]]]]);

From the docs,

dateString in new Date(dateString) is a string value representing a date. The string should be in a format recognized by the Date.parse() method (IETF-compliant RFC 2822 timestamps and also a version of ISO8601).

Now looking at the V8 source code in date.js:

    function DateConstructor(year, month, date, hours, minutes, seconds, ms) {
      if (!%_IsConstructCall()) {
        // ECMA 262 - 15.9.2
        return (new $Date()).toString();
      }

      // ECMA 262 - 15.9.3
      var argc = %_ArgumentsLength();
      var value;
      if (argc == 0) {
        value = %DateCurrentTime();
        SET_UTC_DATE_VALUE(this, value);
      } else if (argc == 1) {
        if (IS_NUMBER(year)) {
          value = year;
        } else if (IS_STRING(year)) {
          // Probe the Date cache. If we already have a time value for the
          // given time, we re-use that instead of parsing the string again.
          var cache = Date_cache;
          if (cache.string === year) {
            value = cache.time;
          } else {
            value = DateParse(year);  // <- DOES NOT RETURN NaN
            if (!NUMBER_IS_NAN(value)) {
              cache.time = value;
              cache.string = year;
            }
          }
        }
        ...

It looks like DateParse() does not return NaN for a string like 'asd qw 101', hence the unexpected result. You can cross-check this with Date.parse('asd qw 101') in both Chrome (V8), which returns -58979943000000, and Firefox (Gecko), which returns NaN. Sat Jan 01 101 00:00:00 is what you get when you seed new Date() with a timestamp of -58979943000000 (in both browsers).
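To run this kind of cross-check yourself, a small helper that reports what the current engine makes of a string can be handy. This is only a minimal sketch: `isValidDate` and `describeParse` are hypothetical names, not part of any library, and the result for a loosely formatted string depends on the engine you run it in.

```javascript
// Hypothetical helper: true only for Date objects that hold a real time value.
function isValidDate(d) {
  return d instanceof Date && !Number.isNaN(d.getTime());
}

// Hypothetical helper: summarise how the current engine parses a string.
function describeParse(input) {
  const time = Date.parse(input); // NaN when the engine rejects the string
  return { input, time, valid: !Number.isNaN(time) };
}

// The result for this string is engine-dependent, which is the point of the check.
console.log(describeParse('asd qw 101'));
```

Running the same snippet in Chrome and Firefox shows side by side where the two parsers disagree.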

is there any workaround for broken v8 date parser?

I wouldn't say the V8 date parser is broken. It just tries to match a string against the RFC 2822 standard as best it can, but so does Gecko, and the two give different results in certain cases.

Try new Date('Sun Ma 10 2015') in both Chrome (V8) and Firefox (Gecko) for another such anomaly. Here Chrome cannot decide whether 'Ma' stands for 'March' or 'May' and gives an Invalid Date, while Firefox doesn't.

Workaround:

You can create your own wrapper around Date() to filter out those strings that V8's own parser does not reject. However, subclassing built-ins in ECMAScript 5 is not feasible. In ECMAScript 6, it will be possible to subclass built-in constructors (Array, Date, and Error).
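In an engine that supports ES6 classes, such a wrapper might look roughly like the sketch below. `StrictDate` and `ISO_PATTERN` are hypothetical names, and restricting string input to a small ISO 8601 subset is an assumption made for the example, not something the question requires.

```javascript
// Assumption: only this ISO 8601 subset is accepted as string input.
// The pattern checks shape only, not ranges (month 13 would still reach the engine).
const ISO_PATTERN =
  /^\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}(?:\.\d{3})?(?:Z|[+-]\d{2}:\d{2}))?$/;

// Hypothetical wrapper: behaves like Date, except that a lone string argument
// that does not match ISO_PATTERN yields an Invalid Date instead of a guess.
class StrictDate extends Date {
  constructor(...args) {
    const [first] = args;
    if (args.length === 1 && typeof first === 'string' && !ISO_PATTERN.test(first)) {
      super(NaN);
    } else {
      super(...args);
    }
  }
}

const bad = new StrictDate('asd qw 101');
const good = new StrictDate('2015-06-21T13:06:00Z');
console.log(Number.isNaN(bad.getTime())); // true
console.log(good.toISOString());          // 2015-06-21T13:06:00.000Z
```

Numeric and multi-argument calls pass straight through to Date, so only free-form strings are affected.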

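However, you can use a more robust regular expression to validate strings before passing them to Date. The one below accepts day, month, year dates with a consistent separator and checks month lengths and leap years: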

    ^(?:(?:31(\/|-|\.|\s)(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec)))\1|(?:(?:29|30)(\/|-|\.|\s)(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.|\s)(?:0?2|(?:Feb))\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.|\s)(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec)))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
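As a rough illustration of how such a pattern could be used as a gate in front of Date, here is a sketch that copies the expression above, with every separator group written as `(\/|-|\.|\s)`. `DMY_PATTERN` and `looksLikeDate` are hypothetical names. Note that the month abbreviations are case-sensitive.

```javascript
// Copy of the answer's pattern: day, month, year with a consistent separator.
const DMY_PATTERN = new RegExp(
  '^(?:(?:31(\\/|-|\\.|\\s)(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec)))\\1' +
  '|(?:(?:29|30)(\\/|-|\\.|\\s)(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))\\2))' +
  '(?:(?:1[6-9]|[2-9]\\d)?\\d{2})$' +
  '|^(?:29(\\/|-|\\.|\\s)(?:0?2|(?:Feb))\\3' +
  '(?:(?:(?:1[6-9]|[2-9]\\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$' +
  '|^(?:0?[1-9]|1\\d|2[0-8])(\\/|-|\\.|\\s)' +
  '(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec)))\\4' +
  '(?:(?:1[6-9]|[2-9]\\d)?\\d{2})$'
);

// Hypothetical helper: decide whether a string is worth handing to Date at all.
function looksLikeDate(input) {
  return DMY_PATTERN.test(input);
}

console.log(looksLikeDate('asd qw 101'));  // false
console.log(looksLikeDate('21 Jun 2015')); // true
console.log(looksLikeDate('29 Feb 2015')); // false (2015 is not a leap year)
console.log(looksLikeDate('29 Feb 2016')); // true
```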


So it seems V8 isn't broken; it just works differently.

Hope it helps!

nalinc answered Oct 21 '22 01:10


You seem to be asking for a way to parse a string that might be in any format and determine what date it represents. There are many reasons why this is a bad idea in general.

You say moment.js is "getting rid of date detection", but actually it never had this feature in the first place. People just made the assumption that it could do that, and in some cases it worked, and in many cases it didn't.

Here's an example that illustrates the problem.

 var s = "01.02.03"; 

Is that a date? Maybe. Maybe not. It could be a section heading in a document. Even if we said it was a date, what date is it? It could be interpreted as any of the following:

  • January 2nd, 2003
  • January 2nd, 0003
  • February 1st, 2003
  • February 1st, 0003
  • February 3rd, 2001
  • February 3rd, 0001

The only way to disambiguate would be with knowledge of the current culture date settings. JavaScript's Date object does just that, which means you will get a different value depending on the settings of the machine where the code is running. However, moment.js is about stability across all environments. Cultural settings are explicit, via moment's own locale functionality. Relying on the browser's culture settings leads to errors in interpretation.
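The six readings listed above can be enumerated mechanically, which makes the ambiguity concrete. This is a minimal sketch: `interpretations` is a hypothetical helper, and reading a two-digit year as either 2000 plus the value or the literal year is an assumption taken from the list.

```javascript
// Hypothetical helper: list every reading of "a.b.c" under three field orders
// and two ways of expanding a two-digit year.
function interpretations(s) {
  const [a, b, c] = s.split('.').map(Number);
  const orders = {
    'month.day.year': { month: a, day: b, year: c },
    'day.month.year': { day: a, month: b, year: c },
    'year.month.day': { year: a, month: b, day: c },
  };
  const results = [];
  for (const [order, { year, month, day }] of Object.entries(orders)) {
    // Assumption: a two-digit year means either 20xx or the literal year.
    for (const fullYear of [2000 + year, year]) {
      results.push({ order, year: fullYear, month, day });
    }
  }
  return results;
}

console.log(interpretations('01.02.03').length); // 6
```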

The best thing to do is to be explicit about the format you are working with. Don't allow random garbage input. Expect your input in a particular format, and use a regex to validate that format ahead of time, rather than just trying to construct a Date and seeing if it's valid after the fact.
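As a sketch of that approach, suppose the only accepted format is "DD.MM.YYYY" (an assumption for the example; `parseDayMonthYear` is a hypothetical helper). The string is validated first, and the Date is then built from numbers, so the engine's string parser never gets involved.

```javascript
// Assumption for this sketch: input must be exactly "DD.MM.YYYY".
const DMY = /^(\d{2})\.(\d{2})\.(\d{4})$/;

// Hypothetical helper: returns a Date at UTC midnight, or null. Never a guess.
function parseDayMonthYear(input) {
  const match = DMY.exec(input);
  if (!match) return null;
  const [day, month, year] = match.slice(1).map(Number);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Round-trip check: rejects overflow such as 31.02.2015, which would roll
  // into March. It also rejects years 0000-0099, which Date.UTC maps to 19xx.
  const sameDay =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  return sameDay ? date : null;
}

console.log(parseDayMonthYear('21.06.2015').toISOString()); // 2015-06-21T00:00:00.000Z
console.log(parseDayMonthYear('31.02.2015'));               // null
console.log(parseDayMonthYear('asd qw 101'));               // null
```

Returning null rather than an Invalid Date is a design choice for the sketch; it forces the caller to handle bad input explicitly.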

If you can't do that, you'll have to find additional context to help decide. For example, if you are scraping some random bits of the web from a back-end process and you want to extract a date from the text, you'd have to have some knowledge about the language and locale of each particular web page. You could guess, but you'd likely be wrong a fair amount of the time.

See also: Garbage in, garbage out

Matt Johnson-Pint answered Oct 21 '22 02:10