I need to write a function which tests, if given string is "blank" in a sense that it only contains whitespace characters. Whitespace characters are the following:
'\u0009',
'\u000A',
'\u000B',
'\u000C',
'\u000D',
' ',
'\u0085',
'\u00A0',
'\u1680',
'\u180E',
'\u2000',
'\u2001',
'\u2002',
'\u2003',
'\u2004',
'\u2005',
'\u2006',
'\u2007',
'\u2008',
'\u2009',
'\u200A',
'\u2028',
'\u2029',
'\u202F',
'\u205F',
'\u3000'
The function will be called a lot of times, so it must be really, really performant. But shouldn't take too much memory (like mapping every character to true/false in an array). Things I've tried out so far:
if (!whitespaceCharactersMap[str[index]]) ...
) - works well enoughmy current solution uses hardcoded comparisons:
function(str) {
var length = str.length;
if (!length) {
return true;
}
for (var index = 0; index < length; index++)
{
var c = str[index];
if (c === ' ')
{
// skip
}
else if (c > '\u000D' && c < '\u0085')
{
return false;
}
else if (c < '\u00A0')
{
if (c < '\u0009')
{
return false;
}
else if (c > '\u0085')
{
return false;
}
}
else if (c > '\u00A0')
{
if (c < '\u2028')
{
if (c < '\u180E')
{
if (c < '\u1680')
{
return false;
}
else if(c > '\u1680')
{
return false;
}
}
else if (c > '\u180E')
{
if (c < '\u2000')
{
return false;
}
else if (c > '\u200A')
{
return false;
}
}
}
else if (c > '\u2029')
{
if (c < '\u205F')
{
if (c < '\u202F')
{
return false;
}
else if (c > '\u202F')
{
return false;
}
}
else if (c > '\u205F')
{
if (c < '\u3000')
{
return false;
}
else if (c > '\u3000')
{
return false;
}
}
}
}
}
return true;
}
This seems to work 50-100% faster than hash set (tested on Chrome).
Does anybody see or know further options?
Update 1
I'll answer some of the comments here:
Now here's my take on performance tests:
http://jsperf.com/hash-with-comparisons/6
I'd be grateful if you guys run these tests a couple of times.
Preliminary conclusions:
a^9*a^10*a^11...
) is extremely fast in Chrome and Firefox, but not in Safari. Probably the best choice for Node.js from performance perspective.' '
should be the first one).To sum up, for my case I'll opt to the following regexp version:
var re = /[^\s]/;
return !re.test(str);
Reasons:
Hard-coded solution seems the best, but I think switch
should be faster. It depends on the way JavaScript interpreter handles these (most compilers do this very efficiently), so it may be browser-specific (i.e., fast in some, slow in others). Also, I'm not sure how fast JavaScript is with UTF-strings, so you might try converting a character to its integer code before comparing the values.
for (var index = 0; index < length; index++)
{
var c = str.charCodeAt(index);
switch (c) {
case 0x0009: case 0x000A: case 0x000B: case 0x000C: case 0x000D: case 0x0020:
case 0x0085: case 0x00A0: case 0x1680: case 0x180E: case 0x2000: case 0x2001:
case 0x2002: case 0x2003: case 0x2004: case 0x2005: case 0x2006: case 0x2007:
case 0x2008: case 0x2009: case 0x200A: case 0x2028: case 0x2029: case 0x202F:
case 0x205F: case 0x3000: continue;
}
return false;
}
Another thing to consider is changing for
:
for (var index in str)
{
...
}
Edit
Your jsPerf test got some revisions, the current one available here. My code is significantly faster in Chrome 26 and 27, and in IE10, but it's also the slowest one in Firefox 18.
I ran the same test (I don't know how to make jsPerf save those) on Firefox 20.0 on 64-bit Linux and it turned out to be one of the two fastest ones (tied with trimTest
, both at about 11.8M ops/sec). I also tested Firefox 20.0.1 on WinXP, but under a VirtualBox (still under 64bit Linux, which might make a significant difference here), which gave 10M ops/sec to switchTest
, with trimTest
coming second at 7.3M ops/sec.
So, I'm guessing that the performance depends on the browser version and/or maybe even on the underlying OS/hardware (I suppose the above FF18 test was on Win). In any case, to make a truly optimal version, you'll have to make many versions, test each on all browsers, OSes, architectures,... you can get a hold of, and then include in your page the version best suited for the visitor's browser, OS, architecture,... I'm not sure what kind of code is worth the trouble, though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With