Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

The most performant way to check if a string is blank (i.e. only contains whitespace) in JavaScript?

I need to write a function which tests, if given string is "blank" in a sense that it only contains whitespace characters. Whitespace characters are the following:

'\u0009',
'\u000A',
'\u000B',
'\u000C',
'\u000D',
' ',
'\u0085',
'\u00A0',
'\u1680',
'\u180E',
'\u2000',
'\u2001',
'\u2002',
'\u2003',
'\u2004',
'\u2005',
'\u2006',
'\u2007',
'\u2008',
'\u2009',
'\u200A',
'\u2028',
'\u2029',
'\u202F',
'\u205F',
'\u3000'

The function will be called a lot of times, so it must be really, really performant. But shouldn't take too much memory (like mapping every character to true/false in an array). Things I've tried out so far:

  • regexp - not quite performant
  • trim and check if length is 0 - not quite performant, also uses additional memory to hold the trimmed string
  • checking every string character against a hash set containing whitespace characters (if (!whitespaceCharactersMap[str[index]]) ...) - works well enough
  • my current solution uses hardcoded comparisons:

    function(str) {
        var length = str.length;
        if (!length) {
            return true;
        }
        for (var index = 0; index < length; index++)
        {
            var c = str[index];
            if (c === ' ')
            {
                // skip
            }
            else if (c > '\u000D' && c < '\u0085')
            {
                return false;
            }
            else if (c < '\u00A0')
            {
                if (c < '\u0009')
                {
                    return false;
                }
                else if (c > '\u0085')
                {
                    return false;
                }
            }
            else if (c > '\u00A0')
            {
                if (c < '\u2028')
                {
                    if (c < '\u180E')
                    {
                        if (c < '\u1680')
                        {
                            return false;
                        }
                        else if(c > '\u1680')
                        {
                            return false;
                        }
                    }
                    else if (c > '\u180E')
                    {
                        if (c < '\u2000')
                        {
                            return false;
                        }
                        else if (c > '\u200A')
                        {
                            return false;
                        }
                    }
                }
                else if (c > '\u2029')
                {
                    if (c < '\u205F')
                    {
                        if (c < '\u202F')
                        {
                            return false;
                        }
                        else if (c > '\u202F')
                        {
                            return false;
                        }
                    }
                    else if (c > '\u205F')
                    {
                        if (c < '\u3000')
                        {
                            return false;
                        }
                        else if (c > '\u3000')
                        {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }
    

This seems to work 50-100% faster than hash set (tested on Chrome).

Does anybody see or know further options?

Update 1

I'll answer some of the comments here:

  • It's not just checking user input for emptyness. I have to parse certain data format where whitespace must be handled separately.
  • It is worth optimizing. I've profiled the code before. Checking for blank strings seems to be an issue. And, as we saw, the difference in performance between approaches can be up to 10 times, it's definitely worth the effort.
  • Generally, I find this "hash set vs. regex vs. switch vs. branching" challenge very educating.
  • I need the same functionality for browsers as well as node.js.

Now here's my take on performance tests:

http://jsperf.com/hash-with-comparisons/6

I'd be grateful if you guys run these tests a couple of times.

Preliminary conclusions:

  • branchlessTest (a^9*a^10*a^11...) is extremely fast in Chrome and Firefox, but not in Safari. Probably the best choice for Node.js from performance perspective.
  • switchTest is also quite fast on Chrom and Firefox, but, surprizingly the slowest in Safari and Opera
  • Regexps with re.test(str) perform well everywhere, even fastest in Opera.
  • Hash and branching show almost identically poor results almost everywhere. Comparision is also similar, often worst performance (this may be due to the implementation, check for ' ' should be the first one).

To sum up, for my case I'll opt to the following regexp version:

var re = /[^\s]/;
return !re.test(str);

Reasons:

  • branchless version is cool in Chrome and Firefox but isn't quite portable
  • switch is too slow in Safari
  • regexps seem to perform well everywhere, they'll also very compact in code
like image 784
lexicore Avatar asked Jun 09 '13 13:06

lexicore


1 Answers

Hard-coded solution seems the best, but I think switch should be faster. It depends on the way JavaScript interpreter handles these (most compilers do this very efficiently), so it may be browser-specific (i.e., fast in some, slow in others). Also, I'm not sure how fast JavaScript is with UTF-strings, so you might try converting a character to its integer code before comparing the values.

for (var index = 0; index < length; index++)
{
    var c = str.charCodeAt(index);
    switch (c) {
        case 0x0009: case 0x000A: case 0x000B: case 0x000C: case 0x000D: case 0x0020:
        case 0x0085: case 0x00A0: case 0x1680: case 0x180E: case 0x2000: case 0x2001:
        case 0x2002: case 0x2003: case 0x2004: case 0x2005: case 0x2006: case 0x2007:
        case 0x2008: case 0x2009: case 0x200A: case 0x2028: case 0x2029: case 0x202F:
        case 0x205F: case 0x3000: continue;
    }
    return false;
}

Another thing to consider is changing for:

for (var index in str)
{
    ...
}

Edit

Your jsPerf test got some revisions, the current one available here. My code is significantly faster in Chrome 26 and 27, and in IE10, but it's also the slowest one in Firefox 18.

I ran the same test (I don't know how to make jsPerf save those) on Firefox 20.0 on 64-bit Linux and it turned out to be one of the two fastest ones (tied with trimTest, both at about 11.8M ops/sec). I also tested Firefox 20.0.1 on WinXP, but under a VirtualBox (still under 64bit Linux, which might make a significant difference here), which gave 10M ops/sec to switchTest, with trimTest coming second at 7.3M ops/sec.

So, I'm guessing that the performance depends on the browser version and/or maybe even on the underlying OS/hardware (I suppose the above FF18 test was on Win). In any case, to make a truly optimal version, you'll have to make many versions, test each on all browsers, OSes, architectures,... you can get a hold of, and then include in your page the version best suited for the visitor's browser, OS, architecture,... I'm not sure what kind of code is worth the trouble, though.

like image 112
Vedran Šego Avatar answered Sep 28 '22 04:09

Vedran Šego