Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

jQuerys $.trim(), bug or poorly written?

$.trim() uses the following RegExp to trim a string:

/^(\s|\u00A0)+|(\s|\u00A0)+$/g

As it turns out, this can be pretty ugly, Example:

var mystr = '    some test --          more text            new test                                         xxx';
mystr = mystr.replace(/^(\s|\u00A0)+|(\s|\u00A0)+$/g, "");

This code hangs Firefox and Chrome, it just takes like forever. "mystr" contains whitespaces but mostly hex 160(A0) characters. This "problem" does only occur, if there is no prepending whitespace/A0, but somewhere within the string. I have no clue why this happens.

This expression:

/^[\n\r\t \xA0]+|[\n\r\t \xA0]$/g

just works fine in all tested scenarios. Maybe a better pattern for that?

Source: http://code.jquery.com/jquery-1.4.2.js

UPDATE

It looks like you can't copy&paste this example string, at some points those A0 characters are replaced. Firebug console will also replace the characters on pasting, you have to create your own string in a sepperate html file/editor to test this.

like image 624
jAndy Avatar asked Jun 28 '10 13:06

jAndy


People also ask

Is trim deprecated?

trim( str )Returns: Stringversion deprecated: 3.5. Description: Remove the whitespace from the beginning and end of a string.

What is trim jQuery?

jQuery trim() method The trim() method is used to remove the space, tabs, and all line breaks from the starting and end of the specified string. This method does not remove these characters if these whitespace characters are in the middle of the string. The commonly used syntax of using this method is given as follows.

How can remove space from string in jQuery?

Answer: Use the jQuery $. trim() function You can use the jQuery $. trim() function to remove all the spaces (including non-breaking spaces), newlines, and tabs from the beginning and end of the specified string. However, the whitespaces in the middle of the string are preserved.

What is trim () Javascript?

The trim() method removes whitespace from both ends of a string and returns a new string, without modifying the original string. Whitespace in this context is all the whitespace characters (space, tab, no-break space, etc.) and all the line terminator characters (LF, CR, etc.).


2 Answers

This is a known bug, as said in comments, and Crescent is right that it's this way in 1.4.2, but it's already fixed for the next release.

You can test the speed of String.prototype.trim on your string here: http://jsfiddle.net/dLLVN/
I get around 79ms in Chrome 117ms in Firefox for a million runs...so this will fix the hanging issue :)

As for the fix, take a look at the current source that'll be in 1.4.3, the native trimming is now used.

There were 2 commits in march for this:

  • http://github.com/jquery/jquery/commit/141ad3c3e21e7734e67e37b5fb39782fe11b3c18
  • http://github.com/jquery/jquery/commit/ba8938d444b9a49bdfb27213826ba108145c2e50

1.4.2 $.trim() function:

trim: function( text ) {
    return (text || "").replace( rtrim, "" );
},

1.4.3 $.trim() function:

//earlier: 
trim = String.prototype.trim

//new trim here
trim: trim ?
  function( text ) {
    return text == null ?
      "" : 
      trim.call( text ); 
  } :

  // Otherwise use our own trimming functionality
  function( text ) { 
    return text == null ? 
      "" :
      text.toString().replace( trimLeft, "" ).replace( trimRight, "" );
  }

The trimLeft and trimRight vary, depending on whether you're in IE or not, like this:

trimLeft = /^\s+/,
trimRight = /\s+$/,

// Verify that \s matches non-breaking spaces
// (IE fails on this test)
if ( !/\s/.test( "\xA0" ) ) {
  trimLeft = /^[\s\xA0]+/;
  trimRight = /[\s\xA0]+$/;
}
like image 137
Nick Craver Avatar answered Sep 27 '22 18:09

Nick Craver


Normally an expression like ^\s+|\s+$ should be enough for trimming, since \s is supposed to match all space characters, even \0xa0 non-breaking spaces1. This expression should run without causing any problems.

Now probably some browser that jQuery wants to support doesn't match \0xa0 with \s and to work around this problem jQuery added the alternative (\s|\0xa0), to trim away non-breaking spaces on that browser too.

With this change, the second part of the regex looks like (\s|\0xa0)+$, which leads to problems in browsers where \0xa0 is also matched by \s. In a string containing a long sequence of \0xa0 characters, each character can be matched by \s or \0xa0, leading to lots of alternative matches and exponentially many combinations how different matches can be combined. If this sequence of \0xa0 characters is not at the end of the string, the trailing $ condition can never be fulfilled, no matter which spaces are matched by \s and which are matched by \0xax, but the browser doesn't know this and tries all combinations, potentially searching for a very long time.

The simplified expression you suggest will not be sufficient since \s is supposed to match all unicode space characters, not just the well-known ASCII ones.


1 According to MDC, \s is equivalent to [\t\n\v\f\r \u00a0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000]

like image 43
sth Avatar answered Sep 27 '22 18:09

sth