 

Different split() regex result in IE

I get some HTML as an AJAX response, and I need to get just the body contents. So I made this regex:

/(<body>|<\/body>)/ig

It works well in all browsers, but for some reason IE gives me a different array when I use split():

data.split(/(<body>|<\/body>)/ig)

In all other browsers the body content is in split(/(<body>|<\/body>)/ig)[2], but in IE it is in split(/(<body>|<\/body>)/ig)[1] (tested in IE7 and IE8).
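To make the index difference concrete, here is what a standards-compliant engine (for example current Node.js or a modern browser) returns for the capturing regex. The short `sample` string is a made-up stand-in for the real response, not the actual AJAX data.

```javascript
// Made-up stand-in for the AJAX response.
var sample = "<head>h</head><body>B</body>end";

// Capturing group: an ES5-compliant engine splices each matched tag into the result.
var parts = sample.split(/(<body>|<\/body>)/ig);
console.log(parts);    // → [ '<head>h</head>', '<body>', 'B', '</body>', 'end' ]
console.log(parts[2]); // → B (the body content)
```

The matched `<body>` tag occupies index 1, which pushes the body content to index 2.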

Why is this? And how could I modify it to get the same array in all browsers?

Edit, just to clarify: I already have a solution, as mentioned by tobyodavies. I want to understand why it behaves differently.

This is the HTML from the response (the string in data):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"  xml:lang="de"  lang="de" dir="ltr">
<head>
blablabla...
</head>
<body>
<div class="iframe">
   <div id="block-menu-menu-primary-links-user" class="block-menu">
 <h3>Primary Links - User</h3>  <div class="content"><ul class="menu"><li class="leaf first"><a target="content" href="#someurl" title="">Login</a></li>
<li class="leaf last"><a target="content" href="#someurl" title="">Register</a></li>
</ul></div>
</div>
</div>
</body>
</html>

PS: I know that parsing HTML with regex is bad, but it's not my code; I just need to fix it.

asked Apr 04 '11 09:04 by meo




1 Answer

The reason it behaves differently is the subexpression capture you create with the parentheses. Other browsers add the text matched inside these captures to the resulting array; IE 8 and lower do not. To get a more consistent result, make the group non-capturing:

/(?:<body>|<\/body>)/ig
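A minimal sketch using the same kind of made-up `sample` string (an assumption, not the real response): with the non-capturing group only the text between the tags remains, so the body content sits at index 1, the index IE 7/8 already reported for the capturing version.

```javascript
// Made-up stand-in for the AJAX response.
var sample = "<head>h</head><body>B</body>end";

// Non-capturing group (?:...): the tags are consumed but not added to the array.
var parts = sample.split(/(?:<body>|<\/body>)/ig);
console.log(parts);    // → [ '<head>h</head>', 'B', 'end' ]
console.log(parts[1]); // → B (the body content)
```

One hedged caveat: old IE is also reported to drop empty strings from split() results, so the indices line up only while no piece is empty. In the response shown, the newlines around `<body>` and `</body>` keep every piece non-empty.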

This is why other browsers have the content in [2] rather than [1]; in those browsers [1] will, in theory, contain the string "<body>". The other browsers have it right on this one, and Internet Explorer 9 fixed the problem by implementing the method as outlined in the ECMAScript 5th Edition specification.
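If you would rather keep the original capturing regex, one option (a sketch, not part of the answer above) is to feature-test whether the engine splices captures into split() results and choose the index accordingly. The helper names `splitKeepsCaptures` and `getBodyContent` are hypothetical.

```javascript
// Hypothetical feature test: does this engine add capture-group text to
// split() results, as ES5 requires?
function splitKeepsCaptures() {
  // ES5 engines return ["a", ",", "b"]; IE 8 and lower reportedly return ["a", "b"].
  return "a,b".split(/(,)/).length === 3;
}

// Hypothetical helper: pick the body content from the original split() call.
function getBodyContent(data) {
  var parts = data.split(/(<body>|<\/body>)/ig);
  // With captures, "<body>" sits at index 1 and the content at index 2;
  // without them, the content is at index 1.
  return parts[splitKeepsCaptures() ? 2 : 1];
}

console.log(getBodyContent("<head>h</head><body>B</body>end")); // → B
```

The non-capturing group is still the simpler fix; the feature test only helps if other code depends on the captured tags.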

There are more inconsistencies than this, though. ECMAScript 5 compliance in all browsers will resolve these differences, but you might want to take a look at Steven Levithan's blog, where he outlines the differing implementations and even provides a custom split() method as a solution to the problem.
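To illustrate what such a replacement has to do, here is a minimal sketch of a split() that follows the ES5 rule of splicing capture groups into the result, by walking the matches with `RegExp.prototype.exec`. It is not Steven Levithan's code. The name `splitWithCaptures` is hypothetical; it ignores the `limit` argument and skips empty matches instead of splitting between characters, so it covers only the capture-group difference discussed here.

```javascript
// Hypothetical helper: split `str` on the RegExp `separator` and, like ES5's
// String.prototype.split, insert each capture group's value after the piece
// that precedes the match. Simplified: no `limit`, empty matches are skipped.
function splitWithCaptures(str, separator) {
  // Rebuild the regex with the global flag so exec() walks through the string.
  var flags = "g" + (separator.ignoreCase ? "i" : "") + (separator.multiline ? "m" : "");
  var re = new RegExp(separator.source, flags);
  var output = [];
  var lastEnd = 0; // end index of the previous separator match
  var match;

  while ((match = re.exec(str)) !== null) {
    if (match[0].length === 0) {
      // Simplification: ignore empty matches (and avoid an infinite loop).
      re.lastIndex++;
      continue;
    }
    // Text between the previous separator and this one.
    output.push(str.slice(lastEnd, match.index));
    // Captured groups, in order.
    for (var i = 1; i < match.length; i++) {
      output.push(match[i]);
    }
    lastEnd = match.index + match[0].length;
  }
  // Whatever follows the last separator.
  output.push(str.slice(lastEnd));
  return output;
}

var sample = "<head>h</head><body>B</body>end";
console.log(splitWithCaptures(sample, /(<body>|<\/body>)/ig)[2]);   // → B
console.log(splitWithCaptures(sample, /(?:<body>|<\/body>)/ig)[1]); // → B
```

Because it never calls the native split(), the result no longer depends on whether the engine keeps captures, at the cost of handling fewer edge cases than a full shim.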

answered Oct 05 '22 20:10 by Andy E