I get some HTML as an Ajax response, and I need to get just the body contents. So I wrote this regex:
/(<body>|<\/body>)/ig
It works well in all browsers, but for some reason IE gives me a different array when I use split:
data.split(/(<body>|<\/body>)/ig)
In all normal browsers the content of the body is in split(/(<body>|<\/body>)/ig)[2], but in IE it's in split(/(<body>|<\/body>)/ig)[1] (tested in IE7 & 8).
Why is this? And how could I modify it in order to get the same array in all browsers?
Edit: Just to clarify, I already have a solution, as mentioned by tobyodavies. I want to understand why it behaves differently.
This is the HTML from the response (the string in data):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de" dir="ltr">
<head>
blablabla...
</head>
<body>
<div class="iframe">
<div id="block-menu-menu-primary-links-user" class="block-menu">
<h3>Primary Links - User</h3> <div class="content"><ul class="menu"><li class="leaf first"><a target="content" href="#someurl" title="">Login</a></li>
<li class="leaf last"><a target="content" href="#someurl" title="">Register</a></li>
</ul></div>
</div>
</div>
</body>
</html>
PS: I know that parsing HTML with regex is bad, but it's not my code; I just need to fix it.
The reason it behaves differently is the capturing group you create with the parentheses. Other browsers add the text matched by the capturing group to the resulting array; IE 8 and lower do not. To get a consistent result, you'd have to make the group non-capturing:
/(?:<body>|<\/body>)/ig
This is the reason other browsers have the content in [2] rather than [1]: in those browsers, [1] will contain the captured string "<body>". The other browsers have it right on this one, and Internet Explorer 9 fixed the problem by implementing the method as outlined by the ECMAScript 5th Edition specification.
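To see the difference concretely, here is a small sketch as it behaves in an ES5-compliant engine. The data string is a shortened, made-up stand-in for the response in the question, not the real one.

```javascript
// Shortened, hypothetical stand-in for the Ajax response.
var data = '<html><head></head><body><div>content</div></body></html>';

// Capturing group: compliant engines splice the matched separators into the result.
var withCapture = data.split(/(<body>|<\/body>)/ig);
// ['<html><head></head>', '<body>', '<div>content</div>', '</body>', '</html>']

// Non-capturing group: only the pieces between the separators are returned.
var withoutCapture = data.split(/(?:<body>|<\/body>)/ig);
// ['<html><head></head>', '<div>content</div>', '</html>']

console.log(withCapture[2]);    // <div>content</div>
console.log(withoutCapture[1]); // <div>content</div>
```

With the non-capturing version, the body content should sit at index [1] in both compliant browsers and old IE, since no captures are involved.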
There are more inconsistencies than this, though. ECMAScript 5 compliance in all browsers will resolve these differences, but you might want to take a look at Steven Levithan's blog, where he outlines the differing implementations and even provides a custom split() method as a solution to the problem.
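If only the body contents are needed, one way to sidestep split() entirely is to capture everything between the tags with exec(). This is a minimal sketch, assuming there is exactly one body element and that its opening tag may carry attributes; extractBody is a hypothetical helper name, not something from the original code or from the blog mentioned above.

```javascript
// Hypothetical helper: returns the text between <body ...> and </body>, or null if absent.
function extractBody(html) {
  // [\s\S] matches any character including newlines; *? keeps the match lazy.
  var match = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  return match ? match[1] : null;
}

// Made-up sample input for illustration.
var sample = '<html><head></head><body>\n<p>Hello</p>\n</body></html>';
console.log(extractBody(sample)); // logs the <p>Hello</p> line with its surrounding newlines
```

This relies only on capture groups in exec(), so the split() difference described above does not come into play.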