
Parsing of html string using jquery

Tags:

jquery

I am trying to parse this HTML with jQuery to get data1, data2 and data3. While I do get data2 and data3, I am unable to get data1 with my approach. I am fairly new to jQuery, so please pardon my ignorance.

<html>
<body>
   <div class="class0">
    <h4>data1</h4>
    <p class="class1">data2</p>
    <div id="mydivid"><p>data3</p></div>    
   </div>
</body>
</html>

Here is how I am calling it in jQuery:

var datahtml = "<html><body><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></body></html>";

alert($(datahtml).find(".class0").text()); // does not work

alert($(datahtml).find(".class1").text()); // works

alert($(datahtml).find("#mydivid").text()); // works

Only alert($(datahtml).find(".class0").text()); fails; the other two work as expected. Could it be because class0 has multiple tags inside it? How can I get data1 in this scenario?

902

asked Oct 09 '12 21:10

lazyguy

3 Answers

None of the current answers addressed the real issue, so I'll give it a go.

var datahtml = "<html><body><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></body></html>";
console.log($(datahtml));

$(datahtml) is a jQuery object containing only the div.class0 element, so when you call .find() on it you are actually searching the descendants of div.class0, not the whole HTML document as you'd expect.
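The difference between searching descendants and testing the set itself can be illustrated with a toy model. This is only a sketch with plain objects, not jQuery: the node shape and the helper names `findByClass` and `filterByClass` are made up for illustration.

```javascript
// Toy model (not jQuery): each node is { name, className, children }.
const class0 = {
  name: "div",
  className: "class0",
  children: [
    { name: "h4", className: "", children: [] },
    { name: "p", className: "class1", children: [] },
  ],
};

// Stand-in for the set $(datahtml) yields: html/body are gone, div.class0 is top level.
const topLevel = [class0];

// Like .find(): walks the descendants of each node, never the node itself.
function findByClass(nodes, className) {
  const matches = [];
  for (const node of nodes) {
    for (const child of node.children) {
      if (child.className === className) matches.push(child);
      matches.push(...findByClass([child], className));
    }
  }
  return matches;
}

// Like .filter(): tests only the nodes that are already in the set.
function filterByClass(nodes, className) {
  return nodes.filter((node) => node.className === className);
}

console.log(findByClass(topLevel, "class0").length);   // 0
console.log(findByClass(topLevel, "class1").length);   // 1
console.log(filterByClass(topLevel, "class0").length); // 1
```

In jQuery itself the analogous call would be $(datahtml).filter(".class0").text(), since .filter() tests the elements in the set rather than their descendants.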

A quick solution is to wrap the parsed data in an element so the .find will work as intended:

var parsed = $('<div/>').append(datahtml);
console.log(parsed.find(".class0").text());


The reason for this isn't very simple. I assume jQuery "parses" more complex HTML strings by dropping the string into a separate DOM fragment created on the fly and then retrieving the parsed elements; that operation would most likely make the DOM parser ignore the html and body tags, as they would be illegal in that context.

Here is a very small test suite which demonstrates that this behavior is consistent through jQuery 1.8.2 all the way down to 1.6.4.

Edit: quoting this post:

Problem is that jQuery creates a DIV and sets innerHTML and then takes DIV children, but since BODY and HEAD elements are not valid DIV childs, then those are not created by browser.

That makes me more confident that my theory is correct. I'll share it here; hopefully it makes some sense to you. Keep jQuery 1.8.2's uncompressed source side by side with this. The # indicates line numbers.

Every document fragment made through jQuery.buildFragment (defined @#6122) goes through jQuery.clean (#6151); even a cached fragment already went through jQuery.clean when it was created. As the quoted text above implies, jQuery.clean (defined @#6275) creates a fresh div inside the safe fragment to serve as a container for the parsed data: the div element is created at #6301-6303, its childNodes are retrieved at #6344, the div is removed at #6347 for cleanup (plus #6359-6361 as a bug fix), and the childNodes are merged into the return array at #6351-6355 and returned at #6406.

Therefore, this applies to every method that invokes jQuery.buildFragment, which includes jQuery.parseHTML and jQuery.fn.domManip. Among those are .append(), .after() and .before(), which invoke the domManip jQuery object method, and $(html), which is handled in jQuery.fn.init (defined @#97, handling of complex [more than a single tag] HTML strings @#125, invokes jQuery.parseHTML @#131).

It makes sense that virtually all jQuery HTML string parsing (besides single-tag HTML strings) is done using a div element as the container, and html/body tags are not valid descendants of a div element, so they are stripped out.

Addendum: Newer versions of jQuery (1.9+) have refactored the HTML parsing logic (for instance, the internal jQuery.clean method no longer exists), but the overall parsing logic remains the same.
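The container behaviour described above can be approximated without jQuery. The sketch below only mirrors the "parse inside a throwaway div" idea; `topLevelNodeNames` is a hypothetical helper, not a jQuery function, and it assumes a browser `document` is available (it returns null otherwise).

```javascript
// Sketch only: parse a string as the content of a temporary div and list
// the names of the nodes that end up directly inside it.
function topLevelNodeNames(html, doc = globalThis.document) {
  if (!doc) return null; // no DOM available, e.g. when run outside a browser
  const container = doc.createElement("div");
  container.innerHTML = html; // the string is parsed as div content
  return Array.from(container.childNodes, (node) => node.nodeName);
}

const datahtml =
  '<html><body><div class="class0"><h4>data1</h4></div></body></html>';

// In a browser this is expected to list only the inner div, because html and
// body cannot be children of a div; outside a browser it logs null.
console.log(topLevelNodeNames(datahtml));
```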

119

answered Oct 05 '22 21:10

Fabrício Matté

Its behaviour is weird, as it ignores the html and body tags and starts from the first div with class="class0". The HTML is parsed into DOM elements but not added to the DOM. For elements added to the DOM, the selector does not ignore the body tag and is applied to the document. You need to add the HTML to the DOM as shown below.


// Assumes an element with id="div1" already exists in the page.
$('#div1').append($(datahtml)); // add to the DOM before applying jQuery methods
alert($('#div1').find(".class0").text()); // now this works too
alert($('#div1').find(".class1").text()); // works
alert($('#div1').find("#mydivid").text()); // works

If we wrap your HTML in another element, so that the wrapper becomes the starting point instead of your first div with class="class0", then your selector will work as expected.


var datahtml = "<html><body><div><div class=\"class0\"><h4>data1</h4><p class=\"class1\">data2</p><div id=\"mydivid\"><p>data3</p></div></div></div></body></html>";
alert($(datahtml).find(".class0").text()); // now this works too
alert($(datahtml).find(".class1").text()); // works
alert($(datahtml).find("#mydivid").text()); // works

Here is what the jQuery docs say about the parsing function jQuery(), i.e. $():

When passing in complex HTML, some browsers may not generate a DOM that exactly replicates the HTML source provided. As mentioned, jQuery uses the browser's .innerHTML property to parse the passed HTML and insert it into the current document. During this process, some browsers filter out certain elements such as <html>, <title>, or <head> elements. As a result, the elements inserted may not be representative of the original string passed.

answered Oct 05 '22 21:10

Adil

I think I have an even better way:

Let's say you've got your HTML:

var htmlText = '<html><body><div class="class0"><h4>data1</h4><p class="class1">data2</p><div id="mydivid"><p>data3</p></div></div></body></html>';

Here's the thing you've been hoping to do:

var dataHtml = $($.parseXML(htmlText)).children('html');

dataHtml can now be queried like the ordinary jQuery objects you're familiar with!

The wonderful thing about this solution is that it will not strip body, head, or script tags!
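One caveat worth keeping in mind: $.parseXML expects well-formed XML, so markup with unclosed tags such as <br> would make it throw. A defensive wrapper could look like the sketch below; `parseAsXml` is a hypothetical helper name, and the jQuery object is passed in as an argument purely so the function has no hidden dependency.

```javascript
// Hypothetical wrapper around the parseXML approach; not part of jQuery.
// Returns the <html> root wrapped in a jQuery object, or null on any failure
// (typically markup that is not well-formed XML).
function parseAsXml(markup, $) {
  try {
    return $($.parseXML(markup)).children("html");
  } catch (err) {
    return null;
  }
}

// Possible usage in a page where jQuery is loaded:
//   var root = parseAsXml(htmlText, jQuery);
//   if (root) { alert(root.find("h4").text()); }
```

Returning null keeps the calling code simple, at the cost of hiding the reason for the failure; logging err inside the catch block would be a reasonable variation.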

answered Oct 05 '22 23:10

Gershom Maes
