
Getting the HTML source through the WebBrowser control in C#

Tags:

c#

I tried to get the HTML source in the following way:

webBrowser1.Document.Body.OuterHtml;

but it does not return the original markup. For example, if the original HTML source is:

<html>
<body>
    <div>
        <ul>
            <li>
                <h3>
                    Manufacturer</h3>
            </li>
            <li><a href="/4566-6501_7-0.html?filter=1000036_3808675_100021_10194772_">Sony </a>(44)</li>
            <li><a href="/4566-6501_7-0.html?filter=1000036_108496_100021_10194772_">Nikon </a>(19)</li>
            <li><a href="/4566-6501_7-0.html?filter=1000036_3808726_100021_10194772_">Panasonic </a>(37)</li>
            <li><a href="/4566-6501_7-0.html?filter=1000036_3808769_100021_10194772_">Canon </a>(29)</li>
            <li><a href="/4566-6501_7-0.html?filter=1000036_2913388_100021_10194772_">Olympus </a>(21)</li>
            <li class="seeAll"><a href="/4566-6501_7-0.html?sa=1000036&filter=100021_10194772_" class="readMore">See all manufacturers </a></li>
        </ul>
    </div>
</body>
</html>

but the output of webBrowser1.Document.Body.OuterHtml is:

<body>
    <div>
        <ul>
            <li>
                <h3>
                    Manufacturer</h3>
                <li><a href="/4566-6501_7-0.html?filter=1000036_3808675_100021_10194772_">Sony </a>(44)
                    <li><a href="/4566-6501_7-0.html?filter=1000036_108496_100021_10194772_">Nikon </a>(19)
                        <li><a href="/4566-6501_7-0.html?filter=1000036_3808726_100021_10194772_">Panasonic
                        </a>(37)
                            <li><a href="/4566-6501_7-0.html?filter=1000036_3808769_100021_10194772_">Canon </a>
                                (29)
                                <li><a href="/4566-6501_7-0.html?filter=1000036_2913388_100021_10194772_">Olympus </a>
                                    (21)
                                    <li class="seeAll"><a class="readMore" href="/4566-6501_7-0.html?sa=1000036&amp;filter=100021_10194772_">
                                        See all manufacturers </a></li>
        </ul>
    </div>
</body>

As you can see, most of the closing </li> tags are lost.

Is there a way to get the HTML source from the WebBrowser control correctly? Note that in my application I use the WebBrowser control to add coordinate info to every node, and then output the HTML source with that coordinate info added as attributes of the nodes.

Can anybody help?

Rockycqu asked Mar 02 '11 07:03


2 Answers

If you want to grab the entire HTML source of the WebBrowser control, use WebBrowser1.Document.GetElementsByTagName("HTML").Item(0).OuterHtml. This assumes the HTML is well formed and the HTML tag exists. If you want to narrow it down to just the body, change the HTML tag to the BODY tag. This way you capture any changes made after "DocumentText" was set. Sorry, I'm a VB guy, convert as needed ;)
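
For C# readers, the expression above might translate roughly as follows. This is only a sketch: the class and method names (WebBrowserHtml, GetRenderedHtml) are hypothetical, and it assumes a Windows Forms project where the page has already finished loading.

```csharp
using System.Windows.Forms;

static class WebBrowserHtml
{
    // Hypothetical helper: returns the markup of the root <html> element
    // as serialized from the live DOM, or null if nothing is loaded.
    public static string GetRenderedHtml(WebBrowser browser)
    {
        HtmlDocument doc = browser.Document;
        if (doc == null)
        {
            return null; // no document has been loaded yet
        }

        // C# uses the collection indexer where VB uses .Item(0).
        HtmlElementCollection roots = doc.GetElementsByTagName("HTML");
        return roots.Count > 0 ? roots[0].OuterHtml : null;
    }
}
```

It would typically be called from a DocumentCompleted handler, for example string html = WebBrowserHtml.GetRenderedHtml(webBrowser1);. Because OuterHtml is produced by the browser engine from the DOM, the result can still differ textually from the original file. In particular, the </li> end tag is optional in HTML, so an engine that omits it is likely still describing the same list structure.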

Justin Emlay answered Oct 20 '22 15:10


Try using the DocumentText or DocumentStream property.
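
A minimal sketch of reading both properties, assuming a Windows Forms project; the class and method names (WebBrowserSource, ReadText, ReadStream) are hypothetical, and the UTF-8 default is an assumption about the page's encoding.

```csharp
using System.IO;
using System.Text;
using System.Windows.Forms;

static class WebBrowserSource
{
    // Returns the page contents as a string via DocumentText.
    public static string ReadText(WebBrowser browser)
    {
        return browser.DocumentText;
    }

    // Reads the same contents through DocumentStream.
    // The encoding is an assumption; pass the page's real encoding if known.
    public static string ReadStream(WebBrowser browser, Encoding encoding = null)
    {
        Stream stream = browser.DocumentStream;
        if (stream == null)
        {
            return null; // no document has been loaded yet
        }

        using (var reader = new StreamReader(stream, encoding ?? Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Both are best read after DocumentCompleted has fired. As far as I know, these properties reflect the page as it was loaded rather than the live DOM, so attributes added to nodes afterwards (such as the coordinate info in the question) would probably not appear here; for that case, serializing the DOM through OuterHtml seems the better fit.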

VinayC answered Oct 20 '22 15:10