I tried to get the HTML source in the following way:
webBrowser1.Document.Body.OuterHtml;
but the result does not match the original markup. For example, if the original HTML source is:
<html>
<body>
<div>
<ul>
<li>
<h3>
Manufacturer</h3>
</li>
<li><a href="/4566-6501_7-0.html?filter=1000036_3808675_100021_10194772_">Sony </a>(44)</li>
<li><a href="/4566-6501_7-0.html?filter=1000036_108496_100021_10194772_">Nikon </a>(19)</li>
<li><a href="/4566-6501_7-0.html?filter=1000036_3808726_100021_10194772_">Panasonic </a>(37)</li>
<li><a href="/4566-6501_7-0.html?filter=1000036_3808769_100021_10194772_">Canon </a>(29)</li>
<li><a href="/4566-6501_7-0.html?filter=1000036_2913388_100021_10194772_">Olympus </a>(21)</li>
<li class="seeAll"><a href="/4566-6501_7-0.html?sa=1000036&filter=100021_10194772_" class="readMore">See all manufacturers </a></li>
</ul>
</div>
</body>
</html>
but the output of webBrowser1.Document.Body.OuterHtml is:
<body>
<div>
<ul>
<li>
<h3>
Manufacturer</h3>
<li><a href="/4566-6501_7-0.html?filter=1000036_3808675_100021_10194772_">Sony </a>(44)
<li><a href="/4566-6501_7-0.html?filter=1000036_108496_100021_10194772_">Nikon </a>(19)
<li><a href="/4566-6501_7-0.html?filter=1000036_3808726_100021_10194772_">Panasonic
</a>(37)
<li><a href="/4566-6501_7-0.html?filter=1000036_3808769_100021_10194772_">Canon </a>
(29)
<li><a href="/4566-6501_7-0.html?filter=1000036_2913388_100021_10194772_">Olympus </a>
(21)
<li class="seeAll"><a class="readMore" href="/4566-6501_7-0.html?sa=1000036&filter=100021_10194772_">
See all manufacturers </a></li>
</ul>
</div>
</body>
As you can see, many </li> tags are lost.
Is there a way to get the HTML source from the WebBrowser control correctly? Note that in my application, I use the WebBrowser to add coordinate info to every node as attributes, and then output its HTML source with those attributes included.
Can anybody help?
If you want to grab the entire HTML source of the WebBrowser control, use this: WebBrowser1.Document.GetElementsByTagName("HTML").Item(0).OuterHtml. This of course assumes your HTML is properly formatted and the HTML tag exists. If you want to narrow it down to just the body, change the HTML tag to the BODY tag. This way you grab any and all changes made after "DocumentText" has been set. Sorry, I'm a VB guy, convert as needed ;)
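A possible C# conversion, as a minimal sketch. It assumes a project compiled as a console application with a reference to System.Windows.Forms. It hosts the WebBrowser without a form only to stay self-contained, and it uses made-up sample markup. In C#, VB's `.Item(0)` becomes the indexer `[0]`.

```csharp
using System;
using System.Windows.Forms;

static class Program
{
    [STAThread] // the WebBrowser ActiveX control requires a single-threaded apartment
    static void Main()
    {
        var webBrowser1 = new WebBrowser();

        // Read the DOM only after the document has finished loading.
        webBrowser1.DocumentCompleted += (sender, e) =>
        {
            // C# equivalent of VB's GetElementsByTagName("HTML").Item(0).OuterHtml
            HtmlElementCollection roots = webBrowser1.Document.GetElementsByTagName("HTML");
            if (roots.Count > 0)
            {
                Console.WriteLine(roots[0].OuterHtml);
            }
            Application.ExitThread(); // end the message loop started below
        };

        // Illustrative markup; a real app would load its page via Navigate(...) or DocumentText.
        webBrowser1.DocumentText =
            "<html><body><ul><li>Sony</li><li>Nikon</li></ul></body></html>";

        Application.Run(); // pump messages so the control can load the document
    }
}
```

On an existing form you would only need the expression inside the handler, for example `string html = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;`, evaluated after DocumentCompleted has fired.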
Try using the DocumentText or DocumentStream properties.
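A minimal C# sketch of this suggestion. It assumes a console-style host with an STA thread, a reference to System.Windows.Forms, and illustrative markup. It is commonly reported that DocumentText and DocumentStream return the document as it was loaded. If that holds, they would keep the original </li> tags but may not include attributes added later through the DOM. The sketch therefore prints both alongside Body.OuterHtml so you can check this on your own setup. The `title` attribute is an arbitrary stand-in for the coordinate info.

```csharp
using System;
using System.IO;
using System.Windows.Forms;

static class Program
{
    [STAThread] // the WebBrowser ActiveX control requires an STA thread
    static void Main()
    {
        var webBrowser1 = new WebBrowser();

        webBrowser1.DocumentCompleted += (sender, e) =>
        {
            // Illustrative DOM change standing in for the coordinate attributes (arbitrary value).
            webBrowser1.Document.Body.SetAttribute("title", "changed-via-dom");

            // DocumentText: the document content as a string.
            Console.WriteLine("DocumentText:");
            Console.WriteLine(webBrowser1.DocumentText);

            // DocumentStream: the document content exposed as a Stream.
            using (Stream stream = webBrowser1.DocumentStream)
            using (var reader = new StreamReader(stream))
            {
                Console.WriteLine("DocumentStream:");
                Console.WriteLine(reader.ReadToEnd());
            }

            // The serialized live DOM, for comparison with the two outputs above.
            Console.WriteLine("Body.OuterHtml:");
            Console.WriteLine(webBrowser1.Document.Body.OuterHtml);

            Application.ExitThread(); // stop the message loop
        };

        webBrowser1.DocumentText =
            "<html><body><ul><li>Sony</li><li>Nikon</li></ul></body></html>";
        Application.Run(); // pump messages so the document can load
    }
}
```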