I have the following link. When I open it in Chrome, right-click the page, and choose "Save as", the page is saved to an HTML file (c:\temp\cu2.html).
After it is saved, I can open cu2.html in an HTML editor (say, VS2015), and I can see that the file contains a <table> tag.
However, if I open the link in IE11 (instead of Chrome) and save the same page as an HTML file, I cannot find this tag at all. In fact, the HTML file saved from IE11 has the same content as what I can extract with the PowerShell script below.
#Requires -version 4.0
$url = 'https://support.microsoft.com/en-us/help/4052574/cumulative-update-2-for-sql-server-2017';
$wr = Invoke-WebRequest $url;
$wr.RawContent.contains('<table') # returns false
$wr.RawContent | out-file -FilePath c:\temp\cu2_ps.html -Force; #same as the file saved from the webpage to html file in IE
So my question is:
Why is a web page saved (as an HTML file) from Chrome different from the one saved from IE?
How can I use PowerShell (or C#) to save such a web page to an HTML file (the same as the file saved from Chrome)?
Thanks in advance for your help.
To save a page from the browser, press Ctrl+S, or right-click within the page and choose "Save as".
PowerShell provides a built-in cmdlet called ConvertTo-Html. It takes objects as input and renders them as HTML (a table by default). To use it, pipe the output of a command directly to ConvertTo-Html; the cmdlet returns the HTML as strings.
The page uses AngularJS and jQuery, which means some of its content is loaded after the document is ready. So when you send the request with Invoke-WebRequest, you receive only the original markup of the page; the rest is loaded afterward by script.
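One rough way to confirm this is to count tags in the raw response: a page that builds its content in the browser usually ships several script tags but none of the elements you see once it is rendered. The sketch below assumes a hypothetical helper named Get-TagCount (not a built-in cmdlet) that counts opening tags in an HTML string with a regular expression. That is adequate for a quick check, but it is not a real HTML parser. The sample string stands in for the raw response.

```powershell
# Hypothetical helper (not a built-in cmdlet): counts opening tags of a given
# name in an HTML string. Regex-based, so treat the result as a rough check.
function Get-TagCount {
    param(
        [Parameter(Mandatory)] [string] $Html,
        [Parameter(Mandatory)] [string] $Tag
    )
    # Match "<tag" followed by a word boundary, so "</tag>" is not counted.
    $pattern = '<' + [regex]::Escape($Tag) + '\b'
    $options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
    return [regex]::Matches($Html, $pattern, $options).Count
}

# Inline sample standing in for the markup returned by Invoke-WebRequest:
$sample = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
Get-TagCount -Html $sample -Tag 'script'
Get-TagCount -Html $sample -Tag 'table'
```

For the page in the question, passing $wr.Content with the tag name 'table' should report zero, consistent with the Contains('<table') check returning false.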
To solve the problem, you can automate IE to get the expected result. It is enough to wait for the page to finish loading, wait a bit longer for the AngularJS logic to run and download the required content, and then get the content of the document element:
$ie = new-object -ComObject "InternetExplorer.Application"
$url = "https://support.microsoft.com/en-us/help/4052574/cumulative-update-2-for-sql-server-2017"
$ie.silent = $true
$ie.navigate($url)
while($ie.Busy) { Start-Sleep -Milliseconds 100 }
Start-Sleep 10
$ie.Document.documentElement.innerHTML > "C:\Tempfiles\output.html"
$ie.Stop()
$ie.Quit()
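The fixed 10-second pause above may be longer than needed on a fast connection and too short on a slow one. A minimal sketch of an alternative, assuming the content you need is a <table> element as in the question, is to poll until that element appears or a timeout expires. Wait-Until and Save-RenderedPage are hypothetical names used only for illustration, and the sketch assumes Windows with the Internet Explorer COM object available. On some machines the document object exposes the lookup as IHTMLDocument3_getElementsByTagName instead, so treat the getElementsByTagName call as something to verify locally.

```powershell
# Hypothetical helper: re-evaluates $Condition until it returns $true
# or $TimeoutSeconds elapses. Returns $true on success, $false on timeout.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 30,
        [int] $PollMilliseconds = 250
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (& $Condition) { return $true }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    return $false
}

# Hypothetical wrapper around the IE automation shown above.
function Save-RenderedPage {
    param(
        [Parameter(Mandatory)] [string] $Url,
        [Parameter(Mandatory)] [string] $Path,
        [string] $TagName = 'table',   # assumption: the element that signals "rendered"
        [int] $TimeoutSeconds = 30
    )
    $ie = New-Object -ComObject 'InternetExplorer.Application'
    try {
        $ie.Silent = $true
        $ie.Navigate($Url)
        # Ready when IE is idle and at least one element with that tag name exists.
        $ready = Wait-Until -TimeoutSeconds $TimeoutSeconds -Condition {
            if ($ie.Busy -or $null -eq $ie.Document) { return $false }
            return ($ie.Document.getElementsByTagName($TagName).length -gt 0)
        }
        if (-not $ready) {
            Write-Warning "Timed out waiting for <$TagName>; saving what was loaded."
        }
        # outerHTML includes the <html> element itself, unlike innerHTML.
        $ie.Document.documentElement.outerHTML | Out-File -FilePath $Path -Encoding utf8
    }
    finally {
        # Always close the hidden IE instance, even if something above fails.
        $ie.Quit()
    }
}

# Usage (the path is only an example):
# Save-RenderedPage -Url $url -Path 'C:\Tempfiles\output.html'
```

Polling for a specific element ties the wait to the thing you actually need, at the cost of having to know in advance which element marks the page as rendered.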