How to save a web page into a HTML file with PowerShell or C#?

Tags: html, powershell, google-chrome, web

I have a link (the URL in the script below). When I open it in Chrome, right-click the page, and choose "Save as", the page is saved to an HTML file (c:\temp\cu2.html).

After it is saved, I can open cu2.html in an HTML editor (say VS2015) and see that the file contains a <table> tag.

However, if I open the link in IE11 (instead of Chrome) and save the same page as an HTML file, I cannot find this tag at all. In fact, the HTML file saved from IE11 has the same content as what I can extract with the PowerShell script below.

#Requires -version 4.0
$url = 'https://support.microsoft.com/en-us/help/4052574/cumulative-update-2-for-sql-server-2017';

$wr = Invoke-WebRequest $url;
$wr.RawContent.contains('<table') # returns false

$wr.RawContent | out-file -FilePath c:\temp\cu2_ps.html -Force; #same as the file saved from the webpage to html file in IE

So my question is:

Why is a web page saved as an HTML file in Chrome different from the one saved in IE?

How can I use PowerShell (or C#) to save such a web page to an HTML file that matches the file saved in Chrome?

Thanks in advance for your help.

asked Dec 01 '17 06:12

jyao

1 Answer

The page uses AngularJS and jQuery, which means some content is loaded after the document is ready. When you send the request with Invoke-WebRequest, you receive only the original content of the page. The rest is loaded afterwards by scripts, which Invoke-WebRequest does not run.
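As a rough way to confirm this, you can compare the static response with a browser-saved copy by checking each for the tag in question. The sketch below uses a hypothetical helper, Test-HtmlContainsTag (not a built-in cmdlet), and inline sample strings rather than the real page, so it only illustrates the idea:

```powershell
# Hypothetical helper: returns $true if $Html contains an opening tag named $TagName.
function Test-HtmlContainsTag {
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string]$Html,
        [Parameter(Mandatory)][string]$TagName
    )
    # Match e.g. <table>, <table class="x"> or <table/>, case-insensitively
    $pattern = '<' + [regex]::Escape($TagName) + '(\s|>|/)'
    return [regex]::IsMatch($Html, $pattern, [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
}

# Made-up sample markup: what the server might send vs. what scripts might render later
$static   = '<html><body><div ng-app="app"></div><script src="app.js"></script></body></html>'
$rendered = '<html><body><div ng-app="app"><table><tr><td>KB</td></tr></table></div></body></html>'

Test-HtmlContainsTag -Html $static   -TagName 'table'   # False
Test-HtmlContainsTag -Html $rendered -TagName 'table'   # True
```

Against the real page, you would pass $wr.Content (or the text of a saved file read with Get-Content -Raw) as -Html.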

To solve the problem, you can automate IE to get the expected result. It is enough to wait for the page to become ready, wait a bit longer so the AngularJS logic can run and download the required content, and then read the content of the document element:

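# Start an Internet Explorer instance through COM automation
$ie = New-Object -ComObject "InternetExplorer.Application"
$url = "https://support.microsoft.com/en-us/help/4052574/cumulative-update-2-for-sql-server-2017"
$ie.Silent = $true
$ie.Navigate($url)
# Wait until the initial navigation has finished
while ($ie.Busy) { Start-Sleep -Milliseconds 100 }
# Give the page scripts (AngularJS) time to fetch and render the remaining content
Start-Sleep -Seconds 10
# Save the rendered DOM rather than the original response
$ie.Document.documentElement.innerHTML > "C:\Tempfiles\output.html"
$ie.Stop()
$ie.Quit()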
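
The fixed 10-second sleep is a guess. A possible refinement is to poll until the expected content appears or a timeout expires. The sketch below assumes a hypothetical helper named Wait-Until (not a built-in cmdlet). The timeout and poll interval are arbitrary defaults.

```powershell
# Hypothetical helper: re-evaluates $Condition until it returns a truthy value
# or $TimeoutSeconds have elapsed. Returns $true on success, $false on timeout.
function Wait-Until {
    param(
        [Parameter(Mandatory)][scriptblock]$Condition,
        [int]$TimeoutSeconds = 30,
        [int]$PollMilliseconds = 250
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (& $Condition) { return $true }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    return $false
}
```

With the $ie object from the script above, Start-Sleep 10 could then be replaced by something like Wait-Until -Condition { -not $ie.Busy -and $ie.Document.documentElement.innerHTML -match '<table' }, writing the file only if the call returns $true. This assumes the <table> tag is what signals that the page has finished rendering.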

answered Oct 21 '22 21:10

Reza Aghaei
