I can use PowerShell to parse an HTML page:
PS > $foo = Invoke-WebRequest http://example.com
PS > $foo.Links.Count
1
However, if I download the page
PS > Invoke-WebRequest -OutFile example.htm http://example.com
and then try to parse the downloaded page, it gives an unexpected result:
PS > $foo = Invoke-WebRequest file://$pwd/example.htm
PS > $foo.Links.Count
0
How can I parse the local downloaded page?
It appears that Invoke-WebRequest loads file protocol URIs just fine, but fails to parse them, even in PowerShell 4.0 (where it is officially supported).
An alternative that does not require setting up a website would be to load the HTML directly into MSHTML and parse it there:
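# Create an MSHTML document object through COM
$html = New-Object -ComObject "HTMLFile"
# Read the whole file as a single string
$source = Get-Content -Path "file.html" -Raw
# Feed the markup to the parser
$html.IHTMLDocument2_write($source)
# Number of links found in the document
$html.links.length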
Note that when I tested this, a single <meta http-equiv="X-UA-Compatible" content="IE=edge" /> header prevented my HTML from parsing, and I have no idea why; the document had similar XHTML-style headers and MSHTML had no issues with those.
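On some machines the IHTMLDocument2_write call is reportedly not exposed on the COM object. A minimal sketch of a wrapper that tries it first and then falls back to the late-bound write method is below. The function name Get-LocalHtmlLinkCount is hypothetical, and the fallback is an assumption based on commonly reported behavior, not something I have verified on every Windows or PowerShell version.

```powershell
# Hypothetical helper: counts the links in a local HTML file using MSHTML (Windows only).
function Get-LocalHtmlLinkCount {
    param([string] $Path)

    # Read the whole file as one string so the parser sees the complete markup.
    $source = Get-Content -LiteralPath $Path -Raw
    $html = New-Object -ComObject 'HTMLFile'
    try {
        # Same call as above; it may not be available on every machine.
        $html.IHTMLDocument2_write($source)
    }
    catch {
        # Assumed fallback: the late-bound write method, fed the source as UTF-16 bytes.
        $html.write([System.Text.Encoding]::Unicode.GetBytes($source))
    }
    $html.links.length
}
```

Usage would look like: Get-LocalHtmlLinkCount -Path .\example.htm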
You can serve the file from a local web server to get around the dumb limitation of Invoke-WebRequest:
PS > $foo = Invoke-WebRequest http://localhost:8080/example.htm
PS > $foo.Links.Count
1
Note that this works even with no internet connection. For example, a remote request fails:
PS > Invoke-WebRequest http://example.com
Invoke-WebRequest : The remote name could not be resolved: 'example.com'
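The answer does not say which web server was used. If nothing is installed, a small one can be sketched in PowerShell itself with the .NET System.Net.HttpListener class. The function names Send-LocalFile and Start-LocalFileServer are hypothetical, port 8080 and example.htm are taken from the commands above, and the server deliberately stops after a fixed number of requests so it never blocks forever.

```powershell
# Minimal sketch of a local static file server (function names are hypothetical).
function Send-LocalFile {
    param(
        [System.Net.HttpListenerContext] $Context,
        [string] $Root
    )
    # Use only the file name part of the request path, so /example.htm maps to $Root\example.htm.
    $name = [System.IO.Path]::GetFileName($Context.Request.Url.LocalPath)
    $file = if ($name) { Join-Path $Root $name } else { $null }

    if ($file -and (Test-Path -LiteralPath $file -PathType Leaf)) {
        $bytes = [System.IO.File]::ReadAllBytes($file)
        $Context.Response.ContentType = 'text/html'
        $Context.Response.ContentLength64 = $bytes.Length
        $Context.Response.OutputStream.Write($bytes, 0, $bytes.Length)
    }
    else {
        # Unknown or missing file: answer with an empty 404.
        $Context.Response.StatusCode = 404
    }
    $Context.Response.Close()
}

function Start-LocalFileServer {
    param(
        [string] $Root = (Get-Location).Path,
        [int] $Port = 8080,
        [int] $Count = 1   # number of requests to serve before stopping
    )
    $listener = New-Object System.Net.HttpListener
    $listener.Prefixes.Add("http://localhost:$Port/")
    $listener.Start()
    try {
        for ($i = 0; $i -lt $Count; $i++) {
            # GetContext blocks until the next request arrives.
            Send-LocalFile -Context $listener.GetContext() -Root $Root
        }
    }
    finally {
        $listener.Stop()
    }
}
```

To try it, run Start-LocalFileServer in one PowerShell window from the folder that contains example.htm, then run Invoke-WebRequest http://localhost:8080/example.htm in a second window.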