I would like to build an array from an HTML file using PowerShell.
I am using a script which downloads the HTML index file of the Mozilla Firefox Developer Edition locally, and I would like to parse it to get the values of the option elements inside the select element whose id is id_country.
I have been advised to use XPath for that, but I can't figure out how to parse the file and build an array from the result. Maybe using a regex could be a workaround.
The HTML file is here:
http://pastebin.com/b8cShFLA
And I would like to get all the values of the option elements here:
<select aria-required="true" id="id_country" name="country" required="required">
<option value="af">Afghanistan</option>
<option value="al">Albania</option>
<option value="dz">Algeria</option>
<option value="as">American Samoa</option>
<option value="ad">Andorra</option>
...
I am quite new to PowerShell, which is why I am not really aware of the different solutions I might be able to use. I need something quite fast, as it's part of a package installer.
Basically, the script will check whether there is an installer which matches the locale of the user's computer and, if not, it will default to English. That's why I need the values from that list: to check which locales are available for Firefox Developer Edition.
Regards, O
I don't see a code sample to fix, so I'll make one.
If it were a remote HTML page I would use Invoke-WebRequest, but that doesn't work too well with local files.
For local files I would recommend using HTML Agility Pack to parse the HTML, and then XPath to get the options you're looking for. For example:
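Add-Type -Path .\HTMLAgilityPack\HtmlAgilityPack.dll
# Resolve the full path of the locally saved HTML file
$path = (Get-Item .\b8cShFLA.html).FullName
$doc = New-Object HtmlAgilityPack.HtmlDocument
# -Raw (PS 3.0+) reads the file as a single string, which is what LoadHtml expects
$doc.LoadHtml((Get-Content $path -Raw))
# Create hashtable to store data in
$langs = @{}
$doc.DocumentNode.SelectSingleNode("//select[@name='country']").SelectNodes("option") | ForEach-Object {
    # Read the value attribute by name rather than by position
    $short = $_.GetAttributeValue("value", "")
    # HAP builds that treat <option> as an empty element expose its text as the next sibling node
    $long = $_.NextSibling.InnerText
    # Store data in hashtable
    $langs[$short] = $long
}
$langs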
Output:
Name Value
---- -----
rw Rwanda
tv Tuvalu
to Tonga
pn Pitcairn
bh Bahrain
lc Saint Lucia
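Since the question mentions regex as a possible workaround, here is a minimal sketch of that route with no external DLL. It is not a general HTML parser: it assumes the markup looks like the excerpt in the question (double-quoted attributes, one option per element, a closing </select> tag, which the excerpt does not show). The sample string and the pattern variable names are illustrative only.

```powershell
# Sample markup based on the question; the closing </select> is assumed.
# In practice, read the downloaded file instead, e.g.:
#   $html = Get-Content -Path .\index.html -Raw   # hypothetical path
$html = @'
<select aria-required="true" id="id_country" name="country" required="required">
<option value="af">Afghanistan</option>
<option value="al">Albania</option>
<option value="dz">Algeria</option>
</select>
'@

# Isolate the <select id="id_country"> block first so options of other selects are ignored
$selectPattern = '(?s)<select[^>]*\bid="id_country"[^>]*>(.*?)</select>'
$selectBody = [regex]::Match($html, $selectPattern).Groups[1].Value

# Collect code -> name pairs from each <option value="..">..</option>
$optionPattern = '<option[^>]*\bvalue="([^"]*)"[^>]*>([^<]*)</option>'
$langs = [ordered]@{}
foreach ($m in [regex]::Matches($selectBody, $optionPattern)) {
    $langs[$m.Groups[1].Value] = $m.Groups[2].Value.Trim()
}

# Plain array of the option values
$codes = @($langs.Keys)
$codes
```

This trades robustness for speed and zero dependencies; if the page layout changes, the HTML Agility Pack approach above is likely to hold up better.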
If you're running PowerShell 3.0 or above, you can take advantage of Invoke-WebRequest for pages that exist out on the web. If you're operating against a local file, it can be a bit finicky.
Invoke-WebRequest returns an HtmlWebResponseObject with a property called ParsedHtml (it relies on Internet Explorer's engine and is only available in Windows PowerShell). This object has a method named getElementById, which we can use since we know the id "id_country" of your select tag. From there, it is a simple matter to iterate over the option elements and return only the properties we want: "Text" and "Value".
The example below outputs custom objects containing the country name and the country code:
Code:
# I'm using your raw pastebin endpoint for this example
$result = Invoke-WebRequest "http://pastebin.com/raw.php?i=b8cShFLA"
# Only return specific properties from the elements you're looking for
$countries = $result.ParsedHtml.getElementById("id_country") |
    Where-Object tagName -eq "option" |
    Select-Object -Property Text, Value
# Country name and code are stored to this variable
$countries
Output:
text value
---- -----
Afghanistan af
Albania al
Algeria dz
American Samoa as
Andorra ad
... ...
You can then use the country name and code as you would any other property on PowerShell objects.
As for the web endpoint, it sounds like you could modify this script to point to the original Mozilla page you're extracting this HTML from.
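To tie this back to the installer use case in the question (match the user's locale, otherwise default to English), a rough sketch could look like the following. The function name Resolve-InstallerLocale, the sample code list, and the 'en-US' default are assumptions for illustration; it also assumes the extracted values can be compared directly to a culture name or to its language part.

```powershell
# Hypothetical sample of codes; in practice use the extracted values,
# e.g. $countries.value from the example above
$availableCodes = @('af', 'al', 'dz', 'fr', 'en-US')

function Resolve-InstallerLocale {
    param(
        [string[]] $Available,
        [string]   $Wanted,
        [string]   $Default = 'en-US'   # assumed fallback
    )
    # -contains compares strings case-insensitively by default
    if ($Available -contains $Wanted) { return $Wanted }
    # Try the bare language part, e.g. 'fr' from 'fr-FR'
    $language = ($Wanted -split '-')[0]
    if ($Available -contains $language) { return $language }
    return $Default
}

# (Get-Culture).Name is the current culture name, e.g. 'fr-FR'
$locale = Resolve-InstallerLocale -Available $availableCodes -Wanted (Get-Culture).Name
$locale
```

The lookup is a simple linear search, which should be fast enough for a list of a few hundred entries inside an installer script.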