I'm trying to write a PowerShell script that gets the text of every element with the class "newstitle" on a website.
This is what I have:
function check-krpano {
$geturl=Invoke-WebRequest http://krpano.com/news/
$news=$geturl.parsedhtml.body.GetElementsByClassName("newstitle")[0]
Write-Host "$news"
}
check-krpano
It obviously needs much more tweaking, but so far, it doesn't work.
I managed to write a script using GetElementById, but I don't know the syntax for GetElementsByClassName, and to be honest, I haven't been able to find much information about it.
NOTE:
I've ticked the right answer to my question, but that's not the solution I chose to use in my script.
Although I was able to find the content within a tag containing a certain class using two methods, both were much slower than searching for links.
Here is the output using Measure-Command:
So I have marked as useful the Links method answer.
This is my final script:
function check-krpano {
Clear-Host
$geturl=Invoke-WebRequest http://krpano.com/news
$news = ($geturl.Links |Where href -match '\#news\d+' | where class -NotMatch 'moreinfo+' )
$news.outertext | Select-Object -First 5
}
check-krpano
The getElementsByClassName method of Document interface returns an array-like object of all child elements which have all of the given class name(s). When called on the document object, the complete document is searched, including the root node.
The getElementsByClassName() method returns a collection of elements with a specified class name(s). The getElementsByClassName() method returns an HTMLCollection.
To get a single unique element and store it in a variable, use getElementById. To get all the elements that share a class, such as every product element, and store them in a variable, use getElementsByClassName.
The JavaScript getElementsByClassName method gets all the elements that belong to a particular class. When it is called on the document object, it searches the complete document, including the root node, and returns an array-like collection containing all the matching elements.
If you figure out how to get GetElementsByClassName to work, I'd like to know. I just ran into this yesterday and ran out of time so I came up with a workaround:
$geturl.ParsedHtml.body.getElementsByTagName('div') |
Where {$_.getAttributeNode('class').Value -eq 'newstitle'}
getElementsByClassName does not return an array directly but instead a proxy to the results via COM. As you have discovered, conversion to an array is not automatic with the [] operator. You can use the array subexpression operator, @(), to force it to an array first so that you can access individual elements:
@($body.getElementsByClassName("foo"))[0].innerText
As an aside, conversion is performed automatically if you use the object pipeline, e.g.:
$body.getElementsByClassName("foo") | Select-Object -First 1
It is also performed automatically with the foreach construct:
foreach ($element in $body.getElementsByClassName("foo"))
{
$element.innerText
}
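Putting this together for the original question, here is a minimal sketch of a helper that collects the text of every element with a given class. The function name Get-TextByClassName and its parameters are hypothetical, not something from the posts above. It assumes Windows PowerShell 5.1, where Invoke-WebRequest exposes ParsedHtml (PowerShell 6 and later do not), and it assumes the installed Internet Explorer engine supports getElementsByClassName on the body element.

```powershell
# Hypothetical helper (not from the original posts): returns the trimmed
# innerText of every element under $Body that carries the given class.
function Get-TextByClassName {
    param(
        [Parameter(Mandatory)] $Body,               # e.g. $response.ParsedHtml.body
        [Parameter(Mandatory)] [string] $ClassName
    )
    # @() forces the COM result proxy into a real array before enumerating it.
    foreach ($element in @($Body.getElementsByClassName($ClassName))) {
        # innerText can be empty or $null, so skip those elements.
        if ($element.innerText) { $element.innerText.Trim() }
    }
}

# Usage (assumption: Windows PowerShell 5.1 with network access):
# $response = Invoke-WebRequest http://krpano.com/news/
# Get-TextByClassName -Body $response.ParsedHtml.body -ClassName 'newstitle' |
#     Select-Object -First 5
```

As the question's note points out, going through ParsedHtml was much slower than filtering $geturl.Links, so this is mainly useful when the text you need is not inside a link.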