
Use GetElementsByClassName in a script

Tags:

powershell

I'm trying to write a PowerShell script that gets the text of every element with the class "newstitle" on a website.

This is what I have:

function check-krpano {
    $geturl=Invoke-WebRequest http://krpano.com/news/
    $news=$geturl.parsedhtml.body.GetElementsByClassName("newstitle")[0]
    Write-Host  "$news"
}

check-krpano

It obviously needs much more tweaking, but so far, it doesn't work.

I managed to write a script using GetElementById, but I don't know the syntax for GetElementsByClassName, and to be honest, I haven't been able to find much information about it.

NOTE:

I've ticked the right answer to my question, but that's not the solution I chose to use in my script.

Although I was able to find the content of a tag with a certain class using two methods, both were much slower than searching for links.

Here is the output using Measure-Command:

  • Search for divs containing class 'newstitle' using parsedhtml.body -> 29.6 seconds
  • Search for divs containing class 'newstitle' using Allelements -> 10.4 seconds
  • Search for links whose 'href' attribute contains #news -> 2.4 seconds
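
The exact commands behind these timings are not shown. As a rough sketch of how such a comparison could be made, a small wrapper around Measure-Command can report the elapsed seconds for any script block. The helper name Measure-Seconds is hypothetical and not part of the original script.

```powershell
# Hypothetical helper (not from the original post): runs a script block
# and returns the elapsed wall-clock time in seconds.
function Measure-Seconds {
    param([scriptblock]$Action)
    (Measure-Command -Expression $Action).TotalSeconds
}

# Example usage, assuming $geturl already holds the result of
# Invoke-WebRequest in Windows PowerShell:
#   Measure-Seconds { $geturl.Links | Where-Object href -match '#news\d+' }
```

Timings like these depend heavily on the page size and the machine, so they are best read as relative comparisons.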

So I have marked the Links method answer as useful.

This is my final script:

function check-krpano {
    Clear-Host
    $geturl=Invoke-WebRequest http://krpano.com/news
    $news = ($geturl.Links |Where href -match '\#news\d+' | where class -NotMatch 'moreinfo+' )
    $news.outertext | Select-Object -First 5
}

check-krpano
RafaelGP asked Jul 12 '13 23:07




2 Answers

If you figure out how to get GetElementsByClassName to work, I'd like to know. I just ran into this yesterday and ran out of time so I came up with a workaround:

$geturl.ParsedHtml.body.getElementsByTagName('div') | 
    Where {$_.getAttributeNode('class').Value -eq 'newstitle'}
Keith Hill answered Sep 19 '22 12:09



getElementsByClassName does not return an array directly; it returns a COM proxy to the results. As you have discovered, indexing with the [] operator does not convert that proxy to an array automatically. You can use the array subexpression operator, @(), to force it into an array first so that you can access individual elements:

@($body.getElementsByClassName("foo"))[0].innerText

As an aside, conversion is performed automatically if you use the object pipeline, e.g.:

$body.getElementsByClassName("foo") | Select-Object -First 1

It is also performed automatically with the foreach construct:

foreach ($element in $body.getElementsByClassName("foo"))
{
    $element.innerText
}
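
Putting these pieces together, here is a minimal sketch of a helper that returns the text of every element with a given class. The function name Get-ClassText and its parameters are hypothetical, not from the question. It assumes Windows PowerShell, where Invoke-WebRequest exposes ParsedHtml; PowerShell 6 and later do not provide that property.

```powershell
# Hypothetical helper (names are illustrative, not from the question).
# $Body is expected to be an object exposing getElementsByClassName,
# such as (Invoke-WebRequest ...).ParsedHtml.body in Windows PowerShell.
function Get-ClassText {
    param(
        $Body,
        [string]$ClassName
    )
    # @() forces the returned collection into a real PowerShell array
    # so it can be enumerated and indexed reliably.
    foreach ($element in @($Body.getElementsByClassName($ClassName))) {
        $element.innerText
    }
}

# Example usage (requires network access and Windows PowerShell):
#   $geturl = Invoke-WebRequest http://krpano.com/news/
#   Get-ClassText -Body $geturl.ParsedHtml.body -ClassName 'newstitle'
```

Passing the body in as a parameter keeps the lookup separate from the download, so the same function can be reused on any page that has already been fetched.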
Don Cruickshank answered Sep 18 '22 12:09
