
Get HTML elements by attribute value in PHP

Tags: html, dom, php

I need to extract some data from a webpage with PHP. The part that I'm interested in is structured similarly to this:

<a href="somepath" target="fruit">apple</a>
<a href="somepath" target="animal">cat</a>
<a href="somepath" target="fruit">orange</a>
<a href="somepath" target="animal">dog</a>
<a href="somepath" target="fruit">mango</a>
<a href="somepath" target="animal">monkey</a>

First, I want to extract all fruits, and then all animals, so that I have them nicely grouped.

I figured out how to loop through all attribute values. Here's the code:

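$dom = new DOMDocument();
$html = file_get_contents('example.html');

// @ suppresses the warnings loadHTML() raises on imperfect markup
@$dom->loadHTML($html);

$a = $dom->getElementsByTagName('a');

// $i has to be initialized, otherwise the loop starts from an undefined variable
for ($i = 0; $i < $a->length; $i++) {
    $attr = $a->item($i)->getAttribute('target');

    echo $attr . "\n";
}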

So I get:

fruit animal fruit animal fruit animal

I also found out how to get the elements' text content:

$a->item($i)->textContent

So, if I include that in the loop and echo it, I get:

apple cat orange dog mango monkey

I feel like I'm very close, but I can't get what I want. I need something like this:

if ( target = "fruit") then give me "apple, orange, mango".

Can someone please point me in the right direction?

Thanks.

stillenat asked Dec 06 '11 04:12



2 Answers

Use DOMXPath and XPath queries:

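$doc = new DOMDocument();
// loadHTMLFile() parses HTML; load() expects well-formed XML
$doc->loadHTMLFile('yourFile.html');

$xpath = new DOMXPath($doc);

// every <a> element whose target attribute equals "fruit"
$fruits = $xpath->query("//a[@target='fruit']");
foreach ($fruits as $fruit) {
    // ...
}

// every <a> element whose target attribute equals "animal"
$animals = $xpath->query("//a[@target='animal']");
foreach ($animals as $animal) {
    // ...
}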

See this demo.
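
For reference, here is a minimal self-contained sketch of the XPath approach applied to the markup from the question. The helper name textsByTarget() is made up for this illustration, and the HTML is inlined as a string instead of being read from a file, so treat it as one possible way to fill in the loop bodies rather than the only one.

```php
<?php
// Sample markup copied from the question (inlined here instead of loading a file).
$html = <<<HTML
<a href="somepath" target="fruit">apple</a>
<a href="somepath" target="animal">cat</a>
<a href="somepath" target="fruit">orange</a>
<a href="somepath" target="animal">dog</a>
<a href="somepath" target="fruit">mango</a>
<a href="somepath" target="animal">monkey</a>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Hypothetical helper: text content of every <a> whose target attribute equals $target.
// $target is interpolated into the XPath string, so it must not contain a single quote.
function textsByTarget(DOMXPath $xpath, string $target): array
{
    $texts = [];
    foreach ($xpath->query("//a[@target='$target']") as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$fruits  = textsByTarget($xpath, 'fruit');
$animals = textsByTarget($xpath, 'animal');

echo implode(', ', $fruits), "\n";  // apple, orange, mango
echo implode(', ', $animals), "\n"; // cat, dog, monkey
```

Each query returns a DOMNodeList, so the same loop works for any other target value without touching the parsing code.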

fardjad answered Oct 21 '22 01:10


Just continue past the elements whose target attribute isn't fruit, and add the textContent of the remaining elements to an array.

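$nodes = array();

// $a is the DOMNodeList from getElementsByTagName('a') in the question
for ($i = 0; $i < $a->length; $i++) {
    $attr = $a->item($i)->getAttribute('target');

    // skip every link that is not a fruit
    if ($attr != 'fruit') {
        continue;
    }

    $nodes[] = $a->item($i)->textContent;
}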

$nodes now contains the text content of all the elements which have their target attribute set to fruit.
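
If both groups are needed, the same loop can collect them in one pass by using the target value as an array key. This is only a sketch under the assumption that the markup looks like the sample in the question; the variable name $groups is made up for the example.

```php
<?php
// Sample markup from the question, shortened to one string for the example.
$html = '<a href="somepath" target="fruit">apple</a>'
      . '<a href="somepath" target="animal">cat</a>'
      . '<a href="somepath" target="fruit">orange</a>'
      . '<a href="somepath" target="animal">dog</a>'
      . '<a href="somepath" target="fruit">mango</a>'
      . '<a href="somepath" target="animal">monkey</a>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

// Group the text of every link under its target value.
// Links without a target attribute would end up under the empty-string key.
$groups = array();
foreach ($dom->getElementsByTagName('a') as $link) {
    $target = $link->getAttribute('target');
    $groups[$target][] = $link->textContent;
}

echo implode(', ', $groups['fruit']), "\n";  // apple, orange, mango
echo implode(', ', $groups['animal']), "\n"; // cat, dog, monkey
```

The advantage over one filtered loop per category is that new target values show up as new keys automatically.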

alex answered Oct 21 '22 00:10