I need to extract some data from a webpage with PHP. The part that I'm interested in is structured similarly to this:
<a href="somepath" target="fruit">apple</a>
<a href="somepath" target="animal">cat</a>
<a href="somepath" target="fruit">orange</a>
<a href="somepath" target="animal">dog</a>
<a href="somepath" target="fruit">mango</a>
<a href="somepath" target="animal">monkey</a>
First, I want to extract all fruits, and then all animals, so that I have them nicely grouped.
I figured out how to loop through all attribute values. Here's the code:
$dom = new DOMDocument();
$html = file_get_contents('example.html');
@$dom->loadHTML($html); // @ suppresses warnings about malformed HTML
$a = $dom->getElementsByTagName('a');
for ($i = 0; $i < $a->length; $i++) {
    $attr = $a->item($i)->getAttribute('target');
    echo $attr . "\n";
}
So I get:
fruit animal fruit animal fruit animal
I also found out how to get the elements' text content:
$a->item($i)->textContent
So, if I include that in the loop and echo it, I get:
apple cat orange dog mango monkey
I feel like I'm very close, but I can't get what I want. I need something like this:
if ( target = "fruit") then give me "apple, orange, mango".
Can someone please point me in the right direction?
Thanks.
DOMDocument::getElementById() is a built-in PHP method that searches for an element with a given id. It accepts a single parameter, $elementId, which holds the id to search for, and it returns the matching DOMElement, or NULL if no element is found.
To get all of the attributes of a DOM element in JavaScript: call getAttributeNames() to get an array of the element's attribute names, then use reduce() to iterate over that array, adding a key/value pair with the attribute's name and value on each iteration.
The getAttribute() method returns the value of an element's attribute.
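Use DOMXPath and queries:
$doc = new DOMDocument();
// loadHTMLFile() parses HTML; load() expects well-formed XML
$doc->loadHTMLFile('yourFile.html');
$xpath = new DOMXPath($doc);
// every <a> element whose target attribute is "fruit"
$fruits = $xpath->query("//a[@target='fruit']");
foreach ($fruits as $fruit) {
    // ...
}
// every <a> element whose target attribute is "animal"
$animals = $xpath->query("//a[@target='animal']");
foreach ($animals as $animal) {
    // ...
}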
See this demo.
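To get the comma-separated lists the question asks for, the query results can be collected into arrays and joined. The following is only a sketch: the inline $html string stands in for the real page, and textsByTarget() is a hypothetical helper name, not part of the DOM extension.

```php
<?php
// Sample markup copied from the question; a real page would be loaded instead.
$html = '<a href="somepath" target="fruit">apple</a>
<a href="somepath" target="animal">cat</a>
<a href="somepath" target="fruit">orange</a>
<a href="somepath" target="animal">dog</a>
<a href="somepath" target="fruit">mango</a>
<a href="somepath" target="animal">monkey</a>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: text content of every <a> with the given target value.
// $target is interpolated into the XPath expression, so pass trusted values only.
function textsByTarget(DOMXPath $xpath, string $target): array
{
    $texts = [];
    foreach ($xpath->query("//a[@target='$target']") as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

echo implode(', ', textsByTarget($xpath, 'fruit')), "\n";  // apple, orange, mango
echo implode(', ', textsByTarget($xpath, 'animal')), "\n"; // cat, dog, monkey
```

Returning plain arrays of strings rather than the DOMNodeList keeps the grouping step separate from whatever is done with the names afterwards.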
Just use continue to skip the elements whose target attribute isn't fruit, and add the textContent of the remaining elements to an array.
$nodes = array();
for ($i = 0; $i < $a->length; $i++) {
    $attr = $a->item($i)->getAttribute('target');
    // skip anything that is not a fruit
    if ($attr != 'fruit') {
        continue;
    }
    $nodes[] = $a->item($i)->textContent;
}
$nodes now contains the text content of every element whose target attribute is set to fruit.
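If both groups are needed, the same loop idea can build them in a single pass by using the target value as an array key. This is a minimal sketch under the assumption that the markup matches the sample in the question; the inline $html string and the $groups variable name are illustrative only.

```php
<?php
// Sample markup copied from the question; a real page would be loaded instead.
$html = '<a href="somepath" target="fruit">apple</a>
<a href="somepath" target="animal">cat</a>
<a href="somepath" target="fruit">orange</a>
<a href="somepath" target="animal">dog</a>
<a href="somepath" target="fruit">mango</a>
<a href="somepath" target="animal">monkey</a>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Group link texts by their target attribute in one pass.
// Links without a target attribute would end up under the empty-string key.
$groups = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    $groups[$link->getAttribute('target')][] = $link->textContent;
}

foreach ($groups as $target => $names) {
    echo $target, ': ', implode(', ', $names), "\n";
}
// fruit: apple, orange, mango
// animal: cat, dog, monkey
```

The groups appear in the order each target value is first seen in the document, so fruit comes before animal here.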