I need to extract some data from a webpage with PHP. The part that I'm interested in is structured similarly to this:
<a href="somepath" target="fruit">apple</a>
<a href="somepath" target="animal">cat</a>
<a href="somepath" target="fruit">orange</a>
<a href="somepath" target="animal">dog</a>
<a href="somepath" target="fruit">mango</a>
<a href="somepath" target="animal">monkey</a>
First, I want to extract all fruits, and then all animals, so that I have them nicely grouped.
I figured out how to loop through all attribute values. Here's the code:
$dom = new DOMDocument();
$html = file_get_contents('example.html');
@$dom->loadHTML($html); // @ suppresses warnings from malformed HTML
$a = $dom->getElementsByTagName('a');
for ($i = 0; $i < $a->length; $i++) {
    $attr = $a->item($i)->getAttribute('target');
    echo $attr . "\n";
}
So I get:
fruit animal fruit animal fruit animal
I also found out how to get the elements' text content:
$a->item($i)->textContent
So, if I include that in the loop and echo it, I get:
apple cat orange dog mango monkey
I feel like I'm very close, but I can't get what I want. I need something like this:
if ( target = "fruit") then give me "apple, orange, mango".
Can someone please point me in the right direction?
Thanks.
Use DOMXPath and queries:
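$doc = new DOMDocument();
// loadHTMLFile() parses HTML; load() expects well-formed XML
$doc->loadHTMLFile('yourFile.html');
$xpath = new DOMXPath($doc);
// Select every <a> whose target attribute is "fruit"
$fruits = $xpath->query("//a[@target='fruit']");
foreach ($fruits as $fruit) {
    // ...
}
// Select every <a> whose target attribute is "animal"
$animals = $xpath->query("//a[@target='animal']");
foreach ($animals as $animal) {
    // ...
}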
See this demo.
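To make the XPath approach concrete, here is a minimal self-contained sketch. It assumes the markup from the question is available as a string (inlined here so the example runs on its own), and `textsByTarget()` is a hypothetical helper name, not part of the DOM extension.

```php
<?php
// Sample markup copied from the question; inlined so the sketch is self-contained.
$html = <<<HTML
<a href="somepath" target="fruit">apple</a>
<a href="somepath" target="animal">cat</a>
<a href="somepath" target="fruit">orange</a>
<a href="somepath" target="animal">dog</a>
<a href="somepath" target="fruit">mango</a>
<a href="somepath" target="animal">monkey</a>
HTML;

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

/**
 * Hypothetical helper: returns the text of every <a> whose target
 * attribute equals $target. Assumes $target contains no single quote,
 * since it is placed inside a single-quoted XPath string literal.
 */
function textsByTarget(DOMXPath $xpath, string $target): array
{
    $texts = [];
    foreach ($xpath->query("//a[@target='" . $target . "']") as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

echo implode(', ', textsByTarget($xpath, 'fruit')), "\n";  // apple, orange, mango
echo implode(', ', textsByTarget($xpath, 'animal')), "\n"; // cat, dog, monkey
```

Running one query per group keeps each result list separate, which is what the question asks for.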
Just continue on target attributes which aren't fruit, and then add the textContent of the elements to an array.
$nodes = array();
for ($i = 0; $i < $a->length; $i++) {
    $attr = $a->item($i)->getAttribute('target');
    // Skip every link whose target is not "fruit"
    if ($attr != 'fruit') {
        continue;
    }
    $nodes[] = $a->item($i)->textContent;
}
$nodes now contains the text content of all the elements which have their target attribute set to fruit.
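If both groups are wanted at once, the same loop idea can fill an array keyed by the target value in a single pass. This is only a sketch under the assumption that the markup matches the question (inlined below); the `$groups` variable name is illustrative.

```php
<?php
// Sample markup copied from the question; inlined so the sketch is self-contained.
$html = <<<HTML
<a href="somepath" target="fruit">apple</a>
<a href="somepath" target="animal">cat</a>
<a href="somepath" target="fruit">orange</a>
<a href="somepath" target="animal">dog</a>
<a href="somepath" target="fruit">mango</a>
<a href="somepath" target="animal">monkey</a>
HTML;

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Group link text by the value of the target attribute.
$groups = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $target = $link->getAttribute('target'); // '' when the attribute is absent
    if ($target === '') {
        continue; // ignore links without a target
    }
    $groups[$target][] = trim($link->textContent);
}

foreach ($groups as $target => $texts) {
    echo $target, ': ', implode(', ', $texts), "\n";
}
// fruit: apple, orange, mango
// animal: cat, dog, monkey
```

This avoids hard-coding the group names, so a new target value in the page shows up as a new key without changing the code.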