I'm trying to parse an HTML snippet, using the PHP DOM functions. I have stripped out everything apart from paragraph, span and line break tags, and now I want to retrieve all the text, along with its accompanying styles.
So, I'd like to get each piece of text, one by one, and for each one I can then go back up the tree to get the values of particular attributes (I'm only interested in some specific ones, like color etc.).
How can I do this? Or am I thinking about it the wrong way?
text(); This gets the contents of the selected element, and applies a filter function to it. The filter function returns only text nodes (i.e. those nodes with nodeType == Node.TEXT_NODE).
A text node encapsulates XML character content. A text node can have zero or one parent. The content of a text node can be empty. However, unless the parent of a text node is empty, the content of the text node cannot be an empty string.
Use the textContent property to get the text of an HTML element, e.g. const text = box.textContent. The textContent property returns the text content of the element and its descendants. If the element is empty, an empty string is returned.
Suppose you have a DOMDocument here:
$doc = new DOMDocument();
$doc->loadHTMLFile('http://stackoverflow.com/');
You can find all text nodes using a simple XPath query.
$xpath = new DOMXPath($doc);
$textNodes = $xpath->query('//text()');
Just use foreach to iterate over all the text nodes:
foreach ($textNodes as $textNode) {
    echo $textNode->data . "\n";
}
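One thing to watch for: //text() also matches text nodes that contain only whitespace (line breaks and indentation between tags). If those get in the way, an XPath predicate can filter them out. This is only a sketch; the sample HTML string and the $texts variable are made up for illustration.

```php
<?php
// Hypothetical sample input with whitespace between tags.
$html = "<p>First <span>second</span>\n   <br>third</p>";

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// normalize-space() yields an empty string for whitespace-only nodes,
// so the predicate keeps only text nodes with visible content.
$texts = [];
foreach ($xpath->query('//text()[normalize-space()]') as $textNode) {
    $texts[] = trim($textNode->data);
}

print_r($texts);
```

The trim() call is optional; drop it if leading and trailing spaces matter for your output.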
From there, you can go up the DOM tree by using ->parentNode.
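To tie this back to the question, here is a rough sketch of walking up from each text node and collecting the style attribute of every ancestor element. The sample HTML and the collectStyles() helper are my own assumptions, not part of the DOM extension; swap 'style' for whichever attribute you care about.

```php
<?php
// Hypothetical sample input: only <p>, <span> and <br> tags, as in the question.
$html = '<p style="color: red">Hello <span style="font-weight: bold">bold</span> world<br>next line</p>';

$doc = new DOMDocument();
// loadHTML() wraps the fragment in <html><body>, which does not affect the walk.
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// Hypothetical helper: returns the style attributes of all ancestor
// elements of $node, nearest ancestor first.
function collectStyles(DOMNode $node): array
{
    $styles = [];
    // The loop stops at the DOMDocument, which is not a DOMElement.
    for ($el = $node->parentNode; $el instanceof DOMElement; $el = $el->parentNode) {
        if ($el->hasAttribute('style')) {
            $styles[] = $el->getAttribute('style');
        }
    }
    return $styles;
}

$result = [];
foreach ($xpath->query('//text()') as $textNode) {
    $result[] = [
        'text'   => $textNode->data,
        'styles' => collectStyles($textNode),
    ];
}

print_r($result);
```

Because the nearest ancestor comes first in the list, the first entry that sets a given property is the one that would normally win for that piece of text. Parsing the individual properties (color etc.) out of each style string is left out here.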
Hope this gives you a good start.