How can I find text nodes in an HTML snippet?

Tags: dom, php

I'm trying to parse an HTML snippet using the PHP DOM functions. I have stripped out everything apart from paragraph, span and line break tags, and now I want to retrieve all the text, along with its accompanying styles.

So I'd like to get each piece of text, one by one, and for each one go back up the tree to get the values of particular attributes (I'm only interested in a few specific ones, such as color).

How can I do this? Or am I thinking about it the wrong way?

Asked by Sharon on Jan 24 '11 12:01


1 Answer

Suppose you have a DOMDocument, loaded here from a URL:

$doc = new DOMDocument();
$doc->loadHTMLFile('http://stackoverflow.com/');
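The question works with an HTML snippet rather than a whole page. A minimal sketch for that case, assuming the snippet is already in a string (the `loadSnippet()` helper and the sample markup are hypothetical): `loadHTML()` parses a string instead of a file or URL, and the `LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD` flags stop libxml from adding its own `<html>`, `<body>` and doctype wrappers.

```php
<?php
// Hypothetical helper: parse an HTML fragment held in a string.
function loadSnippet(string $snippet): DOMDocument
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in one <div> so it has a single root element;
    // with LIBXML_HTML_NOIMPLIED, several top-level nodes can get mis-nested.
    $doc->loadHTML(
        '<div>' . $snippet . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

$doc = loadSnippet('<p>Hello <span style="color:red">world</span></p><p>Bye</p>');
echo $doc->documentElement->nodeName, "\n";
echo $doc->documentElement->textContent, "\n";
```

The rest of the answer works the same on a document loaded this way.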

You can find all text nodes with a simple XPath query:

$xpath = new DOMXPath($doc);
$textNodes = $xpath->query('//text()');

Then loop over $textNodes with foreach to visit every text node:

foreach ($textNodes as $textNode) {
    echo $textNode->data . "\n";
}
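`//text()` can also match whitespace-only text nodes, such as the line breaks and indentation between tags, which show up as empty-looking lines in that loop. A sketch of one way to skip them, assuming you only care about visible text: the XPath predicate `[normalize-space()]` keeps a text node only if it contains something other than whitespace. The `nonBlankTexts()` name and the sample markup are illustrative.

```php
<?php
// Illustrative helper: return the data of every text node that is not
// whitespace-only, in document order.
function nonBlankTexts(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $texts = [];
    // normalize-space() yields an empty string (false) for whitespace-only nodes.
    foreach ($xpath->query('//text()[normalize-space()]') as $textNode) {
        $texts[] = $textNode->data;
    }
    return $texts;
}

print_r(nonBlankTexts("<p>\n  <span>One</span>\n  <span>Two</span>\n</p>"));
```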

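From each text node, you can walk up the DOM tree through its ->parentNode property and read the attributes you need (for example with getAttribute()) on the enclosing span and p elements.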
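For the question's use case, a minimal sketch that pairs each piece of text with the attributes of interest found on its ancestors. It assumes the nearest ancestor wins when several carry the same attribute, and that the attribute names are passed in (here `style`, as an example); `textWithAttributes()` is a hypothetical name. Extracting individual properties such as `color` from a `style` value is left out.

```php
<?php
// Hypothetical helper: for each non-blank text node, climb parentNode and
// record the first value found for each requested attribute.
function textWithAttributes(string $html, array $attrs): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query('//text()[normalize-space()]') as $textNode) {
        $found = [];
        // Stop once we leave the elements (the DOMDocument is not a DOMElement).
        for ($node = $textNode->parentNode; $node instanceof DOMElement; $node = $node->parentNode) {
            foreach ($attrs as $attr) {
                // Keep the nearest ancestor's value; ignore farther ones.
                if (!isset($found[$attr]) && $node->hasAttribute($attr)) {
                    $found[$attr] = $node->getAttribute($attr);
                }
            }
        }
        $result[] = ['text' => $textNode->data, 'attrs' => $found];
    }
    return $result;
}

$items = textWithAttributes(
    '<p style="color:blue">Hello <span style="color:red">world</span></p>',
    ['style']
);
foreach ($items as $item) {
    echo $item['text'], ' => ', $item['attrs']['style'] ?? '(none)', "\n";
}
```

Walking upward per text node keeps the logic close to what the question describes; for large documents, passing inherited values down a recursive walk instead would avoid revisiting the same ancestors.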

Hope that this can give you a good start.

Answered by Thai on Nov 03 '22 01:11