How can I find text nodes in an HTML snippet?

Tags: dom, php

I'm trying to parse an HTML snippet, using the PHP DOM functions. I have stripped out everything apart from paragraph, span and line break tags, and now I want to retrieve all the text, along with its accompanying styles.

So, I'd like to get each piece of text, one by one, and for each one I can then go back up the tree to get the values of particular attributes (I'm only interested in some specific ones, like color etc.).

How can I do this? Or am I thinking about it the wrong way?


asked Jan 24 '11 12:01

Sharon

1 Answer

Suppose you have a DOMDocument here:

$doc = new DOMDocument();
$doc->loadHTMLFile('http://stackoverflow.com/');

You can find all text nodes with a simple XPath query:

// '//text()' matches every text node in the document.
$xpath = new DOMXPath($doc);
$textNodes = $xpath->query('//text()');

Then loop over the result with foreach to visit each text node:

foreach ($textNodes as $textNode) {
    echo $textNode->data . "\n";
}

From each text node, you can walk up the DOM tree via ->parentNode and read the attributes you are interested in from the enclosing elements.
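As a rough sketch of that upward walk, the example below collects the style attribute of every enclosing element for each text node. The sample markup, the helper name ancestorStyles() and the $result array are made up for illustration; it also assumes the snippet is loaded from a string with loadHTML() rather than from a URL, and that the styling lives in inline style attributes.

```php
<?php
// Hypothetical snippet with only <p>, <span> and <br> tags, as described in the question.
$html = '<p style="color: red">Hello <span style="font-weight: bold">bold</span><br>world</p>';

$doc = new DOMDocument();
$doc->loadHTML($html); // parses a string; the parser wraps it in <html><body>

$xpath = new DOMXPath($doc);

// Hypothetical helper: style attributes of all ancestor elements, nearest first.
function ancestorStyles(DOMNode $node): array
{
    $styles = [];
    // Stop once the parent is no longer an element (i.e. the document itself).
    for ($parent = $node->parentNode; $parent instanceof DOMElement; $parent = $parent->parentNode) {
        if ($parent->hasAttribute('style')) {
            $styles[] = $parent->getAttribute('style');
        }
    }
    return $styles;
}

$result = [];
foreach ($xpath->query('//text()') as $textNode) {
    $styles = ancestorStyles($textNode);
    $result[] = [$textNode->data, $styles];
    echo $textNode->data, ' => ', implode(' | ', $styles), "\n";
}
```

Collecting the styles nearest-first means the first entry is the most specific one, so a lookup for a single property such as color can stop at the first ancestor that sets it. If whitespace-only text nodes get in the way, the query '//text()[normalize-space()]' should skip them.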

Hope this gives you a good start.

answered Nov 03 '22 01:11

Thai
