Parse HTML with PHP's HTML DOMDocument

I was trying to do it with "getElementsByTagName", but it wasn't working. I'm new to using DOMDocument to parse HTML; I used regex until yesterday, when some kind folks here told me that DOMDocument would be better for the job, so I'm giving it a try :)

I googled around for a while looking for explanations but didn't find anything that helped (not with the class, anyway).

So I want to capture "Capture this text 1" and "Capture this text 2" and so on.

Doesn't look too hard, but I can't figure it out :(

<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>

asked Apr 03 '10 12:04

Mint

2 Answers

If you want to get :

The text
that's inside a <div> tag with class="text"
that's, itself, inside a <div> with class="main"

I would say the easiest way is not to use DOMDocument::getElementsByTagName -- which will return all tags that have a specific name (while you only want some of them).
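To illustrate why getElementsByTagName alone is awkward here, the following is a minimal sketch, assuming the markup from the question. It shows that the method returns all four <div> elements, so the class checks have to be written by hand. The variable names ($divCount, $texts) are only illustrative.

```php
<?php
// Sample markup taken from the question.
$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>
<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName('div') matches every <div>, outer and inner alike.
$divs = $dom->getElementsByTagName('div');
$divCount = $divs->length;

// Keep only divs with class "text" whose parent is a div with class "main".
$texts = [];
foreach ($divs as $div) {
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->getAttribute('class') === 'main') {
        $texts[] = trim($div->nodeValue);
    }
}

print_r($texts);
```

This works, but every extra condition adds another hand-written check inside the loop.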

Instead, I would use an XPath query on your document, using the DOMXPath class.

For example, something like this should do to load the HTML string into a DOM object and instantiate the DOMXPath class :

$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

Then you can run XPath queries with the DOMXPath::query method, which returns the list of elements you were searching for :

$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}

And executing this gives me the following output :

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
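One thing to keep in mind: @class="text" compares the whole attribute value, so it would miss an element such as <div class="text highlight">. The sketch below is a possible variation, not part of the original answer. It assumes a modified version of the question's markup with a second class added, and uses the common XPath 1.0 idiom of contains(concat(...)) to match a single class token. It also calls libxml_use_internal_errors so that warnings about sloppy real-world HTML are collected instead of printed.

```php
<?php
// Assumed variation of the question's markup: one inner div has two classes.
$html = <<<HTML
<div class="main">
    <div class="text highlight">
    Capture this text 1
    </div>
</div>
<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Match "text" as one token in a space-separated class list.
$query = '//div[@class="main"]'
       . '/div[contains(concat(" ", normalize-space(@class), " "), " text ")]';

$results = [];
foreach ($xpath->query($query) as $tag) {
    $results[] = trim($tag->nodeValue);
}

print_r($results);
```

Padding the class list with spaces on both sides keeps the test from matching longer names such as "textarea" or "subtext".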

answered Oct 13 '22 10:10

Pascal MARTIN

You can use http://simplehtmldom.sourceforge.net/

It is a simple, easy-to-use DOM parser written in PHP, with which you can easily fetch the content of a div tag.

Something like this:

// Find all <div> elements whose class attribute is "text"
$ret = $html->find('div[class=text]');

See its documentation for more help.

answered Oct 13 '22 08:10

lokeshsk
