
PHP Simple HTML DOM Parser: Select only DIVs with multiple classes

I was searching like mad and found no solution. The problem is simple.

Let's say I have 3 DIVs:

<div class="class1">
  <div class="subclass"> TEXT1 </div>
</div>

<div class="class2">
  <div class="subclass"> TEXT2 </div>
</div>

<div class="class1 class2">
  <div class="subclass"> TEXT3 </div>
</div>

So, very simple. I just want to find TEXT3, whose parent DIV has BOTH class1 and class2. Using Simple HTML DOM Parser, I can't seem to get it to work.

Here's what I tried:

foreach ($html->find("[class=class1], [class=class2]") as $item) {
    $items[] = $item->find('.subclass', 0)->plaintext;
}

The problem is, with

find("[class=class1], [class=class2]")

it finds all of them, because the comma acts like an OR. If I leave out the comma, it looks for a class2 element nested inside class1. I am just looking for an AND...

EDIT

Thanks to 19greg96 I found out that

div[class=class1 class2]

works. The problem is that it looks for exactly those two classes in that order. Let's say I have

<div class="class1 class2">
  <div class="subclass"> TEXT3 </div>
</div>

then it works, and if I have

<div class="class1 class2 class3">
  <div class="subclass"> TEXT3 </div>
</div>

it works when I add an asterisk, as it then matches the substring:

div[class*=class1 class2]

PROBLEM

I only know that class1 and class3 are there, but there may be others, and in random order. That still doesn't work. Any idea how to look for A and B in any order? So that

div[class=class1 class3]

works with that example?

Chris asked Jan 10 '13 18:01


2 Answers

EDIT2: As this is a bug in the DOM parser (tested on version 1.5), there is no simple way of doing this. A solution I could think of:

$find = $html->find(".class1");
$ret = array();
foreach ($find as $element) {
    if (strpos($element->class, 'class3') !== false) {
        $ret[] = $element;
    }
}
$find = $ret;

Basically, you find all the elements with class1, then iterate through those elements to keep the ones that also have the second class (in this case class3).
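
One caveat with the strpos check above: it matches substrings, so a class such as class33 would also count as class3. A minimal sketch of a whole-token check follows. The helper name hasAllClasses is hypothetical and not part of Simple HTML DOM; it only inspects the class attribute string, so it can be tried without the library.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): returns true when
// every name in $required appears as a whole token in the class attribute.
function hasAllClasses($classAttr, array $required)
{
    // Split on runs of whitespace; the cast tolerates non-string input.
    $tokens = preg_split('/\s+/', trim((string) $classAttr), -1, PREG_SPLIT_NO_EMPTY);

    // array_diff() returns the required names that are missing from $tokens.
    return count(array_diff($required, $tokens)) === 0;
}

var_dump(hasAllClasses('class1 class2 class3', array('class1', 'class3'))); // bool(true)
var_dump(hasAllClasses('class3 class1', array('class1', 'class3')));        // bool(true)
var_dump(hasAllClasses('class1 class33', array('class1', 'class3')));       // bool(false)
```

In the loop above, the condition could then read hasAllClasses($element->class, array('class1', 'class3')) instead of the strpos call, assuming $element->class holds the raw attribute string.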


Previous answer:

Simple answer (should work according to the CSS selector spec):

find(".class1.class2")

This will look for any type of element (div, img, a, etc.) that has both class1 and class2. If you want to restrict the element type, add it to the beginning without a dot, like:

find("div.class1.class2")

If you put a space between the two classes, it becomes a descendant selector: it matches elements with the second class that are nested inside an element with the first class:

find(".class1 .class2")

will match the inner DIV in

<div class="class1">
  <div class="class2">this will be returned</div>
</div>

but, under standard CSS rules, not a single element that carries both classes:

<div class="class1 class2">this will not be returned</div>

edit: I tried your code and found that the solutions above do not work. The solution that does work however is as follows:

$html->find("div[class=class1 class2]")
19greg96 answered Oct 02 '22 05:10


You can also try this:

test.html

<h1 class="first second last">
    <p>Paragraph</p>
</h1>

Solution:

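include "simple_html_dom.php";

$html = file_get_html('test.html');
$ar = array();
// Use a separate loop variable so the list being iterated is not overwritten.
foreach ($html->find('h1') as $h1) {
    $item = array();
    // Exact string match: the classes must appear in exactly this order.
    if ($h1->class == 'first second last') {
        $item['test'] = 'success';
    } else {
        $item['test'] = 'fail';
    }
    $ar[] = $item;
}
echo "<pre>";
print_r($ar);
echo "</pre>";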
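
The comparison above only succeeds when the class attribute is exactly 'first second last', in that order. A minimal sketch of an order-independent comparison follows. The helper name sameClassSet is hypothetical and not part of Simple HTML DOM; it works on plain strings, so it runs without the library.

```php
<?php
// Hypothetical helper: true when two class attribute strings name the
// same set of classes, ignoring order, repeats and extra whitespace.
function sameClassSet($a, $b)
{
    $normalize = function ($value) {
        // Split into class names, dropping empty pieces.
        $tokens = preg_split('/\s+/', trim((string) $value), -1, PREG_SPLIT_NO_EMPTY);
        $tokens = array_unique($tokens);
        // sort() also reindexes, so the two arrays compare positionally.
        sort($tokens, SORT_STRING);
        return $tokens;
    };

    return $normalize($a) === $normalize($b);
}

var_dump(sameClassSet('last first  second', 'first second last')); // bool(true)
var_dump(sameClassSet('first second', 'first second last'));       // bool(false)
```

Inside the loop, the condition could then be written as sameClassSet($h1->class, 'first second last'), assuming $h1->class holds the raw attribute string.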

Paramjeet answered Oct 02 '22 07:10