
Get all image URLs from a string [duplicate]

Possible Duplicate:
How to extract img src, title and alt from html using php?

Hi,
I have found a solution to get the first image from a string:

preg_match('~<img[^>]*src\s?=\s?[\'"]([^\'"]*)~i',$string, $matches);

But I can't manage to get all the images from the string.
One more thing: if an image has alternative text (an alt attribute), how can I get that too and save it to another variable?
Thanks in advance,
Ilija

ilija veselica asked Oct 03 '09 10:10



3 Answers

Don't do this with regular expressions; parse the HTML instead. Take a look at Parse HTML With PHP And DOM. The DOM extension is a standard feature in PHP 5.2.x (and probably earlier). The logic for getting images is roughly:

$dom = new DOMDocument;
// Set before loading; changing it afterwards does not alter the parsed document.
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
  echo $image->getAttribute('src');
}

This should be trivial to adapt to reading other attributes, such as alt.
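As a rough sketch of that adaptation, the snippet below collects both src and alt for every image. The function name extractImages and the sample markup are made up for illustration, and the type hints assume PHP 7 or later rather than the 5.2.x mentioned above. The libxml_use_internal_errors() calls are an optional addition that keeps warnings about malformed HTML out of the output.

```php
<?php
// Hypothetical helper (not from the original answer): returns the src and
// alt of every <img> found in an HTML string.
function extractImages(string $html): array
{
    $dom = new DOMDocument();

    // Real-world HTML is often invalid; collect parser warnings internally
    // instead of emitting them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $images[] = [
            'src' => $img->getAttribute('src'),
            // getAttribute() returns an empty string when the attribute is absent.
            'alt' => $img->getAttribute('alt'),
        ];
    }
    return $images;
}

// Sample input, assumed for the example.
$html = '<p><img src="a.png" alt="First"> text <img src="b.jpg"></p>';
print_r(extractImages($html));
```

Returning an array rather than echoing keeps the values available for later use, which is what the question asks for when it mentions saving the alt text to another variable.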

cletus answered Nov 09 '22 15:11



This is what I tried, but I can't get it to print the value of src:

 $dom = new domDocument;

    /*** load the html into the object ***/
    $dom->loadHTML($html);

    /*** discard white space ***/
    $dom->preserveWhiteSpace = false;

    /*** the table by its tag name ***/
    $images = $dom->getElementsByTagName('img');

    /*** loop over the table rows ***/
    foreach ($images as $img)
    {
        /*** get each column by tag name ***/
        $url = $img->getElementsByTagName('src');
        /*** echo the values ***/
        echo $url->nodeValue;
        echo '<hr />';
    }
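A small sketch of why the attempt above prints nothing: src is an attribute of the img element, not a child element, so getElementsByTagName('src') returns an empty node list. The one-image markup here is assumed for the example.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<img src="a.png" alt="A">');
$img = $dom->getElementsByTagName('img')->item(0);

// 'src' is an attribute, not a child element, so this list is empty.
$children = $img->getElementsByTagName('src');
echo $children->length, "\n"; // 0

// Attributes are read with getAttribute() instead.
echo $img->getAttribute('src'), "\n"; // a.png
```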

EDIT: I solved this problem:

$dom = new DOMDocument;

/*** discard white space (set before loading the document) ***/
$dom->preserveWhiteSpace = false;

/*** load the html into the object ***/
$dom->loadHTML($string);

$images = $dom->getElementsByTagName('img');

foreach ($images as $img) {
    /*** read the attributes of each image ***/
    $url = $img->getAttribute('src');
    $alt = $img->getAttribute('alt');
    echo "Title: $alt<br>$url<br>";
}
ilija veselica answered Nov 09 '22 16:11



Note that regular expressions are a poor fit for parsing anything with nested, matching delimiters, such as HTML tags.

You'd be better off using the DOMDocument class.
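One way to see the difference, as a sketch: the input below is a made-up tag in which a data-src attribute follows the real src. The pattern from the question captures the wrong value, because its greedy [^>]* backtracks to the last "src=" in the tag, while DOMDocument looks the attribute up by name.

```php
<?php
// Hypothetical input: a lazy-loading attribute follows the real src.
$html = '<img src="real.png" data-src="lazy.png">';

// The pattern from the question: the greedy [^>]* backtracks to the LAST
// "src=" in the tag, which here is the tail of "data-src=".
preg_match('~<img[^>]*src\s?=\s?[\'"]([^\'"]*)~i', $html, $matches);
echo $matches[1], "\n"; // lazy.png

// DOMDocument reads the attribute by its exact name.
$dom = new DOMDocument();
$dom->loadHTML($html);
$img = $dom->getElementsByTagName('img')->item(0);
echo $img->getAttribute('src'), "\n"; // real.png
```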

John Carter answered Nov 09 '22 16:11
