
How to get an image's src attribute value from a PHP string?

Tags: php

I tried to follow some questions here about preg_match and DOM, but everything just went over my head.

I have a string like this:

$string = '<td class="borderClass" width="225" style="border-width: 0 1px 0 0;" valign="top">
<div style="text-align: center;">
    <a href="http://myanimelist.net/anime/10800/Chihayafuru/pic&pid=35749">
    <img src="http://cdn.myanimelist.net/images/anime/3/35749.jpg" alt="Chihayafuru" align="center">
    </a>
</div>';

I'm now trying to get the image src attribute value from it. I tried using this code, but I can't figure out what I'm doing wrong.

$doc = new DOMDocument();
$dom->loadXML( $string );
$imgs = $dom->query("//img");
for ($i=0; $i < $imgs->length; $i++) {
    $img = $imgs->item($i);
    $src = $img->getAttribute("src");
}
$scraped_img = $src;

How can I get the image src attribute from this string using PHP?

Imtiaz asked Feb 16 '23 00:02


2 Answers

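Your code has three problems: it creates the object as $doc but then calls methods on $dom; it uses loadXML(), which expects well-formed XML, whereas this fragment is HTML (the <img> tag is never closed), so loadHTML() is the right call; and query() is a method of DOMXPath, not DOMDocument. Here is the corrected code: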

$string = '<td class="borderClass" width="225" style="border-width: 0 1px 0 0;" valign="top">
<div style="text-align: center;">
    <a href="http://myanimelist.net/anime/10800/Chihayafuru/pic&pid=35749">
    <img src="http://cdn.myanimelist.net/images/anime/3/35749.jpg" alt="Chihayafuru" align="center">
    </a>
</div>';

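$doc = new DOMDocument();
// Collect libxml warnings about malformed HTML instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML( $string );
// XPath queries run through a DOMXPath object bound to the document.
$xpath = new DOMXPath($doc);
$imgs = $xpath->query("//img");
$src = null; // stays null if the markup contains no <img>
for ($i=0; $i < $imgs->length; $i++) {
    $img = $imgs->item($i);
    $src = $img->getAttribute("src"); // with several images, the last one wins
}

echo $src;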

OUTPUT

http://cdn.myanimelist.net/images/anime/3/35749.jpg
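
If the markup may contain several images, or none at all, it can help to collect every src value rather than keeping only the last one. The following is a minimal sketch of that idea, not part of the original answer: extractImageSources() is a hypothetical helper name, and the sample string is a shortened version of the one in the question.

```php
<?php
// Hypothetical helper (an illustration, not from the original answer):
// returns the src value of every <img> found in an HTML fragment.
function extractImageSources(string $html): array
{
    $doc = new DOMDocument();
    // Buffer libxml warnings about malformed markup instead of emitting them,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $sources = [];
    // "//img/@src" selects the src attribute nodes directly.
    foreach ($xpath->query('//img/@src') as $attr) {
        $sources[] = $attr->nodeValue;
    }
    return $sources;
}

$string = '<div style="text-align: center;">
    <a href="http://myanimelist.net/anime/10800/Chihayafuru/pic&pid=35749">
    <img src="http://cdn.myanimelist.net/images/anime/3/35749.jpg" alt="Chihayafuru" align="center">
    </a>
</div>';

$sources = extractImageSources($string);
echo $sources[0] ?? 'no image found', PHP_EOL;
```

An empty array then signals "no image" explicitly, instead of leaving $src undefined.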
anubhava answered Feb 17 '23 15:02


We have found while writing Drupal that using SimpleXML is much easier than dealing with the DOM:

$htmlDom = new \DOMDocument();
@$htmlDom->loadHTML('<?xml encoding="UTF-8">' . $string);
$elements = simplexml_import_dom($htmlDom);
print $elements->body->td[0]->div[0]->a[0]->img[0]['src'];

This lets you load whatever HTML tag soup you have, because the DOM extension's HTML parser is more forgiving than SimpleXML's XML parser, while still letting you work with the simple and powerful SimpleXML extension.

The first three lines are copied verbatim from the Drupal testing framework, so this is truly battle-hardened code.
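
The hard-coded path body->td[0]->div[0]->a[0]->img[0] only works while the markup keeps exactly that nesting. As a sketch of one possible alternative (my variation, not something taken from Drupal), SimpleXML's own xpath() method can locate the image wherever it sits. The loading step is the same as above; the sample string is a trimmed copy of the one in the question.

```php
<?php
$string = '<td class="borderClass" width="225" valign="top">
<div style="text-align: center;">
    <a href="http://myanimelist.net/anime/10800/Chihayafuru/pic&pid=35749">
    <img src="http://cdn.myanimelist.net/images/anime/3/35749.jpg" alt="Chihayafuru" align="center">
    </a>
</div>';

$htmlDom = new DOMDocument();
// Same loading step as in the answer: the prefix hints UTF-8 to the parser,
// and @ silences warnings about the malformed fragment.
@$htmlDom->loadHTML('<?xml encoding="UTF-8">' . $string);
$elements = simplexml_import_dom($htmlDom);

// xpath() returns an array of matching SimpleXMLElement objects
// (empty when nothing matches), regardless of how deeply <img> is nested.
$images = $elements->xpath('//img');
$src = $images ? (string) $images[0]['src'] : null;

echo $src, PHP_EOL;
```

The (string) cast matters: $images[0]['src'] is a SimpleXMLElement, which prints fine but compares unexpectedly if it is stored and later checked with ===.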

chx answered Feb 17 '23 15:02