
Read image IPTC data

Tags: image, exif, iptc

I'm having some trouble reading the IPTC data from some images. I want to do this because my client already has all the keywords in the IPTC data and doesn't want to re-enter them on the site.

So I created this simple script to read them out:

$size = getimagesize($image, $info);

if(isset($info['APP13'])) {
    $iptc = iptcparse($info['APP13']);

    print '<pre>';
        var_dump($iptc['2#025']);
    print '</pre>';
}

This works perfectly in most cases, but for some images it fails with:

Notice: Undefined index: 2#025

This happens even though I can clearly see the keywords in Photoshop.

Are there any decent small libraries that could read the keywords in every image? Or am I doing something wrong here?

woutr_be asked Jan 08 '12 14:01


2 Answers

I've seen a lot of weird IPTC problems. It could be that you have two APP13 segments. I've noticed that, for some reason, some JPEGs contain multiple IPTC blocks, possibly as a result of using several photo-editing programs or of manual file manipulation.

It could be that PHP is trying to read an empty APP13 segment, or even embedded "thumbnail metadata".
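If an empty or unparsable APP13 segment is the cause, the notice from the question can at least be avoided by checking each step before indexing. A minimal sketch, assuming the `$info` array filled by `getimagesize()` in the question; the helper name `read_iptc_keywords` is made up for illustration:

```php
<?php
// Hypothetical helper (not from the question): returns the IPTC keywords
// found in the APP13 segment, or an empty array when none can be read.
function read_iptc_keywords(array $info): array
{
    if (!isset($info['APP13'])) {
        return [];                // no APP13 segment was reported at all
    }

    $iptc = iptcparse($info['APP13']);
    if ($iptc === false) {
        return [];                // APP13 exists but holds no parsable IPTC data
    }

    // 2#025 is the key the question reads the keywords from
    return $iptc['2#025'] ?? [];
}
```

With the question's variables this would be called as `getimagesize($image, $info); $keywords = read_iptc_keywords($info);`. It only silences the notice; it does not recover keywords that are stored elsewhere in the file.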

It could also be a problem with segment length: APP13 segments and 8BIM blocks have length marker bytes that might hold wrong values.

Try a hex editor and check the file "manually".
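As a rough alternative to a hex editor, the segment headers can be listed with a few lines of PHP. This is only a sketch: the function name `list_jpeg_segments` is made up, it stops at the first start-of-scan marker, and it assumes every segment before that point carries a two-byte length, which is a simplification of the JPEG format.

```php
<?php
// Hypothetical helper: walks the marker segments at the start of a JPEG held
// in $data and returns one entry per segment found before the image data.
function list_jpeg_segments(string $data): array
{
    $segments = [];
    $len = strlen($data);

    // A JPEG starts with the SOI marker FF D8
    if ($len < 4 || substr($data, 0, 2) !== "\xFF\xD8") {
        return $segments;
    }

    $pos = 2;
    while ($pos + 4 <= $len && $data[$pos] === "\xFF") {
        $marker = ord($data[$pos + 1]);
        if ($marker === 0xDA || $marker === 0xD9) {
            break;                // start of scan or end of image
        }

        // Two-byte big-endian length; it counts itself but not the marker
        $size = unpack('n', substr($data, $pos + 2, 2))[1];

        $segments[] = [
            'marker'    => ($marker >= 0xE0 && $marker <= 0xEF)
                ? 'APP' . ($marker - 0xE0)
                : sprintf('0x%02X', $marker),
            'offset'    => $pos,
            'length'    => $size,
            // false when the length is impossible or runs past the end of the file
            'length_ok' => $size >= 2 && $pos + 2 + $size <= $len,
        ];

        if ($size < 2) {
            break;                // cannot continue past an invalid length
        }
        $pos += 2 + $size;
    }

    return $segments;
}
```

Calling `list_jpeg_segments(file_get_contents($image))` and counting the entries whose `marker` is `APP13` should show whether the file has more than one such segment, and `length_ok` flags lengths that cannot be right.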

yosh answered Sep 29 '22 13:09


I have found that IPTC data is almost always embedded as XML using the XMP format, and is often not in the APP13 segment. You can sometimes get the IPTC info by using iptcparse($info['APP1']), but the most reliable way to get it without a third-party library is to simply search the image file for the relevant XML string (I got this from another answer, but I haven't been able to find it again, otherwise I would link to it):

The XML for the keywords always has the form "<dc:subject>...<rdf:Seq><rdf:li>Keyword 1</rdf:li><rdf:li>Keyword 2</rdf:li>...<rdf:li>Keyword N</rdf:li></rdf:Seq>...</dc:subject>"

So you can just get the file as a string using file_get_contents(get_attached_file($attachment_id)), use strpos() to find each opening (<rdf:li>) and closing (</rdf:li>) XML tag, and grab the keyword between them using substr().

The following snippet has worked for all the JPEGs I have tested it on. It fills the array $keys with the keywords taken from a WordPress image attachment with ID $attachment_id:

$content = file_get_contents(get_attached_file($attachment_id));

// Initialize empty array to store keywords (stays empty if nothing is found)
$keys = array();

// Look for XMP data: the XML tag "dc:subject" is where keywords are stored.
// strpos() returns false on failure and 0 is a valid offset, so compare with ===
// and only add the tag length after the check.
$subject_pos = ($content === false) ? false : strpos($content, '<dc:subject>');
$subject_end = ($subject_pos === false) ? false : strpos($content, '</dc:subject>', $subject_pos);

// Only proceed if able to find both dc:subject tags
if ($subject_pos !== false && $subject_end !== false) {
    $xmp_data_start = $subject_pos + 12; // skip past '<dc:subject>'
    $xmp_data       = substr($content, $xmp_data_start, $subject_end - $xmp_data_start);

    // Look for tag "rdf:Seq" where individual keywords are listed
    $seq_pos = strpos($xmp_data, '<rdf:Seq>');
    $seq_end = strpos($xmp_data, '</rdf:Seq>');

    // Only proceed if able to find both rdf:Seq tags, in the right order
    if ($seq_pos !== false && $seq_end !== false && $seq_end > $seq_pos) {
        $key_data_start = $seq_pos + 9; // skip past '<rdf:Seq>'
        $key_data       = substr($xmp_data, $key_data_start, $seq_end - $key_data_start);

        // $ctr tracks the position of each <rdf:li> tag, starting with the first
        $ctr = strpos($key_data, '<rdf:li>');

        // Store each keyword, then search for the next keyword tag
        while ($ctr !== false) {
            // Skip past the tag to get the keyword itself
            $key_begin = $ctr + 8;

            // Keyword ends where closing tag begins
            $key_end = strpos($key_data, '</rdf:li>', $key_begin);

            // Make sure keyword has a closing tag
            if ($key_end === false) break;

            // Make sure keyword is not too long (not sure what WP can handle)
            $key_length = min(100, $key_end - $key_begin);

            // Add keyword to keyword array
            $keys[] = substr($key_data, $key_begin, $key_length);

            // Find next keyword open tag
            $ctr = strpos($key_data, '<rdf:li>', $key_end);
        }
    }
}

I have this implemented in a plugin to put IPTC keywords into WP's "Description" field, which you can find here.
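For comparison, the same string search can be written more compactly with regular expressions instead of strpos() and substr(). This is a swapped-in technique, not the code from the plugin, and the function name `xmp_keywords` is made up. It also accepts list items inside an rdf:Bag, on the assumption that some files use that container instead of rdf:Seq.

```php
<?php
// Hypothetical helper: extracts the <rdf:li> entries inside the first
// <dc:subject> element of an XMP packet embedded in $content.
function xmp_keywords(string $content): array
{
    // Grab everything between the dc:subject tags (s flag: dot matches newlines)
    if (!preg_match('~<dc:subject>(.*?)</dc:subject>~s', $content, $subject)) {
        return [];
    }

    // Collect each list item regardless of the surrounding container tag;
    // [^>]* tolerates attributes such as xml:lang on <rdf:li>
    preg_match_all('~<rdf:li[^>]*>(.*?)</rdf:li>~s', $subject[1], $items);

    // Decode XML entities (e.g. &amp;) and trim surrounding whitespace
    return array_map(
        fn ($k) => trim(html_entity_decode($k, ENT_QUOTES | ENT_XML1, 'UTF-8')),
        $items[1]
    );
}
```

In the WordPress setting above it would be called as `$keys = xmp_keywords(file_get_contents(get_attached_file($attachment_id)));`. Like the original snippet, it treats the file as plain text rather than parsing the XMP packet as XML, so it shares the same limitations.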

John Hungerford answered Sep 29 '22 11:09