
How to validate mime type ( is PDF for instance ) of both a file and a variable string?

Tags: php, pdf

I have a bunch of PDFs that were downloaded using a scraper. The scraper didn't check whether each file was a JPG or a PDF, so all of them were saved with the '.pdf' extension. To clarify: every file in the batch has a .pdf extension, but when I try to open the ones that are really JPGs, either from a server or locally, I get an error.

My question: is there a way in PHP to check whether a file is a valid PDF? I would like to run all the URLs through a loop to check these files. There are hundreds of them, and checking them by hand would take hours.

Thanks

smack-a-bro asked Jul 20 '15 12:07


2 Answers

For local files (PHP 5.3+):

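$finfo = finfo_open(FILEINFO_MIME_TYPE);
// glob() needs a wildcard pattern to match the files inside the directory
foreach (glob("path/to/files/*") as $filename) {
    // finfo inspects the file contents, not the extension
    if (finfo_file($finfo, $filename) === 'application/pdf') {
        echo "'{$filename}' is a PDF" . PHP_EOL;
    } else {
        echo "'{$filename}' is not a PDF" . PHP_EOL;
    }
}
finfo_close($finfo);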
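
The question title also asks about a variable string, which the local-file check does not cover. A minimal sketch, assuming the data is already in memory (for example, a response body fetched earlier): finfo can inspect a buffer directly. The helper name isPdfString is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: returns true when the raw bytes in $bytes
// are detected as a PDF by the fileinfo extension.
function isPdfString(string $bytes): bool
{
    // FILEINFO_MIME_TYPE yields only the type, e.g. "application/pdf"
    $finfo = new finfo(FILEINFO_MIME_TYPE);

    // finfo::buffer() is the in-memory counterpart of finfo::file()
    return $finfo->buffer($bytes) === 'application/pdf';
}
```

For example, isPdfString(file_get_contents($filename)) should give the same answer as the finfo_file() check, at the cost of reading the whole file into memory.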

For remote files:

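$ch = curl_init();
$url = 'http://path.to/your.pdf';
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 1);          // include response headers in the output
curl_setopt($ch, CURLOPT_NOBODY, 1);          // HEAD request: do not download the body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the response instead of printing it

// split() was removed in PHP 7; explode() does the same job here
$results = explode("\n", trim(curl_exec($ch)));
foreach ($results as $line) {
    // header names are case-insensitive, so compare without regard to case
    if (strcasecmp(strtok($line, ':'), 'Content-Type') === 0) {
        $parts = explode(":", $line, 2);
        echo trim($parts[1]); // output: application/pdf
    }
}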
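
The header parsing in the remote check can be pulled into a small function so it is easier to reuse in a loop over many URLs. This is a sketch under assumptions, not part of the original answer: the name contentTypeFromHeaders is hypothetical, and it expects the raw header text that curl_exec() returns when CURLOPT_HEADER and CURLOPT_NOBODY are set. Keep in mind that Content-Type is only what the server claims, so a mislabelled file may still need the content-based finfo check.

```php
<?php
// Hypothetical helper: extracts the Content-Type value from raw
// response headers. Returns null when no such header is present.
function contentTypeFromHeaders(string $rawHeaders): ?string
{
    $contentType = null;
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        // Header names are case-insensitive, so match with stripos()
        if (stripos($line, 'Content-Type:') === 0) {
            $value = trim(substr($line, strlen('Content-Type:')));
            // Drop parameters such as "; charset=binary" and normalise case
            $contentType = strtolower(trim(explode(';', $value)[0]));
        }
    }
    // If the header appears more than once, the last occurrence wins
    return $contentType;
}
```

A call such as contentTypeFromHeaders($response) === 'application/pdf' then gives a yes/no answer per URL. cURL also exposes the same value through curl_getinfo($ch, CURLINFO_CONTENT_TYPE), which avoids manual parsing.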
vonUbisch answered Sep 28 '22 20:09


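Get the MIME type of the file using finfo_file():

if (function_exists('finfo_open')) {
    // FILEINFO_MIME returns the type plus the encoding,
    // e.g. "application/pdf; charset=binary"
    $finfo = finfo_open(FILEINFO_MIME);
    $mimetype = finfo_file($finfo, "PATH-TO-YOUR-FILE");
    finfo_close($finfo);

    // Print inside the check so $mimetype is always defined here
    echo "<pre>";
    print_r($mimetype);
    echo "</pre>";
}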
Pupil answered Sep 28 '22 19:09