Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A way to make md5_file() faster?

Tags:

php

md5

md5-file

I currently use md5_file() to run through about 15 URLs and verify their MD5 hashes. Is there a way that I can make this faster? It takes far too long to run through all of them.

like image 853
Rob Avatar asked May 01 '10 14:05

Rob


1 Answers

Probably you're doing it sequentially right now. I.e. fetch data 1, process data1, fetch data 2, process data 2, ... and the bottleneck might be the data transfer.
You could use curl_multi_exec() to parallelize that a bit. Either register a CURLOPT_WRITEFUNCTION and process each chunk of data (tricky since md5() works on exactly one chunk of data).
Or check for curl handles that are already finished and then process the data of that handle.

edit: quick&dirty example using the hash extension (which provides functions for incremental hashes) and a php5.3+ closure:

$urls = array(
  'http://stackoverflow.com/',
  'http://sstatic.net/so/img/logo.png',
  'http://www.gravatar.com/avatar/212151980ba7123c314251b185608b1d?s=128&d=identicon&r=PG',
  'http://de.php.net/images/php.gif'
);

$data = array();
$fnWrite = function($ch, $chunk) use(&$data) {
  foreach( $data as $d ) {
    if ( $ch===$d['curlrc'] ) {
      hash_update($d['hashrc'], $chunk);
    }
  }
};

$mh = curl_multi_init();
foreach($urls as $u) {
  $current = curl_init();
  curl_setopt($current, CURLOPT_URL, $u);
  curl_setopt($current, CURLOPT_RETURNTRANSFER, 0);
  curl_setopt($current, CURLOPT_HEADER, 0);
  curl_setopt($current, CURLOPT_WRITEFUNCTION, $fnWrite);
  curl_multi_add_handle($mh, $current);
  $hash = hash_init('md5');
  $data[] = array('url'=>$u, 'curlrc'=>$current, 'hashrc'=>$hash); 
}

$active = null;
//execute the handles
do {
  $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
  if (curl_multi_select($mh) != -1) {
    do {
      $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
  }
}

foreach($data as $d) {
  curl_multi_remove_handle($mh, $d['curlrc']);
  echo $d['url'], ': ', hash_final($d['hashrc'], false), "\n";
}
curl_multi_close($mh);

(haven't checked the results though ...it's only a starting point)

like image 64
VolkerK Avatar answered Nov 06 '22 07:11

VolkerK