Which is preferable: sha1_file(f) or sha1(file_get_contents(f))?

Tags: php, hash, sha

I want to create a hash of a file whose size is at least 5 MB and can grow to 1-2 GB. I face a tough choice between these two methods, although they produce exactly the same hash.

Method 1: sha1_file($file)
Method 2: sha1(file_get_contents($file))

I have tried a 10 MB file and there is not much difference in performance. But what about at a larger scale? Which is the better way to go?

Rahul asked Mar 22 '23 15:03



1 Answer

Use the highest-level form offered unless there is a compelling reason not to.

In this case, the correct choice is sha1_file, because it is a higher-level function that works only with files. This 'restriction' lets it take advantage of the fact that the file can be processed as a stream[1]: only a small part of the file is read into memory at a time.
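
To make the streaming idea concrete, here is a minimal userland sketch of the same pattern. It is not the actual implementation of sha1_file; it assumes PHP 7.2+ (where hash_init returns a HashContext), and both the helper name sha1_file_chunked and the 8192-byte chunk size are hypothetical choices made for illustration. Only one chunk is held in memory at a time, and the result should match sha1_file for the same file.

```php
<?php
// Hypothetical helper: hashes a file chunk by chunk, mirroring the idea
// behind sha1_file(). The 8192-byte default chunk size is an arbitrary choice.
function sha1_file_chunked(string $path, int $chunkSize = 8192): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $ctx = hash_init('sha1');                  // incremental SHA-1 context
    try {
        while (!feof($handle)) {
            // At most $chunkSize bytes of the file are in memory here.
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Read error on $path");
            }
            hash_update($ctx, $chunk);         // feed this chunk into the hash
        }
    } finally {
        fclose($handle);
    }
    return hash_final($ctx);                   // lowercase hex, same format as sha1_file()
}

// Usage: create a small temporary file and compare with the built-in.
$tmp = tempnam(sys_get_temp_dir(), 'sha');
file_put_contents($tmp, str_repeat('abc', 100000));
echo sha1_file_chunked($tmp) === sha1_file($tmp) ? "match\n" : "mismatch\n";
unlink($tmp);
```

In practice there is no reason to write this yourself for a file on disk; sha1_file already does it. The pattern is mainly useful when the data arrives in pieces from somewhere other than a single file path.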

The second approach needs at least as much memory as the file is large (5 MB to 2 GB here), because file_get_contents reads the entire file into a single string before the hash is computed. As file sizes grow and/or system resources become scarcer, this can badly hurt performance, and once the file no longer fits within PHP's memory_limit the script dies with a fatal out-of-memory error instead of returning a hash.
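
A rough way to observe this difference is to compare memory_get_peak_usage() after each method. The sketch below is illustrative only: it creates a throwaway file of about 20 MB (an arbitrary size that must still fit within memory_limit), writes it in 1 MB pieces so that creating it does not inflate peak memory by itself, and runs sha1_file first, because peak usage within one script can only go up.

```php
<?php
// Build a ~20 MB temporary test file in 1 MB pieces (sizes are arbitrary).
$path  = tempnam(sys_get_temp_dir(), 'sha');
$block = str_repeat('x', 1 << 20);     // one 1 MB block, reused for every write
$out   = fopen($path, 'wb');
for ($i = 0; $i < 20; $i++) {
    fwrite($out, $block);
}
fclose($out);

// Method 1 first: streamed, so peak memory should barely move.
$hash1 = sha1_file($path);
$peakAfterStream = memory_get_peak_usage();

// Method 2: the whole file becomes one PHP string before hashing.
$hash2 = sha1(file_get_contents($path));
$peakAfterLoad = memory_get_peak_usage();

unlink($path);

printf("same hash: %s\n", $hash1 === $hash2 ? 'yes' : 'no');
printf("peak after sha1_file:                 %.1f MB\n", $peakAfterStream / 1048576);
printf("peak after sha1(file_get_contents()): %.1f MB\n", $peakAfterLoad / 1048576);
```

The exact figures depend on the PHP build and configuration, but the second peak should be higher by roughly the size of the file.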


[1] The source for sha1_file can be found on GitHub. Here is an extract showing only the lines relevant to stream processing:

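PHP_FUNCTION(sha1_file)
{
    /* open the path through PHP's stream layer (read-only, binary) */
    stream = php_stream_open_wrapper(arg, "rb", REPORT_ERRORS, NULL);
    PHP_SHA1Init(&context);
    /* read one small fixed-size buffer at a time and feed it to the hash */
    while ((n = php_stream_read(stream, buf, sizeof(buf))) > 0) {
        PHP_SHA1Update(&context, buf, n);
    }
    /* produce the final 20-byte SHA-1 digest, then release the stream */
    PHP_SHA1Final(digest, &context);
    php_stream_close(stream);
}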

By using higher-level functions, you place the onus of a suitable implementation on the library's developers. In this case, that allowed a streaming implementation whose memory use stays constant regardless of the file size.
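
The same principle carries over to other algorithms. As a small sketch (assuming the hash extension, which is bundled with PHP), hash_file() is the file-level counterpart of hash($algo, file_get_contents($path)) and gives the same digest as sha1_file for SHA-1. The test file and its contents are arbitrary.

```php
<?php
// Arbitrary throwaway file for the comparison.
$path = tempnam(sys_get_temp_dir(), 'sha');
file_put_contents($path, 'hello world');

// Both functions take a path and read the file through a stream,
// rather than first loading it into one big string.
$viaSha1File = sha1_file($path);           // SHA-1-specific helper
$viaHashFile = hash_file('sha1', $path);   // generic helper, any supported algorithm
$sha256      = hash_file('sha256', $path); // e.g. a stronger algorithm, same pattern

unlink($path);

var_dump($viaSha1File === $viaHashFile);   // bool(true)
echo $sha256, "\n";
```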

user2864740 answered Apr 07 '23 02:04
