
Opensubtitles hash function fails for large files

Tags: php, hash

I am using the function below to calculate the opensubtitles.org hash for movie files. It works for most files, but with large files I get the following error.

I don't really understand it, because there should always be data available.

Can anyone point me in the right direction?

PHP Warning: unpack(): Type v: not enough input, need 2, have 0 in file.php on line 169

function OpenSubtitlesHash($file)
{
    $handle = fopen($file, "rb");
    $fsize = filesize($file);

    $hash = array(3 => 0, 
                  2 => 0, 
                  1 => ($fsize >> 16) & 0xFFFF, 
                  0 => $fsize & 0xFFFF);

    for ($i = 0; $i < 8192; $i++)
    {
        $tmp = ReadUINT64($handle);
        $hash = AddUINT64($hash, $tmp);
    }

    $offset = $fsize - 65536;
    fseek($handle, $offset > 0 ? $offset : 0, SEEK_SET);

    for ($i = 0; $i < 8192; $i++)
    {
        $tmp = ReadUINT64($handle);
        $hash = AddUINT64($hash, $tmp);         
    }

    fclose($handle);
    return UINT64FormatHex($hash);
}

function ReadUINT64($handle)
{
    $u = unpack("va/vb/vc/vd", fread($handle, 8));
    return array(0 => $u["a"], 1 => $u["b"], 2 => $u["c"], 3 => $u["d"]);
}

function AddUINT64($a, $b)
{
    $o = array(0 => 0, 1 => 0, 2 => 0, 3 => 0);

    $carry = 0;
    for ($i = 0; $i < 4; $i++) 
    {
        if (($a[$i] + $b[$i] + $carry) > 0xffff ) 
        {
            $o[$i] += ($a[$i] + $b[$i] + $carry) & 0xffff;
            $carry = 1;
        }
        else 
        {
            $o[$i] += ($a[$i] + $b[$i] + $carry);
            $carry = 0;
        }
    }

    return $o;   
}

function UINT64FormatHex($n)
{   
    return sprintf("%04x%04x%04x%04x", $n[3], $n[2], $n[1], $n[0]);
}
asked Dec 25 '17 12:12 by The Surrican


2 Answers

Some additional information would make an accurate answer easier: the operating system version, the PHP version, the size of the large files, and the type of files (plain files, URLs, etc.).

My main assumption is that you are on a 32-bit system and are having trouble with filesize for files larger than 2GB. From the docs:

Note: Because PHP's integer type is signed and many platforms use 32bit integers, some filesystem functions may return unexpected results for files which are larger than 2GB.

You probably get the wrong filesize value and therefore can't accurately read the trailing bytes. This comment explains how to get the size of larger files and also notes that fseek uses an int internally, so you can't move the file pointer past the 2GB threshold. You will need to fread up to that position instead.
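A minimal sketch of the "fread up to that position" idea, assuming a plain local file. The helper name skipForward and its chunk size are hypothetical, not part of the question's code; it advances the handle by reading and discarding data, so no large offset is ever passed to fseek.

```php
<?php
// Hypothetical helper: advance a file handle by $bytes without calling
// fseek() with a large offset, by reading and discarding data in chunks.
// Returns the number of bytes actually skipped.
function skipForward($handle, $bytes, $chunk = 1048576)
{
    $skipped = 0;
    while ($skipped < $bytes && !feof($handle)) {
        // Never request more than one chunk at a time.
        $want = (int) min($chunk, $bytes - $skipped);
        $data = fread($handle, $want);
        if ($data === false || $data === '') {
            break; // read error or end of file
        }
        $skipped += strlen($data);
    }
    return $skipped;
}

// PHP_INT_SIZE is 4 on 32-bit builds, where integers cannot hold
// sizes or offsets of 2 GB or more.
$is32bit = (PHP_INT_SIZE === 4);
```

On a 64-bit build this workaround should not be needed. On a 32-bit build the target offset would still have to come from a reliable size value, which filesize may not provide for such files.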

There are other hypotheses that could be checked:

  • fread may not return exactly the number of bytes requested under certain circumstances:

    if the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made; depending on the previously buffered data, the size of the returned data may be larger than the chunk size.

  • the stat cache prevents you from getting an accurate file size value.
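Both hypotheses can be checked with a small diagnostic, sketched here under the assumption that the handle is valid. ReadUINT64Checked and freshFileSize are hypothetical names; the first is a variant of the question's ReadUINT64 that reports a short read instead of handing it to unpack, and the second clears the stat cache before calling filesize.

```php
<?php
// Diagnostic variant of the question's ReadUINT64(): instead of passing a
// short or empty string to unpack(), it reports how many bytes were read
// and at which offset.
function ReadUINT64Checked($handle)
{
    $offset = ftell($handle);
    $data = fread($handle, 8);
    $got = ($data === false) ? 0 : strlen($data);
    if ($got !== 8) {
        throw new RuntimeException(
            "Short read at offset $offset: wanted 8 bytes, got $got"
        );
    }
    $u = unpack("va/vb/vc/vd", $data);
    return array(0 => $u["a"], 1 => $u["b"], 2 => $u["c"], 3 => $u["d"]);
}

// Hypothetical wrapper: drop any cached stat data for the file before
// asking for its size.
function freshFileSize($file)
{
    clearstatcache(true, $file);
    return filesize($file);
}
```

The offset in the exception message should show whether the failing read happens in the first loop or after the fseek, which narrows down which of the hypotheses applies.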
answered Nov 02 '22 02:11 by origaminal


You never check whether $handle is a valid resource. If fopen fails it returns false, and reading from that handle can give the same error:

PHP Warning: unpack(): Type v: not enough input, need 2, have 0 in file.php on line 169

So add a check before you do anything with $handle:

if ($handle !== false) {
  // Do something..
}
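One way to package that check, as a sketch only: the helper name openForHash is hypothetical, and throwing an exception is just one possible way to report the failure.

```php
<?php
// Hypothetical helper: open a file for hashing and fail loudly if it
// cannot be opened, rather than letting fread()/unpack() fail later.
function openForHash($file)
{
    // @ suppresses fopen()'s own warning; the failure is reported below.
    $handle = @fopen($file, "rb");
    if ($handle === false) {
        throw new RuntimeException("Cannot open file: $file");
    }
    return $handle;
}
```

OpenSubtitlesHash could then call openForHash($file) in place of fopen($file, "rb"), so an unreadable path stops the function before any read is attempted.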
answered Nov 02 '22 00:11 by Jawido Kakarot