PHP: fseek() for large file (>2GB)

Tags:

php

I have a very large file (about 20 GB). How can I use fseek() to jump around in it and read its content?

The code looks like this:

function read_bytes($f, $offset, $length) {
    fseek($f, $offset);
    return fread($f, $length);
}

The result is only correct if $offset < 2147483647.

Update: I am running on 64-bit Windows. phpinfo() reports Architecture: x64, and PHP_INT_MAX is 2147483647.
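The sketch below assumes PHP 7 or later. On builds where PHP_INT_SIZE is 8 (64-bit PHP, including Windows x64 from PHP 7 on), plain fseek() can generally take offsets past 2 GB. On a 32-bit-integer build like the one above, the safest thing read_bytes() can do is fail loudly instead of silently reading from the wrong position. The name read_bytes_checked is hypothetical. It checks that the offset is still a real integer, because larger literals silently become floats when they exceed PHP_INT_MAX. It also checks that fseek() returned 0.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: like read_bytes(), but throws instead of returning
// data from the wrong offset.
function read_bytes_checked($f, $offset, int $length): string
{
    // Above PHP_INT_MAX an offset is a float and cannot be passed to fseek().
    if (!is_int($offset) || $offset < 0) {
        throw new RangeException("Offset $offset is not a non-negative int on this build (PHP_INT_SIZE = " . PHP_INT_SIZE . ")");
    }
    // fseek() returns 0 on success and -1 on failure.
    if (fseek($f, $offset, SEEK_SET) !== 0) {
        throw new RuntimeException("fseek() to $offset failed");
    }
    $data = fread($f, $length);
    if ($data === false) {
        throw new RuntimeException("fread() at offset $offset failed");
    }
    return $data;
}
```

This check only turns silent corruption into an error. Actually reading past 2 GB still needs a 64-bit build or a workaround like the ones in the answers.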

asked Jun 14 '13 by anvoz

2 Answers

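WARNING: as noted in the comments, fseek() takes its offset as a PHP integer. It therefore simply can't work with such large files on builds where integers are 32 bits. That includes 32-bit PHP and also Windows x64 builds before PHP 7, which is why PHP_INT_MAX is 2147483647 in the question. The following solution won't work; it is left here for reference only.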

A little searching led me to the comments on the PHP manual page for fseek:

http://php.net/manual/en/function.fseek.php

The problem is the maximum int size for the offset parameter. It seems you can work around it by making several fseek() calls with the SEEK_CUR option and combining them with a big-number library (GMP here).

Example:

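// Seek to an offset that may exceed PHP_INT_MAX; $offset is a decimal string.
// Requires the GMP extension.
function fseek64($fh, $offset)
{
    fseek($fh, 0, SEEK_SET);
    $t_offset   = (string) PHP_INT_MAX;
    // gmp_cmp() returns a positive value (not necessarily 1) when $offset > $t_offset
    while (gmp_cmp($offset, $t_offset) > 0)
    {
        // step forward PHP_INT_MAX bytes at a time, relative to the current position
        $offset     = gmp_sub($offset, $t_offset);
        fseek($fh, gmp_intval($t_offset), SEEK_CUR);
    }
    // the remainder now fits in an int
    return fseek($fh, gmp_intval($offset), SEEK_CUR);
}

fseek64($f, '23456781232');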
answered Oct 18 '22 by fsw


For my project, I needed to READ blocks of 10 KB from a BIG offset in a BIG file (>3 GB). Writes were always appends, so no offsets were needed.

This works irrespective of which PHP version and OS you are using, because the offset is sent in an HTTP header instead of being passed to fseek().

Prerequisite: your web server must support HTTP Range requests. Apache and IIS support them out of the box, as do most other web servers (shared hosting or otherwise).

// offset, 3GB+
$start=floatval(3355902253);

// bytes to read, 100 KB
$len=floatval(100*1024);

// set up the HTTP byte-range header
$opts = array('http'=>array('method'=>'GET','header'=>"Range: bytes=$start-".($start+$len-1)));
$context = stream_context_create($opts);
// show the options, including the Range header
print_r($opts);

// Change the URL below to the URL of your file. DO NOT change it to a file path:
// you MUST use an http:// URL for the HTTP request to work.
// This outputs the requested bytes.
echo $result = file_get_contents('http://127.0.0.1/dir/mydbfile.dat', false, $context);

// Status of your request: the first line should contain "206 Partial Content".
// If this variable is not set, the HTTP request never fired.
print_r($http_response_header);

// Check your file URL by opening it directly in a web browser. If the HTTP
// response shows an error (status code 400 or above), check that you are sending
// the correct Range header bytes. For example, a start offset beyond the current
// file size gives 416 (Range Not Satisfiable).

// NOTE - the current file size is also returned in the HTTP response header
// Content-Range: bytes 355902253-355903252/355904253, where the last number is the file size.

...

...

...

SECURITY: you must add an .htaccess rule that denies all requests for this database file except those coming from the local IP 127.0.0.1.
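The Range-request approach can be wrapped in a reusable function. The following is a minimal sketch, assuming PHP 7.1+ and a server configured as described above. The function names range_header and fetch_range are hypothetical. It makes two changes relative to the answer's code. First, it formats the offsets with sprintf('%.0f'), which prints whole-number floats without exponent notation (exact up to 2^53). Second, it passes a length limit to file_get_contents() so that a server which ignores Range does not stream the whole file. It then rejects any response whose status is not 206.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: build "Range: bytes=first-last" for $len bytes at $start.
// $start is a float so offsets above PHP_INT_MAX work on 32-bit builds too.
function range_header(float $start, int $len): string
{
    return sprintf('Range: bytes=%.0f-%.0f', $start, $start + $len - 1);
}

// Hypothetical helper: fetch $len bytes at $start over HTTP.
function fetch_range(string $url, float $start, int $len): string
{
    $context = stream_context_create([
        'http' => ['method' => 'GET', 'header' => range_header($start, $len)],
    ]);
    // The last argument caps how much is read, in case Range is ignored (200 + whole file).
    $body = file_get_contents($url, false, $context, 0, $len);
    // file_get_contents() fills $http_response_header in the local scope.
    $status = $http_response_header[0] ?? '';
    if ($body === false || !preg_match('/^HTTP\/\S+\s+206\b/', $status)) {
        throw new RuntimeException("Range request failed: '$status'");
    }
    return $body;
}

// Usage with the URL from the answer:
// $data = fetch_range('http://127.0.0.1/dir/mydbfile.dat', 3355902253, 100 * 1024);
```

Checking for 206 matters because a 200 response means the server ignored the Range header, so the bytes returned would come from the start of the file rather than from $start.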

answered Oct 18 '22 by Tech Consultant