Reading very large files in PHP


Are you sure it's fopen that's failing and not your script's timeout setting? The default max_execution_time is 30 seconds, and if your file takes longer than that to read in, it may be tripping that up.

Another thing to consider may be the memory limit on your script - reading the file into an array may trip over this, so check your error log for memory warnings.
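If either limit is the culprit, both can be raised for the current request. A minimal sketch using the standard PHP settings (the values here are arbitrary, tune them to your workload):

set_time_limit(0);               // remove the execution-time cap for this script
ini_set('memory_limit', '512M'); // raise the per-request memory ceiling (arbitrary value)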

If neither of the above is your problem, you might look into using fgets() to read the file line by line, processing as you go.

$handle = fopen("/tmp/uploadfile.txt", "r") or die("Couldn't get handle");

// fgets() returns false at EOF (or on a read error), which ends the loop cleanly
while (($buffer = fgets($handle, 4096)) !== false) {
    // Process $buffer here...
}
fclose($handle);

Edit

PHP doesn't seem to throw an error here; fopen() just returns false on failure.

Is the path to $rawfile correct relative to where the script is running? Perhaps try setting an absolute path here for the filename.
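A quick way to rule out a path problem (a sketch; the filename is hypothetical):

// Resolve the path relative to this script's directory and verify it is readable.
$rawfile = __DIR__ . '/tmp/uploadfile.txt'; // example absolute path - adjust to your layout
if (!is_readable($rawfile)) {
    die("Cannot read $rawfile - check the path and permissions");
}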


I ran two tests, with a 1.3 GB file and a 9.5 GB file.

1.3 GB file

Using fopen(): 15555 ms of computation, 169 ms in system calls.
Using file(): 6983 ms of computation, 4469 ms in system calls.

9.5 GB file

Using fopen(): 113559 ms of computation, 2532 ms in system calls.
Using file(): 8221 ms of computation, 7998 ms in system calls.

Seems file() is faster.
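For reference, a sketch of how such a comparison can be timed. The computation vs. system-call split quoted above matches what getrusage() reports (user time vs. system time), though that's an assumption about how the original numbers were produced:

// Measure user ("computation") and system time around one read strategy.
function rusage_ms(array $ru, string $field): float {
    return $ru["ru_{$field}.tv_sec"] * 1000 + $ru["ru_{$field}.tv_usec"] / 1000;
}

$before = getrusage();
$lines = file("/tmp/bigfile.txt"); // swap in an fgets() loop for the fopen() variant
$after = getrusage();

printf("This process used %.0f ms for its computations.\n",
       rusage_ms($after, 'utime') - rusage_ms($before, 'utime'));
printf("It spent %.0f ms in system calls.\n",
       rusage_ms($after, 'stime') - rusage_ms($before, 'stime'));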


• The fgets() function is fine until the text files pass 20 MB; beyond that, parsing speed drops sharply.

• The file_get_contents() function gives good results up to 40 MB and acceptable results up to 100 MB, but it loads the entire file into memory, so it's not scalable.

• The file() function is disastrous with large text files: it creates an array containing each line of text, so the whole array is held in memory, and the memory used is even larger than the file itself. In fact, I could only manage to parse a 200 MB file with memory_limit set to 2 GB, which made this useless for the 1+ GB files I intended to parse.

When you have to parse files larger than 1 GB, the parsing time exceeds 15 seconds, and you want to avoid loading the entire file into memory, you have to find another way.

My solution was to parse the data in arbitrarily small chunks. The code is:

$filesize = get_file_size($file); // custom helper (see the note below)
$fp = @fopen($file, "r");
$chunk_size = (1<<24); // 16MB arbitrary
$position = 0;

// if handle $fp to file was created, go ahead
if ($fp) {
   // loop on the byte position rather than feof(), so a trailing
   // partial line (a file not ending in "\n") is not dropped
   while ($position < $filesize) {
      // move pointer to $position in file
      fseek($fp, $position);

      // take a slice of $chunk_size bytes
      $chunk = fread($fp, $chunk_size);
      if ($chunk === false || $chunk === '') break; // read failed or nothing left

      // search for the end of the last full text line; strrpos() can
      // legitimately return 0, so compare against false, not truthiness
      $last_lf_pos = strrpos($chunk, "\n");

      if ($last_lf_pos === false) {
         // no newline in this chunk (trailing partial line, or a single
         // line longer than $chunk_size): take the whole chunk
         $buffer = $chunk;
         $position += strlen($chunk);
      } else {
         // $buffer will contain full lines of text, from $position up to
         // $last_lf_pos; substr()/strlen() work in bytes, matching the
         // byte offset returned by strrpos()
         $buffer = substr($chunk, 0, $last_lf_pos);

         // move $position past the newline itself
         $position += $last_lf_pos + 1;
      }

      ////////////////////////////////////////////////////
      //// ... DO SOMETHING WITH THIS BUFFER HERE ... ////
      ////////////////////////////////////////////////////

      // if remaining is less than $chunk_size, make $chunk_size equal remaining
      if (($position + $chunk_size) > $filesize) $chunk_size = $filesize - $position;
      $buffer = NULL;
   }
   fclose($fp);
}

The memory used is only the $chunk_size, and the speed is only slightly lower than what you get with file_get_contents(). I think the PHP Group should use this approach to optimize its parsing functions.

*) Find the get_file_size() function here.
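As a usage note, one way to consume $buffer inside the loop above (a sketch; process_line() is a hypothetical user-defined handler):

// $buffer holds only complete lines, so splitting on "\n" is safe here.
foreach (explode("\n", $buffer) as $line) {
    process_line($line); // hypothetical handler for one line of text
}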


Well, you could try using the readfile() function if you just want to output the file.
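For example, a minimal sketch of sending a large file to the client without loading it into PHP's memory (the path and headers are assumptions, not from the question):

$file = "/tmp/uploadfile.txt"; // hypothetical path
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($file));
readfile($file); // streams the file to the client in chunks, keeping memory use flat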

If this is not the case, maybe you should rethink the design of the application: why do you need to open such large files during web requests?


I used fopen() to open video files for streaming, using a PHP script as a video streaming server, and I had no problem with files larger than 50-60 MB.
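A sketch of that kind of streaming loop, assuming a fixed content type and ignoring range requests (both simplifications):

$path = "/var/media/video.mp4"; // hypothetical file
$fp = fopen($path, "rb");
if ($fp) {
    header('Content-Type: video/mp4');
    header('Content-Length: ' . filesize($path));
    while (!feof($fp)) {
        echo fread($fp, 8192); // send 8 KB at a time so memory stays flat
        flush();               // push the bytes out to the client
    }
    fclose($fp);
}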