Efficiently counting the number of lines of a text file (200MB+)

Tags: file, text, php, memory, memory-leaks

I have just found out that my script gives me a fatal error:

Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 440 bytes) in C:\process_txt.php on line 109

That line is this:

$lines = count(file($path)) - 1;

So I think it is having difficulty loading the file into memory to count the number of lines. Is there a more efficient way to do this without running into memory issues?

The text files whose lines I need to count range from 2MB to 500MB, and are sometimes as large as a gigabyte.

Thanks all for any help.


asked Jan 29 '10 14:01

Abs

2 Answers

This will use less memory, since it doesn't load the whole file into memory:

    $file = "largefile.txt";
    $linecount = 0;
    $handle = fopen($file, "r");
    while (!feof($handle)) {
        $line = fgets($handle);
        $linecount++;
    }

    fclose($handle);

    echo $linecount;

fgets loads a single line into memory (if the second argument $length is omitted it will keep reading from the stream until it reaches the end of the line, which is what we want). This is still unlikely to be as quick as using something other than PHP, if you care about wall time as well as memory usage.
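One thing to watch for with the feof() loop above: feof() typically only reports true after a read has already come up empty, so the loop can run one extra time and report one line too many for a file that ends with a newline. A minimal sketch that loops on the return value of fgets() instead (countLinesWithFgets is a hypothetical name, not part of the original answer):

```php
<?php
// Hypothetical helper: counts lines by looping on the return value of fgets().
function countLinesWithFgets(string $path): int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    // fgets() returns false once there is nothing left to read,
    // so no extra iteration runs after the last line.
    while (fgets($handle) !== false) {
        $count++;
    }

    fclose($handle);

    return $count;
}
```

With this loop a final line that lacks a trailing newline is still counted, because fgets() returns that partial line before it returns false.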

The only danger with this is if any lines are particularly long (what if you encounter a 2GB file without line breaks?). In that case you're better off reading it in chunks and counting end-of-line characters:

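    $file = "largefile.txt";
    $linecount = 0;
    $handle = fopen($file, "r");
    while (!feof($handle)) {
        // Reads at most 4095 bytes per call, stopping early at a line break.
        $line = fgets($handle, 4096);
        // PHP_EOL is platform-dependent ("\r\n" on Windows, "\n" elsewhere),
        // so it has to match the file's line endings; counting "\n" covers both.
        $linecount = $linecount + substr_count($line, PHP_EOL);
    }

    fclose($handle);

    echo $linecount;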

answered Nov 07 '22 14:11

Dominic Rodger

Using a loop of fgets() calls is a fine solution and the most straightforward to write, however:

1. Even though internally the file is read using a buffer of 8192 bytes, your code still has to call that function for each line.
2. It's technically possible that a single line may be bigger than the available memory if you're reading a binary file.

This code reads a file in chunks of 8kB each and then counts the number of newlines within that chunk.

    function getLines($file)
    {
        $f = fopen($file, 'rb');
        $lines = 0;

        while (!feof($f)) {
            $lines += substr_count(fread($f, 8192), "\n");
        }

        fclose($f);

        return $lines;
    }

If the average length of each line is at most 4kB, you will already start saving on function calls, and those can add up when you process big files.
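To make the function-call argument concrete, here is a rough back-of-the-envelope sketch. It assumes the 1GB benchmark file is exactly 1 GiB and reuses the line count reported in the benchmark in this answer; the variable names are purely illustrative. Each chunk costs two calls (fread() and substr_count()) against one fgets() per line, which is presumably where the 4kB break-even figure comes from.

```php
<?php
// Assumptions for illustration only: the "1GB" file is exactly 1 GiB and
// contains the 3550388 lines reported in this answer's benchmark.
$fileSize  = 1024 ** 3;   // bytes (assumed)
$lineCount = 3550388;     // taken from the benchmark table
$chunkSize = 8192;        // bytes requested per fread() call

// Line-by-line approach: one fgets() call per line.
$fgetsCalls = $lineCount;

// Chunked approach: one fread() plus one substr_count() call per chunk.
$chunks       = (int) ceil($fileSize / $chunkSize);
$chunkedCalls = 2 * $chunks;

printf("average line length: %.0f bytes\n", $fileSize / $lineCount);
printf("fgets() calls:       %d\n", $fgetsCalls);
printf("chunks read:         %d\n", $chunks);
printf("chunked calls:       %d\n", $chunkedCalls);
```

The feof() check is left out on both sides, since each loop performs it once per iteration anyway.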

Benchmark

I ran a test with a 1GB file; here are the results:

             +-------------+------------------+---------+
             | This answer | Dominic's answer | wc -l   |
+------------+-------------+------------------+---------+
| Lines      | 3550388     | 3550389          | 3550388 |
+------------+-------------+------------------+---------+
| Runtime    | 1.055       | 4.297            | 0.587   |
+------------+-------------+------------------+---------+

Time is measured in seconds of real (wall-clock) time.
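For anyone who wants to repeat a comparison like this on their own files, here is a minimal timing sketch. It is not how the numbers above were produced: timeLineCounter is a hypothetical helper, and it measures only the function call with microtime(), not PHP start-up, so the results are not directly comparable to an external timer.

```php
<?php
// Hypothetical helper: runs $counter on $path and returns [line count, elapsed seconds].
function timeLineCounter(callable $counter, string $path): array
{
    $start   = microtime(true);   // wall-clock time in seconds, as a float
    $lines   = $counter($path);
    $elapsed = microtime(true) - $start;

    return [$lines, $elapsed];
}

// Example counter: chunked newline count, as in the function above.
$chunked = function (string $path): int {
    $f = fopen($path, 'rb');
    $lines = 0;
    while (!feof($f)) {
        $lines += substr_count(fread($f, 8192), "\n");
    }
    fclose($f);

    return $lines;
};

// Small throwaway file so the sketch runs anywhere; use a large file for real numbers.
$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path, str_repeat("some text\n", 1000));

[$lines, $seconds] = timeLineCounter($chunked, $path);
printf("%d lines in %.4f s\n", $lines, $seconds);

unlink($path);
```

Any other counting function can be passed in place of $chunked to compare approaches on the same file.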

True line count

While the above works well and returns the same results as wc -l, if the file ends without a newline, the line count will be off by one. If you care about this particular scenario, you can make it more accurate by using this logic:

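    function getLines($file)
    {
        $f = fopen($file, 'rb');
        $lines = 0;
        $last = '';

        while (!feof($f)) {
            $buffer = fread($f, 8192);
            if ($buffer === false || $buffer === '') {
                // An empty final read must not overwrite the last byte seen.
                break;
            }
            $lines += substr_count($buffer, "\n");
            // substr() is used instead of $buffer[-1], which needs PHP 7.1+.
            $last = substr($buffer, -1);
        }

        fclose($f);

        // Count a final line that is not terminated by a newline.
        if ($last !== '' && $last !== "\n") {
            ++$lines;
        }

        return $lines;
    }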

answered Nov 07 '22 14:11

Ja͢ck
