What is the best way to read last lines (i.e. "tail") from a file using PHP?

Tags:

performance

php

logging

In my PHP application I need to read multiple lines starting from the end of many files (mostly logs). Sometimes I need only the last one, sometimes I need tens or hundreds. Basically, I want something as flexible as the Unix tail command.

There are questions here about how to get the single last line from a file (but I need N lines), and different solutions were given. I'm not sure which one is best or which performs best.

895

asked Feb 22 '13 13:02

lorenzo-s

1 Answer

Methods overview

Searching the internet, I came across different solutions. I can group them into three approaches:

- naive ones that use the file() PHP function;
- cheating ones that run the tail command on the system;
- mighty ones that happily jump around an opened file using fseek().
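
As a rough illustration of the first two approaches, here is a minimal sketch. The function names tailNaive and tailShell are hypothetical and not taken from any of the linked solutions, and the type declarations assume PHP 7 or later. The fseek() approach is not covered here.

```php
<?php
// Hypothetical helpers for illustration only; not the benchmarked code.

// Naive approach: load every line into an array and keep the last $lines.
// Memory use grows with the size of the whole file.
function tailNaive(string $path, int $lines = 1): string
{
    if ($lines < 1) {
        return '';
    }
    $all = @file($path, FILE_IGNORE_NEW_LINES);
    if ($all === false) {
        return '';
    }
    return implode("\n", array_slice($all, -$lines));
}

// "Cheating" approach: delegate to the system tail command.
// Returns null when shell_exec() is unavailable or produces no output,
// for example on Windows or when system functions are disabled.
function tailShell(string $path, int $lines = 1): ?string
{
    if ($lines < 1 || !function_exists('shell_exec')) {
        return null;
    }
    $out = shell_exec('tail -n ' . $lines . ' ' . escapeshellarg($path));
    return is_string($out) ? rtrim($out, "\n") : null;
}
```

Both helpers return the lines joined by "\n" without a trailing newline, so their results can be compared directly.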

I ended up choosing (or writing) five solutions: a naive one, a cheating one and three mighty ones.

1. The most concise naive solution, using built-in array functions.
2. The only possible solution based on the tail command, which has one big problem: it does not run if tail is not available, e.g. on non-Unix systems (Windows) or in restricted environments that don't allow system functions.
3. The solution in which single bytes are read from the end of the file, searching for (and counting) newline characters, found here.
4. The multi-byte buffered solution optimized for large files, found here.
5. A slightly modified version of solution #4 in which the buffer length is dynamic, decided according to the number of lines to retrieve.
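
The buffered idea behind solutions #4 and #5 can be sketched roughly as follows. This is not the exact code of either solution: the function name tailBuffered and the buffer thresholds (64, 512 and 4096 bytes) are assumptions chosen for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical sketch of a buffered backward read with a dynamic buffer.
function tailBuffered(string $path, int $lines = 1): string
{
    if ($lines < 1) {
        return '';
    }
    $f = @fopen($path, 'rb');
    if ($f === false) {
        return '';
    }

    // Assumed thresholds: a small buffer when few lines are requested.
    $buffer = $lines < 2 ? 64 : ($lines < 10 ? 512 : 4096);

    fseek($f, 0, SEEK_END);
    $pos = ftell($f);
    if ($pos === 0) {
        fclose($f);
        return '';
    }

    // Ignore a single trailing newline so it is not counted as a line.
    fseek($f, -1, SEEK_END);
    if (fread($f, 1) === "\n") {
        $pos--;
    }

    $output = '';
    $found = 0; // newline characters seen so far
    while ($pos > 0 && $found < $lines) {
        $seek = min($pos, $buffer);
        $pos -= $seek;
        fseek($f, $pos);
        $chunk = fread($f, $seek);
        $output = $chunk . $output;
        $found += substr_count($chunk, "\n");
    }
    fclose($f);

    // The last chunk may contain more lines than requested; drop the surplus.
    return implode("\n", array_slice(explode("\n", $output), -$lines));
}
```

Only the bytes near the end of the file are read, so the cost depends on the number of lines requested rather than on the file size.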

All solutions work, in the sense that they return the expected result from any file and for any number of lines we ask for (except for solution #1, which can hit the PHP memory limit on large files and return nothing). But which one is better?

Performance tests

To answer the question I ran tests. That's how these things are done, isn't it?

I prepared a sample 100 KB file by joining together different files found in my /var/log directory. Then I wrote a PHP script that uses each of the five solutions to retrieve 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000 lines from the end of the file. Each test is repeated ten times (that's 5 × 28 × 10 = 1400 runs), and I measured the average elapsed time in microseconds.
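
A timing harness along these lines could look like the sketch below. It is not the original test script: the names lineCounts and averageMicroseconds are hypothetical, and it assumes each solution is wrapped in a callable taking a path and a line count (PHP 7 or later).

```php
<?php
// Hypothetical benchmark helpers; the original test script may differ.

// 1..10, then 20..100 in steps of 10, then 200..1000 in steps of 100.
function lineCounts(): array
{
    return array_merge(range(1, 10), range(20, 100, 10), range(200, 1000, 100));
}

// Average elapsed time, in microseconds, over $repeat calls of
// $tail($path, $lines).
function averageMicroseconds(callable $tail, string $path, int $lines, int $repeat = 10): float
{
    $total = 0.0;
    for ($i = 0; $i < $repeat; $i++) {
        $start = microtime(true);
        $tail($path, $lines);
        $total += microtime(true) - $start;
    }
    return $total / $repeat * 1e6;
}
```

lineCounts() yields the 28 line counts used in the arithmetic above; with five solutions and ten repetitions each, that gives the 1400 runs.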

I ran the script on my local development machine (Xubuntu 12.04, PHP 5.3.10, 2.70 GHz dual core CPU, 2 GB RAM) using the PHP command line interpreter. Here are the results:

[Figure: Execution time on sample 100 KB log file (https://www.lorenzostanco.com/stack/test_tail_100k.png)]

Solutions #1 and #2 seem to be the worst ones. Solution #3 is good only when we need to read a few lines. Solutions #4 and #5 seem to be the best ones. Note how a dynamic buffer size can optimize the algorithm: execution time is a little lower for few lines, because of the reduced buffer.

Let's try with a bigger file. What if we have to read a 10 MB log file?

[Figure: Execution time on sample 10 MB log file (https://www.lorenzostanco.com/stack/test_tail_10m.png)]

Now solution #1 is by far the worst one: loading the whole 10 MB file into memory is not a great idea. I also ran the tests on 1 MB and 100 MB files, and the situation is practically the same.

And for tiny log files? Here's the graph for a 10 KB file:

[Figure: Execution time on sample 10 KB log file (https://www.lorenzostanco.com/stack/test_tail_10k.png)]

Solution #1 is the best one now! Loading a 10 KB file into memory isn't a big deal for PHP. Solutions #4 and #5 also perform well. However, this is an edge case: a 10 KB log means something like 150 to 200 lines.

You can download all my test files, sources and results here.

Final thoughts

Solution #5 is strongly recommended for the general use case: it works well with every file size and performs particularly well when reading a few lines.

Avoid solution #1 if you need to read files bigger than 10 KB.

Solutions #2 and #3 were never the best in any test I ran: #2 never runs in less than 2 ms, and #3 is heavily influenced by the number of lines you ask for (it works reasonably well only with 1 or 2 lines).

134

answered Oct 13 '22 06:10

lorenzo-s
