In my PHP application I need to read multiple lines starting from the end of many files (mostly logs). Sometimes I need only the last one, sometimes I need tens or hundreds. Basically, I want something as flexible as the Unix tail command.
There are questions here about how to get the single last line from a file (but I need N lines), and different solutions were given. I'm not sure which one is the best and which one performs better.
The fseek() function is used to move the file pointer to the end of the file. From there, the last line is found by reading backwards until a newline is encountered. After this, the characters that were read are displayed.
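The fseek() idea described above can be sketched in PHP roughly as follows. This is a minimal illustration, not code from any of the solutions discussed here: the function name last_line is hypothetical, and it assumes Unix line endings ("\n") and a file that fits the usual seekable-file model.

```php
<?php
// Hypothetical helper: return the last line of a file without loading it all.
function last_line($path)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return '';
    }

    $line = '';
    $pos = -1; // offset relative to the end of the file

    // Skip a single trailing newline so the last line is not reported as empty.
    if (fseek($handle, $pos, SEEK_END) === 0 && fgetc($handle) === "\n") {
        $pos--;
    }

    // Walk backwards one byte at a time until a newline or the start of the file.
    // fseek() returns -1 once the offset would land before the first byte.
    while (fseek($handle, $pos, SEEK_END) === 0) {
        $char = fgetc($handle);
        if ($char === "\n" || $char === false) {
            break;
        }
        $line = $char . $line;
        $pos--;
    }

    fclose($handle);
    return $line;
}
```

Reading one byte per fseek() call is simple but costs one seek per character, which is why this style only suits very short reads.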
To look at the last few lines of a file, use the tail command. tail works the same way as head: type tail and the filename to see the last 10 lines of that file, or type tail -number filename to see the last number lines of the file. By default, tail writes those lines to standard output and then exits; it can also keep displaying lines as they are added to the file.
Searching on the internet, I came across different solutions. I can group them in three approaches: naive ones that use the file() PHP function; cheating ones that run the tail command on the system; and mighty ones that move around an open file using fseek().
I ended up choosing (or writing) five solutions: a naive one (#1), a cheating one (#2) and three mighty ones (#3, #4 and #5). The cheating one is based on the tail command, which has one significant problem: it does not run if tail is not available, i.e. on non-Unix systems (Windows) or in restricted environments that don't allow system functions.
All solutions work, in the sense that they return the expected result from any file and for any number of lines we ask for (except for solution #1, which can hit the PHP memory limit on large files and return nothing). But which one is better?
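For reference, the tail-based approach might look like the sketch below. The function name tail_command is hypothetical, and the code assumes a Unix-like system where tail is on the PATH; it returns null when the command cannot be run or produces no output, which is the limitation described above.

```php
<?php
// Hypothetical wrapper around the system tail command.
function tail_command($path, $lines = 10)
{
    // shell_exec() may be disabled in restricted environments.
    if (!function_exists('shell_exec')) {
        return null;
    }

    // Cast and escape the arguments so the command line cannot be injected into.
    $cmd = 'tail -n ' . (int) $lines . ' ' . escapeshellarg($path) . ' 2>/dev/null';
    $output = shell_exec($cmd);

    // shell_exec() gives no string back on failure or when there is no output.
    return is_string($output) ? rtrim($output, "\n") : null;
}
```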
To answer the question I ran tests. That's how these things are done, isn't it?
I prepared a sample 100 KB file by joining together different files found in my /var/log directory. Then I wrote a PHP script that uses each of the five solutions to retrieve 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000 lines from the end of the file. Each single test is repeated ten times (that's 5 × 28 × 10 = 1400 tests), measuring the average elapsed time in microseconds.
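A timing loop of this kind can be sketched as follows. This is not the actual test script: average_time_us and line_counts are hypothetical names, and the sketch only assumes that each solution is a callable taking a file path and a number of lines.

```php
<?php
// Hypothetical harness: average elapsed time of $fn($path, $lines) in microseconds.
function average_time_us($fn, $path, $lines, $repeats = 10)
{
    $total = 0.0;
    for ($i = 0; $i < $repeats; $i++) {
        $start = microtime(true); // seconds as a float
        call_user_func($fn, $path, $lines);
        $total += microtime(true) - $start;
    }
    return $total / $repeats * 1e6;
}

// Line counts described above: 1..10, 20..100 and 200..1000, i.e. 28 values.
function line_counts()
{
    return array_merge(range(1, 10), range(20, 100, 10), range(200, 1000, 100));
}
```

With five solutions, 28 line counts and ten repetitions each, the loop runs 1400 times in total.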
I ran the script on my local development machine (Xubuntu 12.04, PHP 5.3.10, 2.70 GHz dual core CPU, 2 GB RAM) using the PHP command line interpreter. Here are the results:
Solutions #1 and #2 seem to be the worst ones. Solution #3 is good only when we need to read a few lines. Solutions #4 and #5 seem to be the best ones. Note how a dynamic buffer size can optimize the algorithm: execution time is a little smaller for few lines, because of the reduced buffer.
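To make the dynamic buffer idea concrete, here is a minimal sketch of a buffered backwards read. It is an illustration under stated assumptions rather than the code that was benchmarked: the name tail_buffered is hypothetical, the buffer thresholds (64, 512 and 4096 bytes) are assumed values chosen for the example, and lines are assumed to end with "\n".

```php
<?php
// Hypothetical sketch: read the last $lines lines with a buffer sized to the request.
function tail_buffered($path, $lines = 10)
{
    if ($lines < 1) {
        return '';
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }

    // Assumed thresholds: small requests get a small buffer.
    $buffer = $lines < 2 ? 64 : ($lines < 10 ? 512 : 4096);

    fseek($handle, 0, SEEK_END);
    $pos = ftell($handle);
    $output = '';
    $newlines = 0;

    // Read chunks backwards until more than $lines newlines were seen
    // (one extra covers a trailing newline) or the start of the file is reached.
    while ($pos > 0 && $newlines <= $lines) {
        $size = min($pos, $buffer);
        $pos -= $size;
        fseek($handle, $pos);
        $chunk = fread($handle, $size);
        $output = $chunk . $output;
        $newlines += substr_count($chunk, "\n");
    }
    fclose($handle);

    // Drop one trailing newline, then keep only the requested lines.
    if (substr($output, -1) === "\n") {
        $output = substr($output, 0, -1);
    }
    return implode("\n", array_slice(explode("\n", $output), -$lines));
}
```

A smaller buffer means less data is read and searched when only one or two lines are needed, while the larger buffer keeps the number of seeks low for bigger requests.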
Let's try with a bigger file. What if we have to read a 10 MB log file?
Now solution #1 is by far the worst one: in fact, loading the whole 10 MB file into memory is not a great idea. I also ran the tests on 1 MB and 100 MB files, and the situation is practically the same.
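The memory cost is easy to see in a sketch of the naive approach. The function name tail_naive is hypothetical and this is only an illustration of the file()-based idea, not necessarily the exact code that was tested.

```php
<?php
// Hypothetical sketch of the naive approach: load every line, keep the last $lines.
function tail_naive($path, $lines = 10)
{
    if ($lines < 1) {
        return '';
    }

    // file() reads the entire file into an array, so memory use grows with file size.
    $all = file($path, FILE_IGNORE_NEW_LINES);
    if ($all === false) {
        return false;
    }

    return implode("\n", array_slice($all, -$lines));
}
```

Every line of the file becomes an array element before any of them is discarded, so the work done is proportional to the file size rather than to the number of lines requested.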
And for tiny log files? That's the graph for a 10 KB file:
Solution #1 is the best one now! Loading a 10 KB file into memory isn't a big deal for PHP. #4 and #5 also perform well. However, this is an edge case: a 10 KB log means something like 150 to 200 lines...
You can download all my test files, sources and results here.
Solution #5 is highly recommended for the general use case: it works great with every file size and performs particularly well when reading a few lines.
Avoid solution #1 if you need to read files bigger than 10 KB.
Solutions #2 and #3 aren't the best ones in any test I ran: #2 never runs in less than 2 ms, and #3 is heavily influenced by the number of lines you ask for (it works well only with 1 or 2 lines).