How would you implement tail efficiently?

Tags:

c

linux

unix

tail

What is an efficient way to implement tail in *NIX? I came up with two simple solutions, both using a kind of circular buffer to load lines into a circular structure (an array, or a doubly linked circular list, just for fun). I've seen part of an older implementation in busybox, and from what I understood, they used fseek to find EOF and then read stuff "backwards". Is there anything cleaner and faster out there? I was asked this in an interview and the interviewer did not look satisfied. Thank you in advance.
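For concreteness, here is a minimal C sketch of the circular-buffer (array) approach described in the question; it is not the asker's actual code. It assumes a POSIX system for `getline()`, and the function name `tail_ring` is hypothetical. Each new line overwrites the oldest of the n slots, so memory stays bounded by the retained lines and the input never has to be seekable.

```c
#define _POSIX_C_SOURCE 200809L  /* getline() and ssize_t */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: copy the last n lines of `in` to `out`, reading
 * forwards only. A circular buffer of n line slots is overwritten
 * oldest-first, so no seeking is needed (works on pipes).
 * Returns 0 on success, -1 on allocation or read error. */
int tail_ring(FILE *in, FILE *out, size_t n)
{
    assert(n > 0);
    char **ring = calloc(n, sizeof *ring);    /* line buffers, one per slot */
    size_t *caps = calloc(n, sizeof *caps);   /* allocated size of each buffer */
    ssize_t *lens = calloc(n, sizeof *lens);  /* bytes of line data per slot */
    if (!ring || !caps || !lens) {
        free(ring); free(caps); free(lens);
        return -1;
    }

    char *line = NULL;   /* scratch buffer filled by getline() */
    size_t cap = 0;
    ssize_t len;
    size_t count = 0;    /* total number of lines read */
    while ((len = getline(&line, &cap, in)) != -1) {
        size_t slot = count % n;  /* slot holding the oldest retained line */
        /* Swap instead of copying: the evicted line's buffer becomes
         * the next scratch buffer, so allocations are reused. */
        char *old = ring[slot];
        size_t old_cap = caps[slot];
        ring[slot] = line; caps[slot] = cap; lens[slot] = len;
        line = old; cap = old_cap;
        count++;
    }
    int rc = ferror(in) ? -1 : 0;

    /* Print oldest to newest; fwrite keeps any embedded NUL bytes. */
    size_t kept = count < n ? count : n;
    for (size_t i = count - kept; i < count; i++)
        fwrite(ring[i % n], 1, (size_t)lens[i % n], out);

    for (size_t i = 0; i < n; i++)
        free(ring[i]);
    free(line); free(ring); free(caps); free(lens);
    return rc;
}
```

Called as `tail_ring(stdin, stdout, 10)` it behaves roughly like `tail -n 10` on a pipe. Swapping buffers rather than copying lets getline() reuse allocations as old lines are evicted.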

Tomas Pruzina asked Apr 15 '12 18:04



3 Answers

I don't think there are solutions other than "keep the latest N lines while reading the data forward" or "start from the end and go backwards until you have read the Nth line".

The point is that you'd use one or the other depending on the context.

The "go to the end and go backwards" approach is better when tail reads a random-access (seekable) file, or when the data is small enough to fit in memory. In that case the runtime is minimized, since you scan only the data that has to be output (so it's "optimal").

Your solution (keep the N latest lines) is better when tail is fed from a pipeline or when the data is huge. In that case the other solution would have to buffer the whole input before it could go backwards, which wastes too much memory, so it is not practical; and if the source is slower than tail (which is likely), scanning the whole input doesn't matter that much.
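One way to make that choice at runtime, sketched under the assumption of a POSIX system: treat the input as "random access" only if it is a regular file that accepts `lseek()`, and otherwise fall back to keeping the last N lines while reading forward. The helper name `can_read_backwards` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* fileno() */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if fp can use the "seek to the end and
 * scan backwards" strategy, i.e. it refers to a regular file whose
 * offset can be moved. Pipes, FIFOs, sockets and terminals give 0, and
 * the caller should fall back to keeping the last N lines while
 * reading forward. */
int can_read_backwards(FILE *fp)
{
    assert(fp != NULL);
    int fd = fileno(fp);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0)
        return 0;
    if (!S_ISREG(st.st_mode))
        return 0;
    /* Regular files are normally seekable, but check rather than assume. */
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}
```

A caller would then branch on it: scan backwards from the end when it returns 1, and keep a ring of the last N lines when it returns 0.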

akappa answered Oct 04 '22 21:10


Read backwards from the end of the file until N line breaks have been read or the beginning of the file is reached.

Then print what was just read.
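A sketch of that backward scan in C, assuming a seekable regular file: it reads fixed-size blocks from the end (rather than one byte at a time), counts newlines until it has passed N of them, and returns the offset where output should start. The names `tail_start_offset` and `print_from` and the 4096-byte block size are illustrative choices, not taken from the tail source. Offsets are `long`, so files over 2 GiB on systems with a 32-bit `long` would need `fseeko()`/`off_t` instead.

```c
#include <assert.h>
#include <stdio.h>

#define TAIL_BLOCK 4096  /* bytes read per backward step; arbitrary */

/* Hypothetical helper: return the byte offset where the last n lines
 * of a seekable stream begin, scanning backwards from EOF one block at
 * a time. A newline that is the file's final byte ends the last line
 * rather than starting an empty one. Returns 0 if the file has fewer
 * than n lines, and -1 on a seek or read error. */
long tail_start_offset(FILE *fp, long n)
{
    assert(n >= 0);
    if (fseek(fp, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);
    if (size < 0)
        return -1;
    if (n == 0)
        return size;

    char buf[TAIL_BLOCK];
    long pos = size;     /* bytes [pos, size) have been examined */
    long newlines = 0;
    while (pos > 0) {
        long chunk = pos < TAIL_BLOCK ? pos : TAIL_BLOCK;
        pos -= chunk;
        if (fseek(fp, pos, SEEK_SET) != 0 ||
            fread(buf, 1, (size_t)chunk, fp) != (size_t)chunk)
            return -1;
        /* Walk this block from its last byte towards its first. */
        for (long i = chunk - 1; i >= 0; i--) {
            if (buf[i] != '\n' || pos + i == size - 1)
                continue;
            if (++newlines == n)
                return pos + i + 1;  /* the n-th last line starts here */
        }
    }
    return 0;
}

/* Hypothetical helper: copy everything from offset `off` to EOF. */
int print_from(FILE *fp, long off, FILE *out)
{
    char buf[TAIL_BLOCK];
    size_t got;
    if (off < 0 || fseek(fp, off, SEEK_SET) != 0)
        return -1;
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0)
        fwrite(buf, 1, got, out);
    return ferror(fp) ? -1 : 0;
}
```

`print_from(fp, tail_start_offset(fp, 10), stdout)` prints the last 10 lines, and only the end of the file is ever read.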

I don't think any fancy data structures are needed here.

Here is the source code of tail if you're interested.

thumbmunkeys answered Oct 04 '22 23:10


First use fseek to find the end of the file, then subtract 512 and fseek to that offset, and read forward from there to the end. Count the line breaks: if there are too few, you will have to do the same with an offset of 1024, and so on. But in 99% of cases 512 bytes will be enough.
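The same idea as a C sketch, assuming a seekable file: start with a 512-byte window ending at EOF, read it forwards into memory, count line breaks, and keep doubling the window until it holds N complete lines or covers the whole file. Doubling past 1024 bytes is an assumption (the answer only describes the first two sizes), and `tail_window` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: write the last n lines of the seekable stream fp
 * to out by reading a window that ends at EOF *forwards*, starting at
 * 512 bytes and doubling until the window holds n complete lines or
 * covers the whole file. Returns 0 on success, -1 on error. */
int tail_window(FILE *fp, FILE *out, long n)
{
    assert(n > 0);
    if (fseek(fp, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);
    if (size < 0)
        return -1;

    for (long window = 512; ; window *= 2) {
        if (window > size)
            window = size;                /* never seek before offset 0 */
        long start = size - window;
        char *buf = malloc(window > 0 ? (size_t)window : 1);
        if (buf == NULL)
            return -1;
        if (fseek(fp, start, SEEK_SET) != 0 ||
            fread(buf, 1, (size_t)window, fp) != (size_t)window) {
            free(buf);
            return -1;
        }

        /* Count line breaks, reading forwards. A newline that is the
         * file's last byte ends the final line, so it is not counted. */
        long breaks = 0;
        for (long i = 0; i < window; i++)
            if (buf[i] == '\n' && start + i != size - 1)
                breaks++;

        if (breaks >= n || start == 0) {
            /* Skip the first (breaks - n + 1) line breaks; the text before
             * them is a partial line or lines we don't need. */
            long skip = breaks >= n ? breaks - n + 1 : 0;
            long i = 0;
            while (skip > 0) {
                if (buf[i] == '\n')
                    skip--;
                i++;
            }
            fwrite(buf + i, 1, (size_t)(window - i), out);
            free(buf);
            return 0;
        }
        free(buf);  /* too few line breaks: retry with a larger window */
    }
}
```

Each retry re-reads bytes the previous window already covered, but with doubling the total amount read stays under twice the final window size.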

This (1) avoids reading the whole file forward, and (2) is probably more efficient than reading backwards from the end, because reading forward is typically faster.

Bernd Elkemann answered Oct 04 '22 21:10