When using the file input with Logstash, a sincedb file is written to keep track of the current position in each monitored log file. How should its contents be interpreted?
Example of a sincedb file:
286105 0 19 20678374
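sincedb_path is the path of the sincedb registry file itself (it must be a file path, not a directory), in a location where Logstash has write permission. sincedb_write_interval defines how often Logstash writes the sincedb registry to disk. A larger value puts you at risk if Logstash were to crash: positions reached since the last write are lost, so after a restart some lines may be read and shipped again.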
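Because a wrong sincedb_path usually only shows up as a startup error or a silently missing registry, a quick pre-flight check can help. The function below is a hypothetical helper (not part of Logstash). It assumes it is run in a shell as the same user Logstash runs as, and it checks that the path is not a directory and that the file can be created or replaced.

```bash
# Hypothetical helper: sanity-check a candidate sincedb_path.
# Run it as the user Logstash runs as, since permissions are per user.
check_sincedb_path() {
  local path="$1"
  local dir
  dir=$(dirname -- "$path")
  # sincedb_path names the registry file itself, not a directory.
  if [ -d "$path" ]; then
    echo "error: $path is a directory; point sincedb_path at a file" >&2
    return 1
  fi
  # An existing registry must be writable so it can be updated.
  if [ -e "$path" ] && [ ! -w "$path" ]; then
    echo "error: $path exists but is not writable" >&2
    return 1
  fi
  # The parent directory must be writable so the file can be created or replaced.
  if [ ! -w "$dir" ]; then
    echo "error: directory $dir is not writable" >&2
    return 1
  fi
  echo "ok: $path"
}
```

For example, `check_sincedb_path /var/lib/logstash/sincedb_app` (a made-up path) prints `ok: ...` or explains what is wrong.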
Logstash is an open-source tool for building real-time pipelines that ingest data from one source and ship it to another, independent destination.
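If you disable the sincedb (by setting sincedb_path to /dev/null), Logstash does not remember any positions across restarts, so every file matching path is treated as new on each start. Combined with start_position => "beginning", it will read all of them from the beginning; with the default start_position of "end", they are tailed from their current end instead. However, if you keep a sincedb and let Logstash process the files, then after a restart it resumes from the recorded offsets and keeps tailing those files, waiting for data to be appended to them.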
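As a minimal sketch of the "re-read everything" setup, the hypothetical function below prints a pipeline config that disables the sincedb and starts new files from the beginning. The stdout/rubydebug output is only an assumption for testing; swap in your real outputs.

```bash
# Hypothetical helper: print a minimal pipeline config for a file input that
# re-reads every matching file from the start each time Logstash starts.
reread_config() {
  local glob="$1"
  cat <<EOF
input {
  file {
    path => "$glob"
    start_position => "beginning"  # only applies to files with no sincedb entry
    sincedb_path => "/dev/null"    # never persist read positions
  }
}
output {
  stdout { codec => rubydebug }
}
EOF
}
```

It can be passed straight to Logstash, e.g. `bin/logstash -e "$(reread_config '/var/log/app/*.log')"`, where the log path is only an example.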
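There are 4 fields (source):
- inode number
- major device number of the filesystem the file lives on
- minor device number of the filesystem the file lives on
- current byte offset within the file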
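To make the layout concrete, the small function below labels the four whitespace-separated fields of each line of an old-format sincedb. It is a sketch: the function name is made up, and any extra fields (as written by newer plugin versions) are simply ignored.

```bash
# Label the fields of each line of a 4-field sincedb read from stdin.
describe_sincedb() {
  local inode major minor offset rest
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while read -r inode major minor offset rest || [ -n "$inode" ]; do
    [ -n "$inode" ] || continue   # skip blank lines
    printf 'inode=%s major=%s minor=%s offset=%s\n' \
      "$inode" "$major" "$minor" "$offset"
  done
}
```

Applied to the example line, `echo '286105 0 19 20678374' | describe_sincedb` prints `inode=286105 major=0 minor=19 offset=20678374`.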
Picture a filesystem as a big table of numbered records, one per file, each describing that file (size, permissions, where its data blocks live); the inode number is more or less the number of the file's record in that table. So a given inode number is unique only within one filesystem, and since a server can have several disks or partitions, the major and minor device numbers are required to guarantee uniqueness of the triplet {inode, major device number, minor device number}. More accurate information about inodes is available on Wikipedia.
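To see which triplet a given file would get, you can ask stat for the inode and the device number and split the latter into major and minor parts. This is a sketch assuming Linux with GNU coreutils `stat` (`%i` is the inode, `%d` the device number in decimal); the bit layout mirrors glibc's major()/minor() macros. The function names are hypothetical.

```bash
# Split a Linux dev_t (decimal, as printed by `stat -c %d`) into
# "major minor", using the same bit layout as glibc's major()/minor().
split_dev() {
  local dev="$1"
  local major=$(( ((dev >> 8) & 0xfff) | ((dev >> 32) & ~0xfff) ))
  local minor=$(( (dev & 0xff) | ((dev >> 12) & 0xffffff00) ))
  echo "$major $minor"
}

# Print "inode major minor" for a file: the key a 4-field sincedb line uses.
file_triplet() {
  local inode dev
  read -r inode dev < <(stat -c '%i %d' -- "$1")
  echo "$inode $(split_dev "$dev")"
}
```

For instance, `file_triplet /var/log/syslog` prints three numbers that should match the first three fields of that file's sincedb line.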
That said, I am not so sure that files mounted through NFS (for example) could not collide with local files, since the inode number of a file mounted through NFS seems to be the remote one. On Linux, though, each NFS mount also gets its own device number, so the major/minor pair should still tell it apart from local filesystems. I don't think the plugin author bothered about such cases, but despite using NFS myself, I have never run into any trouble so far, and I suspect the probability of a collision to be very small.
Now, with the triplet formed by the inode and the major and minor device numbers, we have a way of identifying unambiguously the single log file being read by the plugin (or at least that was the original intent). The last number, the byte offset, records how far the log file has already been read and forwarded to Logstash.
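Given a triplet, looking up how far Logstash has read is a one-line awk match on the first three fields. The function name and the sincedb file name used below are hypothetical; the sketch assumes the 4-field format described above.

```bash
# Print the byte offset recorded in a sincedb for one inode/major/minor
# triplet; prints nothing if the triplet is not in the registry.
sincedb_offset() {
  local sincedb="$1" inode="$2" major="$3" minor="$4"
  awk -v i="$inode" -v ma="$major" -v mi="$minor" \
    '$1 == i && $2 == ma && $3 == mi { print $4; exit }' "$sincedb"
}
```

With the example line stored in a file, `sincedb_offset "$HOME/.sincedb_example" 286105 0 19` would print `20678374`. Comparing that offset with the log file's current size (`stat -c %s`) shows how many bytes are still waiting to be read.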
On some platforms, such as Solaris and Windows, there have been bugs where Ruby wrongly reported the inode number as 0. This could, for example, prevent Logstash from detecting a file rotation.
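If you suspect you are hit by the inode-0 bug described above, a quick scan of the registry for zero inodes is a cheap first check. This is a hypothetical helper; an inode of 0 does not prove the bug, it only flags entries worth a closer look.

```bash
# List sincedb entries whose inode field is 0, as "file:line: entry".
# "NF &&" skips blank lines, whose missing $1 would otherwise compare equal to 0.
zero_inode_entries() {
  awk 'NF && $1 == 0 { print FILENAME ":" FNR ": " $0 }' "$@"
}
```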
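This was super helpful. I wanted to map all my SinceDB files to their Logstash inputs, so I put together a little bash two-liner that prints this mapping. It must be run from the directory holding the .sincedb_* files (in older Logstash versions, the home directory of the user running Logstash), and because it relies on debugfs it only works on ext2/ext3/ext4 filesystems.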
filesystems=$(grep -hE '^[[:space:]]*path[[:space:]]*=>' /etc/logstash/conf.d/*.conf | awk -F'=>' '{ print $2 }' | xargs -I {} dirname {} | xargs -I {} df -P {} 2>/dev/null | grep -v Filesystem | cut -d' ' -f 1 | sort -u)
for fs in $filesystems; do for f in .sincedb_*; do echo "$f"; for inode in $(cut -d' ' -f 1 "$f"); do sudo debugfs -R "ncheck $inode" "$fs" 2>/dev/null | grep -v Inode | cut -f 2; done; echo; done; done
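Since debugfs is specific to ext filesystems, a slower but filesystem-agnostic alternative is to walk the filesystem with find and match the inode number. This sketch assumes a find that supports `-inum` and `-xdev` (GNU findutils does); the function name is hypothetical, and the search root should be a directory on the filesystem whose major/minor numbers appear in the sincedb line, ideally its mount point.

```bash
# Print every path under a search root (staying on that one filesystem)
# whose inode number matches; unreadable directories are silently skipped.
paths_for_inode() {
  local root="$1" inode="$2"
  find "$root" -xdev -inum "$inode" 2>/dev/null
}
```

For example, `for inode in $(cut -d' ' -f 1 .sincedb_*); do paths_for_inode /var/log "$inode"; done` resolves the inodes of the sincedb files in the current directory, assuming the logs live under /var/log.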
I just documented the details of mapping SinceDB files to Logstash inputs.