Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to (efficiently) follow / tail a file with Haskell, including detecting file rotation? (tail -F)

Tags:

haskell

In essence I wish to know how to approach implementing tail -F Linux command functionality in Haskell. My goal is to follow a log file, such as a web server log file, and compute various real time statistics by parsing the input as it comes in. Ideally with no interruptions if the log file is rotated with logrotate or similar service.

I'm somewhat at loss on how to even approach the problem and what should I take into consideration in terms of performance in presence of lazy I/O. Would any of the streaming libraries be relevant here?

like image 709
ppb Avatar asked Dec 19 '16 20:12

ppb


1 Answers

This is a partial answer, as it doesn't handle file truncation by logrotate. It avoids lazy I/O and uses the bytestring, streaming, streaming-bytestring and hinotify packages.

Some preliminary imports:

{-# language OverloadedStrings #-}
module Main where

import qualified Data.ByteString
import Data.ByteString.Lazy.Internal (defaultChunkSize)
import qualified Data.ByteString.Streaming as B
import Streaming
import qualified Streaming.Prelude as S
import Control.Concurrent.QSem
import System.INotify
import System.IO (withFile,IOMode(ReadMode))
import System.Environment (getArgs)

Here's the "tailing" function:

tailing :: FilePath -> (B.ByteString IO () -> IO r) -> IO r
tailing filepath continuation = withINotify $ \i -> do
    sem <- newQSem 1
    addWatch i [Modify] filepath (\_ -> signalQSem sem)
    withFile filepath ReadMode (\h -> continuation (handleToStream sem h))
    where
    handleToStream sem h = B.concat . Streaming.repeats $ do
        lift (waitQSem sem)
        readWithoutClosing h
    -- Can't use B.fromHandle here because annoyingly it closes handle on EOF
    -- instead of just returning, and this causes problems on new appends.
    readWithoutClosing h = do
        c <- lift (Data.ByteString.hGetSome h defaultChunkSize)
        if Data.ByteString.null c
           then return ()
           else do B.chunk c
                   readWithoutClosing h

It takes a file path an a callback that consumes a streaming bytestring.

The idea is that, each time before reading from the handle until EOF, we decrement a semaphore, which is only increased by the callback that is invoked when the file is modified.

We can test the function like this:

main :: IO ()
main = do
    filepath : _ <- getArgs
    tailing filepath B.stdout
like image 98
danidiaz Avatar answered Nov 15 '22 11:11

danidiaz