
How to generalize reads from url and file in Haskell

Tags:

haskell

I am developing an application that fetches data from the Internet in chunks, each requested at a given offset. For testing purposes I have a dump file in which each line corresponds to a separate chunk. I want to generalize the read operations over the URL and the dump file. Currently, I have the following functions:

getChunk :: DataSourceMode -> Config -> Int -> Int -> IO FetchResult
getChunk DSNormal config ownerId' offset' = do ...
getChunk DSFromFile config ownerId' offset' = do ...

The problem with the current implementation is that it reads the dump file on every getChunk call, which is obviously inefficient. My first idea was to load the lines of the dump file into a list, but then it would not be easy to generalize that with reading from the URL. I suppose conduits or pipes could be used to construct a source of chunks, but I'm not familiar with these libraries. Should I use one of them, or is there a better solution?

Mikhail Selivanov asked Oct 01 '22 01:10



1 Answer

I ended up using conduits. A generalized function, processFeed, acts as the sink, and depending on the mode I feed it data from either postUrlSource or Data.Conduit.Binary.sourceFile.

-- Qualified import, so that CB.lines does not clash with Prelude.lines
import qualified Data.Conduit.Binary as CB (sourceFile, conduitFile, lines)

processFeed :: MonadIO m => Config -> OwnerId -> (OwnerId -> [Post] -> IO ()) -> Sink BS.ByteString m FetchResult
processFeed config ownerId' processFn = do ...

postUrlSource :: MonadIO m => Config -> OwnerId -> Source (StateT FetchState m) BS.ByteString
postUrlSource config ownerId' = do ...

...

  _ <- case dsMode config of
    DSFromFile ->
      runResourceT $ CB.sourceFile dumpFile $= CB.lines $$ processFeed config publicId' saveResult
    DSNormal -> do
      let postsFromUrlConduit = postUrlSource config publicId' $$ processFeed config publicId' saveResult
      fst <$> runStateT postsFromUrlConduit (FetchState 0 "")

...

StateT is used when fetching data from the URL, so that each chunk is requested with a new offset. When reading from the file, the conduit runs over plain IO (inside runResourceT) and simply reads lines sequentially from the dump.
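
For readers on a newer conduit (1.3 or later, where Source, Sink, $$ and $= are deprecated in favour of ConduitT, runConduit and .|), the same idea might look roughly like the sketch below. It is only an illustration under stated assumptions, not the code from the application: fetchChunkAt, urlSource, fileSource, processChunks, Mode and runFeed are hypothetical names, the "HTTP request" is faked with a pure function that returns three chunks, and the sink just prints and counts chunks instead of parsing posts.

```haskell
import           Control.Monad.IO.Class           (MonadIO, liftIO)
import           Control.Monad.Trans.Class        (lift)
import           Control.Monad.Trans.Resource     (MonadResource)
import           Control.Monad.Trans.State.Strict (StateT, evalStateT, get, put)
import qualified Data.ByteString.Char8            as BS
import           Data.Conduit                     (ConduitT, await, runConduit,
                                                   runConduitRes, yield, (.|))
import qualified Data.Conduit.Binary              as CB
import           Data.Void                        (Void)

-- Hypothetical stand-in for the real HTTP request (assumption):
-- offsets 0..2 return a chunk, anything beyond that returns Nothing.
fetchChunkAt :: Int -> IO (Maybe BS.ByteString)
fetchChunkAt offset
  | offset < 3 = pure (Just (BS.pack ("chunk " ++ show offset)))
  | otherwise  = pure Nothing

-- Source for the "normal" mode: the current offset is kept in StateT,
-- and advanced after every successfully fetched chunk.
urlSource :: MonadIO m => ConduitT i BS.ByteString (StateT Int m) ()
urlSource = do
  offset <- lift get
  mChunk <- liftIO (fetchChunkAt offset)
  case mChunk of
    Nothing    -> pure ()              -- no more data: end the stream
    Just chunk -> do
      lift (put (offset + 1))
      yield chunk
      urlSource

-- Source for the "dump" mode: the file is opened once and streamed
-- line by line, one line per chunk.
fileSource :: MonadResource m => FilePath -> ConduitT i BS.ByteString m ()
fileSource path = CB.sourceFile path .| CB.lines

-- One sink shared by both modes. Here it prints each chunk and
-- returns how many chunks it has seen.
processChunks :: MonadIO m => ConduitT BS.ByteString Void m Int
processChunks = go 0
  where
    go n = do
      mChunk <- await
      case mChunk of
        Nothing    -> pure n
        Just chunk -> do
          liftIO (BS.putStrLn chunk)
          go $! n + 1

-- Hypothetical mode switch, playing the role of dsMode above.
data Mode = FromFile FilePath | FromUrl

runFeed :: Mode -> IO Int
runFeed (FromFile path) = runConduitRes (fileSource path .| processChunks)
runFeed FromUrl         = evalStateT (runConduit (urlSource .| processChunks)) 0
```

Calling runFeed FromUrl or runFeed (FromFile "dump.txt") then drives the same sink from either source. The sink only needs MonadIO, so it does not care whether the base monad underneath is StateT (URL mode) or ResourceT (file mode); that is what lets the two branches share it.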

Mikhail Selivanov answered Nov 15 '22 10:11
