Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using persistent from within a Conduit

First up, a simplified version of the task I want to accomplish: I have several large files (amounting to 30GB) that I want to prune for duplicate entries. To this end, I establish a database of hashes of the data, and open the files one-by-one, hashing each item, and recording it in the database and the output file iff its hash wasn't already in the database.

I know how to do this with iteratees, enumerators, and I wanted to try conduits. I also know how to do it with conduits, but now I want to use conduits & persistent. I'm having problems with the types, and possibly with the entire concept of ResourceT.

Here's some pseudo code to illustrate the problem:

withSqlConn "foo.db" $ runSqlConn $ runResourceT $ 
     sourceFile "in" $= parseBytes $= dbAction $= serialize $$ sinkFile "out"

The problem lies in the dbAction function. I would like to access the database here, naturally. Since the action it does is basically just a filter, I first thought to write it like that:

dbAction = CL.mapMaybeM p
     where p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => DataType -> m (Maybe DataType)
           p = lift $ putStrLn "foo" -- fine
           insert $ undefined -- type error!
           return undefined

The specific error I get is:

Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
  bound by the type signature for
             p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
                           DataType -> m (Maybe DataType)
  at tools/clean-wac.hs:(33,1)-(34,34)
  `m' is a rigid type variable bound by
      the type signature for
        p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
                      DataType -> m (Maybe (DataType))
      at tools/clean-wac.hs:33:1
Expected type: m (Key b0 val0)
  Actual type: b0 m0 (Key b0 val0)

Note that this might be due to wrong assumptions I made in designing the type signature. If I comment out the type signature and also remove the lift statement, the error message turns into:

No instance for (PersistStore ResourceT (SqlPersist IO))
  arising from a use of `p'
Possible fix:
  add an instance declaration for
  (PersistStore ResourceT (SqlPersist IO))
In the first argument of `CL.mapMaybeM', namely `p'

So this means that we can't access the PersistStore at all via ResourceT?

I cannot write my own Conduit either, without using CL.mapMaybeM:

dbAction = filterP
filterP :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => Conduit DataType m DataType
filterP = loop
    where loop = awaitE >>= either return go
          go s = do lift $ insert $ undefined -- again, type error
                    loop

This resulted in yet another type error I don't fully understand.

Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
  bound by the type signature for
             filterP :: (MonadIO m,
                                 MonadBaseControl IO (SqlPersist m)) =>
                                Conduit DataType m DataType
     `m' is a rigid type variable bound by
      the type signature for
        filterP :: (MonadIO m,
                            MonadBaseControl IO (SqlPersist m)) =>
                           Conduit DataType m DataType
Expected type: Conduit DataType m DataType
  Actual type: Pipe
                 DataType DataType DataType () (b0 m0) ()
In the expression: loop
In an equation for `filterP'

So, my question is: is it possible to use persistent like I intended to inside a conduit at all? And if, how? I am aware that since I can use liftIO inside the conduit, I could just go and use, say HDBC, but I wanted to use persistent explicitly in order to understand how it works, and because I like its db-backend agnosticism.

like image 577
Aleksandar Dimitrov Avatar asked Nov 11 '12 14:11

Aleksandar Dimitrov


1 Answers

The code below compiles fine for me. Is it possible that the frameworks have moved on inthe meantime and things now just work?

However note the following changes I had to make as the world has changed a bit or I didn't have all your code. I used conduit-1.0.9.3 and persistent-1.3.0 with GHC 7.6.3.

  • Omitted parseBytes and serialise as I don't have your definitions and defined DataType = ByteString instead.

  • Introduced a Proxy parameter and an explicit type signature for the undefined value to avoid problems with type family injectivity. These likely don't arise in your real code because it will have a concrete or externally determined type for val.

  • Used await rather than awaitE and just used () as the type to substitute for the Left case, as awaitE has been retired.

  • Passed a dummy Connection creation function to withSqlConn - perhaps I should have used some Sqlite specific function?

Here's the code:

{-# LANGUAGE FlexibleContexts, NoMonomorphismRestriction,
             TypeFamilies, ScopedTypeVariables #-}

module So133331988 where

import Control.Monad.Trans
import Database.Persist.Sql
import Data.ByteString
import Data.Conduit
import Data.Conduit.Binary
import Data.Proxy

test proxy =
    withSqlConn (return (undefined "foo.db")) $ runSqlConn $ runResourceT $ 
         sourceFile "in" $= dbAction proxy $$ sinkFile "out"

dbAction = filterP

type DataType = ByteString

filterP
    :: forall m val
     . ( MonadIO m, MonadBaseControl IO (SqlPersist m)
       , PersistStore m, PersistEntity val
       , PersistEntityBackend val ~ PersistMonadBackend m)
    => Proxy val
    -> Conduit DataType m DataType
filterP Proxy = loop
    where loop = await >>= maybe (return ()) go
          go s = do lift $ insert (undefined :: val)
                    loop
like image 104
GS - Apologise to Monica Avatar answered Nov 16 '22 02:11

GS - Apologise to Monica