First up, a simplified version of the task I want to accomplish: I have several large files (amounting to 30GB) that I want to prune for duplicate entries. To this end, I establish a database of hashes of the data, and open the files one-by-one, hashing each item, and recording it in the database and the output file iff its hash wasn't already in the database.
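In miniature, and purely in memory, the filtering step amounts to something like the sketch below. The names toyHash and dedupe are made up for illustration, and toyHash is only a stand-in for a real hash function; the real task would keep the hashes in a database rather than in a Data.Set.

```haskell
module Main where

import Data.Char (ord)
import Data.List (foldl')
import qualified Data.Set as Set

-- Toy stand-in for a real hash function (assumption: the real tool would
-- use a proper, collision-resistant hash).
toyHash :: String -> Int
toyHash = foldl' (\h c -> (h * 31 + ord c) `mod` 1000000007) 7

-- Keep an item only if its hash has not been seen yet, threading the set
-- of seen hashes through the traversal.
dedupe :: [String] -> [String]
dedupe = go Set.empty
  where
    go _ [] = []
    go seen (x : xs)
      | h `Set.member` seen = go seen xs
      | otherwise = x : go (Set.insert h seen) xs
      where
        h = toyHash x

main :: IO ()
main = mapM_ putStrLn (dedupe ["a", "b", "a", "c", "b"])
```

Because only hashes are compared, two distinct items with colliding hashes would be treated as duplicates, which is why the choice of hash matters for the real data.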
I know how to do this with iteratees and enumerators, and I wanted to try conduits. I also know how to do it with conduits alone, but now I want to use conduits and persistent together. I'm having problems with the types, and possibly with the entire concept of ResourceT.
Here's some pseudo code to illustrate the problem:
withSqlConn "foo.db" $ runSqlConn $ runResourceT $
    sourceFile "in" $= parseBytes $= dbAction $= serialize $$ sinkFile "out"
The problem lies in the dbAction function. I would like to access the database here, naturally. Since the action is basically just a filter, I first thought to write it like this:
dbAction = CL.mapMaybeM p
  where p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => DataType -> m (Maybe DataType)
        p = lift $ putStrLn "foo" -- fine
            insert $ undefined -- type error!
            return undefined
The specific error I get is:
Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
bound by the type signature for
p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
DataType -> m (Maybe DataType)
at tools/clean-wac.hs:(33,1)-(34,34)
`m' is a rigid type variable bound by
the type signature for
p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
DataType -> m (Maybe (DataType))
at tools/clean-wac.hs:33:1
Expected type: m (Key b0 val0)
Actual type: b0 m0 (Key b0 val0)
Note that this might be due to wrong assumptions I made in designing the type signature. If I comment out the type signature and also remove the lift statement, the error message turns into:
No instance for (PersistStore ResourceT (SqlPersist IO))
arising from a use of `p'
Possible fix:
add an instance declaration for
(PersistStore ResourceT (SqlPersist IO))
In the first argument of `CL.mapMaybeM', namely `p'
So does this mean that we can't access the PersistStore at all via ResourceT?
Writing my own Conduit, without using CL.mapMaybeM, does not work either:
dbAction = filterP

filterP :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => Conduit DataType m DataType
filterP = loop
  where loop = awaitE >>= either return go
        go s = do lift $ insert $ undefined -- again, type error
                  loop
This resulted in yet another type error I don't fully understand.
Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
bound by the type signature for
filterP :: (MonadIO m,
MonadBaseControl IO (SqlPersist m)) =>
Conduit DataType m DataType
`m' is a rigid type variable bound by
the type signature for
filterP :: (MonadIO m,
MonadBaseControl IO (SqlPersist m)) =>
Conduit DataType m DataType
Expected type: Conduit DataType m DataType
Actual type: Pipe
DataType DataType DataType () (b0 m0) ()
In the expression: loop
In an equation for `filterP'
So, my question is: is it possible to use persistent inside a conduit the way I intended at all? And if so, how? I am aware that since I can use liftIO inside the conduit, I could just use, say, HDBC, but I wanted to use persistent explicitly in order to understand how it works, and because I like its db-backend agnosticism.
The code below compiles fine for me. Is it possible that the frameworks have moved on in the meantime and things now just work?
However, note the following changes I had to make, either because the libraries have changed a bit or because I didn't have all your code. I used conduit-1.0.9.3 and persistent-1.3.0 with GHC 7.6.3.
Omitted parseBytes and serialize, as I don't have your definitions, and defined DataType = ByteString instead.
Introduced a Proxy parameter and an explicit type signature for the undefined value to avoid problems with type family injectivity. These likely don't arise in your real code because it will have a concrete or externally determined type for val.
Used await rather than awaitE, and just used () as the type to substitute for the Left case, as awaitE has been retired.
Passed a dummy Connection creation function to withSqlConn; perhaps I should have used some Sqlite-specific function?
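The Proxy point in the list above can be illustrated without persistent. The following is only a sketch under stated assumptions: Entity, BackendOf, Sql, Person, Song and describeInsert are all hypothetical names standing in for persistent's real classes and type families. Both entities map to the same backend type, so knowing the backend cannot tell the compiler which entity is meant; the Proxy argument supplies that information, and ScopedTypeVariables lets the body annotate undefined with it.

```haskell
{-# LANGUAGE TypeFamilies, ScopedTypeVariables #-}
module Main where

import Data.Proxy (Proxy (..))

-- Hypothetical backend tag.
data Sql

-- Hypothetical stand-in for an entity class with an associated backend type.
class Entity val where
  type BackendOf val
  -- Must not inspect its argument; only the type matters.
  tableName :: val -> String

data Person = Person
data Song = Song

-- Two entities sharing one backend: BackendOf is not injective.
instance Entity Person where
  type BackendOf Person = Sql
  tableName _ = "person"

instance Entity Song where
  type BackendOf Song = Sql
  tableName _ = "song"

-- `tableName undefined` on its own would be ambiguous. The Proxy names the
-- entity type, and the scoped type variable `val` pins down `undefined`.
describeInsert
  :: forall val. (Entity val, BackendOf val ~ Sql)
  => Proxy val -> String
describeInsert Proxy = "insert into " ++ tableName (undefined :: val)

main :: IO ()
main = do
  putStrLn (describeInsert (Proxy :: Proxy Person))
  putStrLn (describeInsert (Proxy :: Proxy Song))
```

With a real value of a concrete entity type in hand, as in your actual program, neither the Proxy nor the annotation should be needed.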
Here's the code:
{-# LANGUAGE FlexibleContexts, NoMonomorphismRestriction,
             TypeFamilies, ScopedTypeVariables #-}
module So133331988 where

import Control.Monad.Trans
import Database.Persist.Sql
import Data.ByteString
import Data.Conduit
import Data.Conduit.Binary
import Data.Proxy

test proxy =
    withSqlConn (return (undefined "foo.db")) $ runSqlConn $ runResourceT $
        sourceFile "in" $= dbAction proxy $$ sinkFile "out"

dbAction = filterP

type DataType = ByteString

filterP
    :: forall m val
     . ( MonadIO m, MonadBaseControl IO (SqlPersist m)
       , PersistStore m, PersistEntity val
       , PersistEntityBackend val ~ PersistMonadBackend m)
    => Proxy val
    -> Conduit DataType m DataType
filterP Proxy = loop
  where loop = await >>= maybe (return ()) go
        go s = do lift $ insert (undefined :: val)
                  loop
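One difference from the signature in the question is that the code above asks for PersistStore m directly, so the conduit's base monad only has to provide the store capability, whatever layers sit around it. A rough model of that idea using only transformers and containers follows; MonadStore, insertNew, keepNew and demo are hypothetical names, StateT plays the role of the database monad and ReaderT the role of an outer layer such as ResourceT. It is not how persistent is implemented.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, runReaderT)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, put)
import Data.Functor.Identity (Identity, runIdentity)
import qualified Data.Set as Set

-- Hypothetical capability class, standing in for PersistStore.
class Monad m => MonadStore m where
  -- Record a key; the result is True if the key was new.
  insertNew :: String -> m Bool

-- Toy "database" monad: the set of seen keys lives in StateT.
instance Monad m => MonadStore (StateT (Set.Set String) m) where
  insertNew k = do
    seen <- get
    if k `Set.member` seen
      then return False
      else put (Set.insert k seen) >> return True

-- An outer layer obtains the capability by lifting into the layer below.
instance MonadStore m => MonadStore (ReaderT r m) where
  insertNew = lift . insertNew

-- The filter asks only for the capability, not for a concrete stack.
keepNew :: MonadStore m => [String] -> m [String]
keepNew [] = return []
keepNew (x : xs) = do
  new <- insertNew x
  rest <- keepNew xs
  return (if new then x : rest else rest)

-- Run the filter two layers up from the toy database monad.
demo :: [String]
demo = runIdentity (evalStateT (runReaderT stack ()) (Set.empty :: Set.Set String))
  where
    stack :: ReaderT () (StateT (Set.Set String) Identity) [String]
    stack = keepNew ["a", "b", "a"]

main :: IO ()
main = print demo
```

The analogy is loose, but it may help explain why a signature that constrains m by the class it needs type-checks where one that pins down part of the stack does not.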