I have a large Haskell program which is running dismayingly slow. Profiling and testing has revealed that a large fraction of the time is spend comparing equality and ordering of a particular large datatype that is very important. Equality is a useful operation (this is state-space search, and graph search is much preferable to tree search), but I only need an Ord instance for this class in order to use Maps. So what I want to do is say
instance Eq BigThing where
(==) b b' = name b == name b' &&
firstPart b == firstPart b' &&
secondPart b == secondPart b' &&
{- ...and so on... -}
instance Ord BigThing where
compare b b' = compare (name b) (name b')
But since the names may not always be different for different objects, this risks the curious case where two BigThings may be inequal according to ==, but comparing them yields EQ.
Is this going to cause problems with Haskell libraries? Is there another way I could satisfy the requirement for a detailed equality operation but a cheap ordering?
First, using Text
or ByteString
instead of String
could help a lot without changing anything else.
Generally I wouldn't recommend creating an instance of Eq
inconsistent with Ord
. Libraries can rightfully depend on it, and you never know what kind of strange problems it can cause. (For example, are you sure that Map
doesn't use the relationship between Eq
and Ord
?)
If you don't need the Eq
instance at all, you can simply define
instance Eq BigThing where
x == y = compare x y == EQ
Then equality will be consistent with comparison. There is no requirement that equal values must have all fields equal.
If you need an Eq
instance that compares all fields, then you can stay consistent by wrapping BigThing
into a newtype
, define the above Eq
and Ord
for it, and use it in your algorithm whenever you need ordering according to name
:
newtype BigThing' a b c = BigThing' (BigThing a b c)
instance Eq BigThing' where
x == y = compare x y == EQ
instance Ord BigThing' where
compare (BigThing b) (BigThing b') = compare (name b) (name b')
Update: Since you say any ordering is acceptable, you can use hashing to your advantage. For this, you can use the hashable package. The idea is that you pre-compute hash values on data creation and use them when comparing values. If two values are different, it's almost sure that their hashes will differ and you the compare only their hashes (two integers), nothing more. It could look like this:
module BigThing
( BigThing()
, bigThing
, btHash, btName, btSurname
)
where
import Data.Hashable
data BigThing = BigThing { btHash :: Int,
btName :: String,
btSurname :: String } -- etc
deriving (Eq, Ord)
-- Since the derived Eq/Ord instances compare fields lexicographically and
-- btHash is the first, they'll compare the hash first and continue with the
-- other fields only if the hashes are equal.
-- See http://www.haskell.org/onlinereport/derived.html#sect10.1
--
-- Alternativelly, you can create similar Eq/Ord instances yourself, if for any
-- reason you don't want the hash to be the first field.
-- A smart constructor for creating instances. Your module will not export the
-- BigThing constructor, it will export this function instead:
bigThing :: String -> String -> BigThing
bigThing nm snm = BigThing (hash (nm, snm)) nm snm
Note that with this solution, the ordering will be seemingly random with no apparent relation to the fields.
You can also combine this solution with the previous ones. Or, you can create a small module for wrapping any type with its precomputed hash (wrapped values must have Eq
instances consistent with their Hashable
instances).
module HashOrd
( Hashed()
, getHashed
, hashedHash
)
where
import Data.Hashable
data Hashed a = Hashed { hashedHash :: Int, getHashed :: a }
deriving (Ord, Eq, Show, Read, Bounded)
hashed :: (Hashable a) => a -> Hashed a
hashed x = Hashed (hash x) x
instance Hashable a => Hashable (Hashed a) where
hashWithSalt salt (Hashed _ x) = hashWithSalt salt x
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With