Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data.Foldable for unordered containers

I'm working on a Haskell-meets-SQL language for database manipulations, and on a common type class library to go with it, cribbing from Hackage wherever it makes sense.

Because a significant objective of a database query optimizer is to eliminate unnecessary sorting, it's important to preserve a static representation of where sorting is in fact necessary. Which brings us to defining a typeclass for folds.

Haskell's Data.Foldable has: (eliding default definitions which aren't relevant to the point I'm making)

class Foldable t where
  -- | Combine the elements of a structure using a monoid.
  fold :: Monoid m => t m -> m

  -- | Map each element of the structure to a monoid,
  -- and combine the results.
  foldMap :: Monoid m => (a -> m) -> t a -> m

  -- | Right-associative fold of a structure.
  foldr :: (a -> b -> b) -> b -> t a -> b

  -- | Left-associative fold of a structure.
  foldl :: (a -> b -> a) -> a -> t b -> a

  -- | A variant of 'foldr' that has no base case,
  -- and thus may only be applied to non-empty structures.
  foldr1 :: (a -> a -> a) -> t a -> a

  -- | A variant of 'foldl' that has no base case,
  -- and thus may only be applied to non-empty structures.
  foldl1 :: (a -> a -> a) -> t a -> a

It seems to me that this class ignores a distinction which is, for practical purposes, not so important to most Haskell applications but of much more interest in a database setting. To wit: all Data.Foldable instances come with an ordering.

What is the name for the generalization of this concept that applies at container types which don't impose an ordering on their elements?

For Haskell Data.Sets it works out fine, because there is an Ord context required by the implementation. The ordering requirement is an implementation artifact though, and for many useful types the ordering being used may not have any domain-level meaning.

For sets more generally the fold :: Monoid m => t m -> m definition on its own is mostly right (so is foldMap). I say mostly because its type includes the associativity law (through the definition of Monoid) but not the required commutativity law. The other variants don't even exist.

I don't want to introduce sorts where they aren't needed. I also don't want to introduce non-determinism where it can't be tracked. I'm interested in building a language and library that doesn't have a toList :: Set a -> [a] function lying around anywhere, because it introduces a dichotomy between:

  1. Allowing people to observe implementation details about how a set/relation is physically stored
  2. Losing track of non-determinism as an effect

Obviously both sortBy :: (a -> a -> Ordering) -> Set a -> [a] and shuffle :: Set a -> Data.Random.RVar [a] are useful, unobjectionable, and will be included. In fact, sortBy has an even more general type as sortBy :: (TheUnorderedFoldableClassIAmTryingToName f) => (a -> a -> Ordering) -> f a -> [a].

What is this idea called? If I'm way off base, where did I leave the base path?

like image 822
Doug McClean Avatar asked Nov 23 '11 20:11

Doug McClean


1 Answers

The operation performed by your fold-like operator would not operate over a monoid, but rather, a commutative semigroup. That gives you op :: (CSemi a) => f a -> a -> a

In the literature I've seen, the typical name for your operator/typeclass would just be CFold -- short for commutative fold. (YAHT also uses cfold as the name for a cps-style fold, but I don't think that's in common usage)

like image 151
sclv Avatar answered Oct 06 '22 23:10

sclv