This doesn't work
data Cutlery = Knife | Fork deriving (Show,Eq)
let x = [Knife,Fork]
let set1 = Set.fromList x
while defining
data Cutlery = Knife | Fork deriving (Show,Ord,Eq)
solves the issue but doesn't make sense. Is Data.Set different than the mathematical definition of a set?
A Data.Set
captures the mathematical abstraction of a set, but it is not identical. The main difference is that a Data.Set
requires its elements to be ordered, whereas a mathematical set only requires that its elements be comparable for equality.
The reason for requiring Ord
is efficiency. It would be perfectly possible to build a set abstraction by defining
data Set a = Set [a]
i.e. under the hood it is just a list, and we make sure we never insert duplicate elements. The elem
and insert
operations would be
elem a (Set as) = any (a ==) as
insert a (Set as) | a `elem` as = Set as
| otherwise = Set (a:as)
However, this means that both elem
and insert
are O(n) operations. If we want to do any better than this, the standard approaches are
Ord
instance)Hashable
instance).The implementation chosen by the authors of Data.Set
was to use a binary tree, which you can see by going to the source. The implementation is more or less
data Set a = Bin a (Set a) (Set a)
| Tip
Now you can write the elem
function as
elem :: Ord a => a -> Set a -> Bool
elem = go
where
go _ Tip = False
go x (Bin y l r) = case compare x y of
LT -> go x l
GT -> go x r
EQ -> True
which is an O(log n) operation, rather than O(n). Insertions are trickier (as you need to keep the tree balanced) but similar.
In a hash set, you don't directly compare elements when inserting and removing them. Instead, each element is hashed to an integer, and stored in a location based on that integer.
In theory this doesn't require an Ord
instance. In practice, you need some method of keeping track of multiple elements that hash to the same value, and the method chosen by the developers of Data.HashSet
is to store multiple elements in a regular Data.Set
, so it turns out you do need the Ord
instance after all!
data HashSet a = HashSet (Data.IntMap.IntMap (Data.Set.Set a))
It could have been written instead as
data HashSet a = HashSet (Data.IntMap.IntMap [a])
instead, which removes out the Ord
requirement at the cost of some inefficiency if there are many elements which has to the same value.
Is
Data.Set
different than the mathematical definition of a set?
Obviously, mathematical sets can be uncountable infinite – you won’t be able to represent that in all generality with a computer, or even a Turing machine.
But the answer you are looking for is this: Data.Set
is a data type based on binary trees, and needs a total linear order on the elements to know whether to put and later find something on the left or right subtree of a node. So while it would be possible to implement a set datatype without an Ord
constraint, this particular, more efficient implementation would not.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With