Java's LinkedHashSet seems to be exactly what I'm after, except for the fact it's not written in Haskell.
The easiest (and a relevantly inefficient) solution is to implement it as a list and transform it into a set when I need too, but I believe there is likely a better way.
My first idea was to implement it as a Data.Set
of a newtype
of (Int, a)
where it would be ordered by the first tuple index, and the second index (a
) being the actual value. I quickly realised this wasn't going to work because as the set would allow for duplicates of the type a
, which would have defeated the whole purpose of using a set.
Another Idea I had was have an abstract data type that would maintain both a list and set representation of the data, which doesn't sound to efficient either.
Are there any descent implementations of such a data structure in Haskell? I've seen Data.List.Ordered but it seems to just add set operations to lists, which sounds terribly inefficient as well (but likely what I'll settle with if I can't find a solution). Another solution suggested here, was to implement it via finger tree, but I would prefer to not reimplement it if it's already a solved problem.
If we want to maintain the insertion order of the elements, we are supposed to use LinkedHashSet. LinkedHashSet maintains the order in which the elements are inserted.
HashSet does not provide any method to maintain the insertion order. Comparatively, LinkedHashSet maintains the insertion order of the elements.
Intro to TreeSet It stores unique elements. It doesn't preserve the insertion order of the elements. It sorts the elements in ascending order.
A Set will not allow duplicate values. And LinkedHashSet will preserve insertion order. Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries.
You can certainly use Data.Set
with what is isomorphic to (Int, a)
, but wrapped in a newtype with different a Eq
instance:
newtype Entry a = Entry { unEntry :: (Int, a) } deriving (Show)
instance Eq a => Eq (Entry a) where
(Entry (_, a)) == (Entry (_, b)) = a == b
instance Ord a => Ord (Entry a) where
compare (Entry (_, a)) (Entry (_, b)) = compare a b
But this won't quite solve all your problems if you want automatic incrementing of your index, so you could make a wrapper around (Set (Entry a), Int)
:
newtype IndexedSet a = IndexedSet (Set (Entry a), Int) deriving (Eq, Show)
But this does mean that you'll have to re-implement Data.Set
to respect this relationship:
import qualified Data.Set as S
import Data.Set (Set)
import Data.Ord (comparing)
import Data.List (sortBy)
-- declarations from above...
null :: IndexedSet a -> Bool
null (IndexedSet (set, _)) = S.null set
-- | If you re-index on deletions then size will just be the associated index
size :: IndexedSet a -> Int
size (IndexedSet (set, _)) = S.size set
-- Remember that (0, a) == (n, a) for all n
member :: Ord a => a -> IndexedSet a -> Bool
member a (IndexedSet (set, _)) = S.member (Entry (0, a)) set
empty :: IndexedSet a
empty = IndexedSet (S.empty, 0)
-- | This function is critical, you have to make sure to increment the index
-- Might also want to consider making it strict in the i field for performance
insert :: Ord a => a -> IndexedSet a -> IndexedSet a
insert a (IndexedSet (set, i)) = IndexedSet (S.insert (Entry (i, a)) set, i + 1)
-- | Simply remove the `Entry` wrapper, sort by the indices, then strip those off
toList :: IndexedSet a -> [a]
toList (IndexedSet (set, _))
= map snd
$ sortBy (comparing fst)
$ map unEntry
$ S.toList set
But this is fairly trivial in most cases and you can add functionality as you need it. The only thing you'll need to really worry about is what to do in deletions. Do you re-index everything or are you just concerned about order? If you're just concerned about order, then it's simple (and size
can be left sub-optimal by having to actually calculate the size of the underlying Set
), but if you re-index then you can get your size in O(1)
time. These sorts of decisions should be decided based on what problem you're trying to solve.
I would prefer to not reimplement it if it's already a solved problem.
This approach is definitely a re-implementation. But it isn't complicated in most of the cases, could be pretty easily turned into a nice little library to upload to Hackage, and retains a lot of the benefits of sets without much bookkeeping.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With