Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Set-Like data structure that maintains insertion Order?

The properties I'm looking for are

  • initially maintains insertion order
  • transversing in the insertion order
  • and of course maintain that each element is unique

But there are cases where It's okay to disregard insertion order, such as...

  • retrieving a difference between two different sets
  • performing a union the two sets eliminating any duplicates

Java's LinkedHashSet seems to be exactly what I'm after, except for the fact it's not written in Haskell.

current & initial solution

The easiest (and a relevantly inefficient) solution is to implement it as a list and transform it into a set when I need too, but I believe there is likely a better way.

other ideas

My first idea was to implement it as a Data.Set of a newtype of (Int, a) where it would be ordered by the first tuple index, and the second index (a) being the actual value. I quickly realised this wasn't going to work because as the set would allow for duplicates of the type a, which would have defeated the whole purpose of using a set.

simultaneously maintaining a list and a set? (nope)

Another Idea I had was have an abstract data type that would maintain both a list and set representation of the data, which doesn't sound to efficient either.

recap

Are there any descent implementations of such a data structure in Haskell? I've seen Data.List.Ordered but it seems to just add set operations to lists, which sounds terribly inefficient as well (but likely what I'll settle with if I can't find a solution). Another solution suggested here, was to implement it via finger tree, but I would prefer to not reimplement it if it's already a solved problem.

like image 604
akst Avatar asked Aug 16 '14 02:08

akst


People also ask

What sets maintain insertion order?

If we want to maintain the insertion order of the elements, we are supposed to use LinkedHashSet. LinkedHashSet maintains the order in which the elements are inserted.

Does HashSet maintain insertion order?

HashSet does not provide any method to maintain the insertion order. Comparatively, LinkedHashSet maintains the insertion order of the elements.

Does TreeSet maintain insertion order?

Intro to TreeSet It stores unique elements. It doesn't preserve the insertion order of the elements. It sorts the elements in ascending order.

Does Set preserve insertion order?

A Set will not allow duplicate values. And LinkedHashSet will preserve insertion order. Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries.


1 Answers

You can certainly use Data.Set with what is isomorphic to (Int, a), but wrapped in a newtype with different a Eq instance:

newtype Entry a = Entry { unEntry :: (Int, a) } deriving (Show)

instance Eq a => Eq (Entry a) where
    (Entry (_, a)) == (Entry (_, b)) = a == b

instance Ord a => Ord (Entry a) where
    compare (Entry (_, a)) (Entry (_, b)) = compare a b

But this won't quite solve all your problems if you want automatic incrementing of your index, so you could make a wrapper around (Set (Entry a), Int):

newtype IndexedSet a = IndexedSet (Set (Entry a), Int) deriving (Eq, Show)

But this does mean that you'll have to re-implement Data.Set to respect this relationship:

import qualified Data.Set as S
import Data.Set (Set)
import Data.Ord (comparing)
import Data.List (sortBy)

-- declarations from above...

null :: IndexedSet a -> Bool
null (IndexedSet (set, _)) = S.null set

-- | If you re-index on deletions then size will just be the associated index
size :: IndexedSet a -> Int
size (IndexedSet (set, _)) = S.size set

-- Remember that (0, a) == (n, a) for all n
member :: Ord a => a -> IndexedSet a -> Bool
member a (IndexedSet (set, _)) = S.member (Entry (0, a)) set

empty :: IndexedSet a
empty = IndexedSet (S.empty, 0)

-- | This function is critical, you have to make sure to increment the index
--   Might also want to consider making it strict in the i field for performance
insert :: Ord a => a -> IndexedSet a -> IndexedSet a
insert a (IndexedSet (set, i)) = IndexedSet (S.insert (Entry (i, a)) set, i + 1)

-- | Simply remove the `Entry` wrapper, sort by the indices, then strip those off
toList :: IndexedSet a -> [a]
toList (IndexedSet (set, _))
    = map snd
    $ sortBy (comparing fst)
    $ map unEntry
    $ S.toList set

But this is fairly trivial in most cases and you can add functionality as you need it. The only thing you'll need to really worry about is what to do in deletions. Do you re-index everything or are you just concerned about order? If you're just concerned about order, then it's simple (and size can be left sub-optimal by having to actually calculate the size of the underlying Set), but if you re-index then you can get your size in O(1) time. These sorts of decisions should be decided based on what problem you're trying to solve.


I would prefer to not reimplement it if it's already a solved problem.

This approach is definitely a re-implementation. But it isn't complicated in most of the cases, could be pretty easily turned into a nice little library to upload to Hackage, and retains a lot of the benefits of sets without much bookkeeping.

like image 104
bheklilr Avatar answered Nov 16 '22 02:11

bheklilr