Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Set-like Data Structure without `Ord`?

Tags:

haskell

set

Given the following types:

import Data.Set as Set

-- http://json.org/

type Key = String

data Json = JObject Key (Set JValue)
            | JArray JArr
            deriving Show

data JObj = JObj Key JValue
            deriving Show

data JArr = Arr [JValue] deriving Show

data Null = Null deriving Show

data JValue = Num Double
              | S String
              | B Bool
              | J JObj
              | Array JArr
              | N Null
               deriving Show

I created a JObject Key (Set Value) with a single element:

ghci> JObject "foo" (Set.singleton (B True))
JObject "foo" (fromList [B True])

But, when I tried to create a 2-element Set, I got a compile-time error:

ghci> JObject "foo" (Set.insert (Num 5.5) $ Set.singleton (B True))

<interactive>:159:16:
    No instance for (Ord JValue) arising from a use of ‘insert’
    In the expression: insert (Num 5.5)
    In the second argument of ‘JObject’, namely
      ‘(insert (Num 5.5) $ singleton (B True))’
    In the expression:
      JObject "foo" (insert (Num 5.5) $ singleton (B True))

So I asked, "Why is it necessary for JValue to implement the Ord typeclass?"

The docs on Data.Set answer that question.

The implementation of Set is based on size balanced binary trees (or trees of bounded balance)

But, is there a Set-like, i.e. non-ordered, data structure that does not require Ord's implementation that I can use?

like image 232
Kevin Meredith Avatar asked Feb 22 '15 16:02

Kevin Meredith


1 Answers

You will pretty much always need at least Eq to implement a set (or at least the ability to write an Eq instance, whether or not one exists). Having only Eq will give you a horrifyingly inefficient one. You can improve this with Ord or with Hashable.

One thing you might want to do here is use a trie, which will let you take advantage of the nested structure instead of constantly fighting it.

You can start by looking at generic-trie. This does not appear to offer anything for your Array pieces, so you may have to add some things.

Why Eq is not good enough

The simplest way to implement a set is using a list:

type Set a = [a]

member a [] = False
member (x:xs) | a == x = True
              | otherwise = member a xs

insert a xs | member a xs = xs
            | otherwise = a:xs

This is no good (unless there are very few elements), because you may have to traverse the entire list to see if something is a member.

To improve matters, we need to use some sort of tree:

data Set a = Node a (Set a) (Set a) | Tip

There are a lot of different kinds of trees we can make, but in order to use them, we must be able, at each node, to decide which of the branches to take. If we only have Eq, there is no way to choose the right one. If we have Ord (or Hashable), that gives us a way to choose.

The trie approach structures the tree based on the structure of the data. When your type is deeply nested (a list of arrays of records of lists...), either hashing or comparison can be very expensive, so the trie will probably be better.

Side note on Ord

Although I don't think you should use the Ord approach here, it very often is the right one. In some cases, your particular type may not have a natural ordering, but there is some efficient way to order its elements. In this case you can play a trick with newtype:

newtype WrappedThing = Wrap Thing

instance Ord WrappedThing where
  ....

newtype ThingSet = ThingSet (Set WrappedThing)
insertThing thing (ThingSet s) = ThingSet (insert (Wrap thing) s)
memberThing thing (ThingSet s) = member (WrapThing) s
...

Yet another approach, in some cases, is to define a "base type" that is an Ord instance, but only export a newtype wrapper around it; you can use the base type for all your internal functions, but the exported type is completely abstract (and not an Ord instance).

like image 83
dfeuer Avatar answered Sep 27 '22 18:09

dfeuer