Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding unique (as in only occurring once) element haskell

I need a function which takes a list and return unique element if it exists or [] if it doesn't. If many unique elements exists it should return the first one (without wasting time to find others). Additionally I know that all elements in the list come from (small and known) set A. For example this function does the job for Ints:

unique :: Ord a => [a] -> [a]
unique li = first $ filter ((==1).length) ((group.sort) li)
    where first [] = []
          first (x:xs) = x

ghci> unique [3,5,6,8,3,9,3,5,6,9,3,5,6,9,1,5,6,8,9,5,6,8,9]
ghci> [1]

This is however not good enough because it involves sorting (n log n) while it could be done in linear time (because A is small). Additionally it requires the type of list elements to be Ord while all which should be needed is Eq. It would also be nice if amount of comparisons was as small as possible (ie if we traverse a list and encounter element el twice we don't test subsequent elements for equality with el)

This is why for example this: Counting unique elements in a list doesn't solve the problem - all answers involve either sorting or traversing the whole list to find count of all elements.

The question is: how to do it correctly and efficiently in Haskell ?

like image 514
Piotr Lopusiewicz Avatar asked Apr 17 '13 22:04

Piotr Lopusiewicz


1 Answers

Okay, linear time, from a finite domain. The running time will be O((m + d) log d), where m is the size of the list and d is the size of the domain, which is linear when d is fixed. My plan is to use the elements of the set as the keys of a trie, with the counts as values, then look through the trie for elements with count 1.

import qualified Data.IntTrie as IntTrie
import Data.List (foldl')
import Control.Applicative

Count each of the elements. This traverses the list once, builds a trie with the results (O(m log d)), then returns a function which looks up the result in the trie (with running time O(log d)).

counts :: (Enum a) => [a] -> (a -> Int)
counts xs = IntTrie.apply (foldl' insert (pure 0) xs) . fromEnum
    where
    insert t x = IntTrie.modify' (fromEnum x) (+1) t

We use the Enum constraint to convert values of type a to integers in order to index them in the trie. An Enum instance is part of the witness of your assumption that a is a small, finite set (Bounded would be the other part, but see below).

And then look for ones that are unique.

uniques :: (Eq a, Enum a) => [a] -> [a] -> [a]
uniques dom xs = filter (\x -> cts x == 1) dom
    where
    cts = counts xs

This function takes as its first parameter an enumeration of the entire domain. We could have required a Bounded a constraint and used [minBound..maxBound] instead, which is semantically appealing to me since finite is essentially Enum+Bounded, but quite inflexible since now the domain needs to be known at compile time. So I would choose this slightly uglier but more flexible variant.

uniques traverses the domain once (lazily, so head . uniques dom will only traverse as far as it needs to to find the first unique element -- not in the list, but in dom), for each element running the lookup function which we have established is O(log d), so the filter takes O(d log d), and building the table of counts takes O(m log d). So uniques runs in O((m + d) log d), which is linear when d is fixed. It will take at least Ω(m log d) to get any information from it, because it has to traverse the whole list to build the table (you have to get all the way to the end of the list to see if an element was repeated, so you can't do better than this).

like image 71
luqui Avatar answered Oct 09 '22 14:10

luqui