Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I count one list against another, in O(n*log(n))?

I am looking for a function that can efficiently count, for each element in one list, its occurrences in another list. It should return a sorted list of element/count tuples. Here's the specification:

countList :: Ord a => [a] -> [a] -> [(a, Integer)]
countList ['a', 'b', 'c', 'b', 'b'] ['a', 'b', 'x']
                               == [('a', 1), ('b', 3), ('x', 0)]
length (countList xs ys) == length ys

A naive implementation would be:

countList xs = sort . map (id &&& length . (\ y -> filter (== y) xs))

This is O(n^2). However, since we have Ord a, this could be made a bit faster using a better strategy. We might sort both lists first, and then compare them in a "ladder-climbing" fasion.

For example, here are the two lists sorted. If I were doing it imperatively, I would use two pointers pointing towards the first element in each list:

       i
       |
xs = ['a', 'b', 'b', 'b', 'c']
ys = ['a', 'b', 'x']
       |
       j

Then increase i when xs !! i == ys !! j, meanwhile adding one to the counter at position j. When i encounters a new element, find it in ys by increasing j, and then repeat the previous step. This algorithm is O(n*log(n)).

But I can't find a way to achieve the same complexity in a purely functional way, nor did I find any existing function that can achieve what I want. How should I do this in Haskell?

like image 230
xzhu Avatar asked Feb 08 '23 20:02

xzhu


2 Answers

If the 2nd list does not have duplicates, and the 1st list is longer, you may avoid sorting the first list by using Data.Map. This will achieve an n1 log n2 complexity:

import Data.Map (fromList, toList, adjust)

countList :: Ord a => [a] -> [a] -> [(a, Int)]
countList l r = toList $ foldr (adjust (+1)) (fromList . zip r $ repeat 0) l
like image 138
behzad.nouri Avatar answered Feb 16 '23 17:02

behzad.nouri


I think this accomplishes what you're looking for:

import Data.List (sort)

countList :: Ord a => [a] -> [a] -> [(a, Int)]
countList l1 l2 = countList' (sort l1) (sort l2)
  where countList' _     []  = []
        countList' xs (y:ys) = let xs'   = dropWhile (<  y) xs
                                   (a,b) = span      (== y) xs'
                                in (y, length a) : countList' b ys

main = print $ countList ['a', 'b', 'c', 'b', 'b'] ['a', 'b', 'x']
like image 39
Sam van Herwaarden Avatar answered Feb 16 '23 17:02

Sam van Herwaarden