Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you efficiently find a union of a list of lists of values in haskell?

Since a code example is worth a thousand words I'll start with that:

testList = [1,2,2,3,4,5]
testSet = map sumMapper $ tails testList
          where sumMapper [] = []
                sumMapper (a:b) = sumMap a b
                sumMap a b = map (+ a) b

This code takes a list and adds up all the elements to get the sum of all of them (I'd also be interested in efficiency of this). The output of testSet is:

[[3,3,4,5,6],[4,5,6,7],[5,6,7],[7,8],[9],[],[]]

I would like to find the union of these lists (to make it into a set) but I feel that:

whatIWant = foldl1 union testSet

will have bad performance (the real lists will be thousands of elements long).

Is this the correct solution or am I missing something obvious?

like image 382
Mike H-R Avatar asked Sep 25 '14 17:09

Mike H-R


1 Answers

You might want to try

nub $ concat theListOfLists

In the version using union, the code to cut out duplicates will get run many times. Here it only is run once.

It will only execute the code to pull out the unique values once.

There is also a Data.Set library, you could alternatively use

import Data.Set
S.fromList $ concat theListOfLists

The important point is that the code (here and above) that pulls out duplicates only gets run on the full list once, rather than over and over again.


edit- Rein mentions below that nub is O(n^2), so you should avoid the first solution above in favor of something O(n log n), as Data.Set.fromList should be. As others have mentioned in the comments, you need something that enforces Ord a to get the proper complexity O(n log n), and Data.Set does, nub does not.

I will leave the two solutions (poor performance and good performance) because I think the resulting discussion was useful.

like image 162
jamshidh Avatar answered Oct 27 '22 00:10

jamshidh