I'm confused by the implementation of the 'nub' (select unique values) function in the Haskell standard library Data.List. The GHC implementation is
nub l = nub' l []
where
nub' [] _ = []
nub' (x:xs) ls
| x `elem` ls = nub' xs ls
| otherwise = x : nub' xs (x:ls)
As far as I can tell, this has a worst-case time complexity of O(n^2), since for a list of unique values it has to compare them all once to see that they are in fact unique.
If one used a hash table, the complexity could be reduced to O(n) for building the table + O(1) for checking each value against previous values in the hash table. Granted, this would not produce an ordered list but that would also be possible in O(n log n) using GHC's own ordered Data.Map, if that is necessary.
Why choose such an inefficient implementation for an important library function? I understand efficiency is not a main concern in Haskell but at least the standard library could make an effort to choose the (asymptotically) best data structure for the job.
Efficiency is quite a concern in Haskell, after all the language performs on par with Java, and beats it in terms of memory consumption, but of course it's not C.
The answer to your question is pretty simple: the Prelude's nub
requires only an Eq
constraint, while any implementation based on Map
or Set
would also require either an Ord
or Hashable
.
You're absolutely correct - nub
is an O(n^2) algorithm. However, there are still reasons why you might want to use it instead of using a hashmap:
nub
only requires the Eq
constraint; by comparison Data.Map
requires an Ord
constraint on keys and Data.HashMap
requires a key type with both Hashable
and Ord
type classesEdit: Slight correction on the third point -- you don't have to process the entire list to start getting results; you'll still have to examine every element of the input list (so nub
won't work on infinite lists), but you'll start returning results as soon as you find a unique element.
https://groups.google.com/forum/m/#!msg/haskell-cafe/4UJBbwVEacg/ieMzlWHUT_IJ
In my experience, "beginner" Haskell (including Prelude and the bad packages) simply ignores performance in many cases, in favor of simplicity.
Haskell performance is a complex problem to solve, so if you aren't experienced enough to search through Platform or Hackage for alternatives to the simple nub
(and especially if your input is in a List just because you haven't thought about alternative structures), then Data.List.nub
is likely not your only major performance problem and also you are probably writing code for a toy project where performance doesn't really matter.
You just have to have faith that when you get to building a large (in code or data) project, you will be more experienced and know how to set up your programs more efficiently.
In other words, don't worry about it, and assume that anything in Haskell 98 that comes from Prelude or base is likely to not be the most efficient way to solve a problem.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With