Haskell function nub inefficient

Question

I'm confused by the implementation of the 'nub' (select unique values) function in the Haskell standard library Data.List. The GHC implementation is

nub l                   = nub' l []
  where
    nub' [] _           = []
    nub' (x:xs) ls
        | x `elem` ls   = nub' xs ls
        | otherwise     = x : nub' xs (x:ls)

As far as I can tell, this has a worst-case time complexity of O(n^2), since for a list of unique values it has to compare them all once to see that they are in fact unique.

If one used a hash table, the complexity could be reduced to O(n) for building the table + O(1) for checking each value against previous values in the hash table. Granted, this would not produce an ordered list but that would also be possible in O(n log n) using GHC's own ordered Data.Map, if that is necessary.

Why choose such an inefficient implementation for an important library function? I understand efficiency is not a main concern in Haskell but at least the standard library could make an effort to choose the (asymptotically) best data structure for the job.

Nikita Volkov · Accepted Answer

Efficiency is quite a concern in Haskell, after all the language performs on par with Java, and beats it in terms of memory consumption, but of course it's not C.

The answer to your question is pretty simple: the Prelude's nub requires only an Eq constraint, while any implementation based on Map or Set would also require either an Ord or Hashable.

ErikR · Answer

You're absolutely correct - nub is an O(n^2) algorithm. However, there are still reasons why you might want to use it instead of using a hashmap:

for small lists it still might be faster
nub only requires the Eq constraint; by comparison Data.Map requires an Ord constraint on keys and Data.HashMap requires a key type with both Hashable and Ord type classes
it's lazy - you don't have to run through the entire input list to start getting results

Edit: Slight correction on the third point -- you don't have to process the entire list to start getting results; you'll still have to examine every element of the input list (so nub won't work on infinite lists), but you'll start returning results as soon as you find a unique element.

misterbee · Answer

https://groups.google.com/forum/m/#!msg/haskell-cafe/4UJBbwVEacg/ieMzlWHUT_IJ

In my experience, "beginner" Haskell (including Prelude and the bad packages) simply ignores performance in many cases, in favor of simplicity.

Haskell performance is a complex problem to solve, so if you aren't experienced enough to search through Platform or Hackage for alternatives to the simple nub (and especially if your input is in a List just because you haven't thought about alternative structures), then Data.List.nub is likely not your only major performance problem and also you are probably writing code for a toy project where performance doesn't really matter.

You just have to have faith that when you get to building a large (in code or data) project, you will be more experienced and know how to set up your programs more efficiently.

In other words, don't worry about it, and assume that anything in Haskell 98 that comes from Prelude or base is likely to not be the most efficient way to solve a problem.

Haskell function nub inefficient

Tags:

performance

algorithm

haskell

jforberg

3 Answers

Nikita Volkov

ErikR

misterbee

Recent Activity

Donate For Us

Haskell function nub inefficient

Tags:

performance

algorithm

haskell

jforberg

3 Answers

Nikita Volkov

ErikR

misterbee

Related questions

Recent Activity

Donate For Us