<h3>Problem</h3> <p>Let us suppose that we have a list <code>xs</code> (possibly a very big one), and we want to check that all its elements are the same.</p> <p>I came up with various ideas:</p> <h3>Solution 0</h3> <p>checking that all elements in <code>tail xs</code> are equal to <code>head xs</code>:</p> <pre class="prettyprint"><code>allTheSame :: (Eq a) => [a] -> Bool allTheSame xs = and $ map (== head xs) (tail xs) </code></pre> <h3>Solution 1</h3> <p>checking that <code>length xs</code> is equal to the length of the list obtained by taking elements from <code>xs</code> while they're equal to <code>head xs</code></p> <pre class="prettyprint"><code>allTheSame' :: (Eq a) => [a] -> Bool allTheSame' xs = (length xs) == (length $ takeWhile (== head xs) xs) </code></pre> <h3>Solution 2</h3> <p>recursive solution: <code>allTheSame</code> returns <code>True</code> if the first two elements of <code>xs</code> are equal and <code>allTheSame</code> returns <code>True</code> on the rest of <code>xs</code></p> <pre class="prettyprint"><code>allTheSame'' :: (Eq a) => [a] -> Bool allTheSame'' xs | n == 0 = False | n == 1 = True | n == 2 = xs !! 0 == xs !! 1 | otherwise = (xs !! 0 == xs !! 1) && (allTheSame'' $ snd $ splitAt 2 xs) where n = length xs </code></pre> <h3>Solution 3</h3> <p>divide and conquer:</p> <pre class="prettyprint"><code>allTheSame''' :: (Eq a) => [a] -> Bool allTheSame''' xs | n == 0 = False | n == 1 = True | n == 2 = xs !! 0 == xs !! 1 | n == 3 = xs !! 0 == xs !! 1 && xs !! 1 == xs !! 2 | otherwise = allTheSame''' (fst split) && allTheSame''' (snd split) where n = length xs split = splitAt (n `div` 2) xs </code></pre> <h3>Solution 4</h3> <p>I just thought about this while writing this question:</p> <pre class="prettyprint"><code>allTheSame'''' :: (Eq a) => [a] -> Bool allTheSame'''' xs = all (== head xs) (tail xs) </code></pre> <h3>Questions</h3> <ol> <li><p>I think Solution 0 is not very efficient, at least in terms of memory, because <code>map</code> will construct another list before applying <code>and</code> to its elements. Am I right?</p></li> <li><p>Solution 1 is still not very efficient, at least in terms of memory, because <code>takeWhile</code> will again build an additional list. Am I right?</p></li> <li><p>Solution 2 is tail recursive (right?), and it should be pretty efficient, because it will return <code>False</code> as soon as <code>(xs !! 0 == xs !! 1)</code> is False. Am I right?</p></li> <li><p>Solution 3 should be the best one, because it complexity should be O(log n)</p></li> <li><p>Solution 4 looks quite Haskellish to me (is it?), but it's probably the same as Solution 0, because <code>all p = and . map p</code> (from Prelude.hs). Am I right?</p></li> <li><p>Are there other better ways of writing <code>allTheSame</code>? Now, I expect someone will answer this question telling me that there's a build-in function that does this: I've searched with hoogle and I haven't found it. Anyway, since I'm learning Haskell, I believe that this was a good exercise for me :)</p></li> </ol> <p>Any other comment is welcome. Thank you!</p>

<p>gatoatigrado's answer gives some nice advice for measuring the performance of various solutions. Here is a more symbolic answer.</p> <p>I think solution 0 (or, <em>exactly equivalently</em>, solution 4) will be the fastest. Remember that Haskell is <em>lazy</em>, so <code>map</code> will not have to construct the whole list before <code>and</code> is applied. A good way to build intuition about this is to play with infinity. So for example:</p> <pre class="prettyprint"><code>ghci> and $ map (< 1000) [1..] False </code></pre> <p>This asks whether all numbers are less than 1,000. If <code>map</code> constructed the entire list before <code>and</code> were applied, then this question could never be answered. The expression will still answer quickly even if you give the list a very large right endpoint (that is, Haskell is not doing any "magic" depending on whether a list is infinite).</p> <p>To start my example, let's use these definitions:</p> <pre class="prettyprint"><code>and [] = True and (x:xs) = x && and xs map f [] = [] map f (x:xs) = f x : map f xs True && x = x False && x = False </code></pre> <p>Here is the evaluation order for <code>allTheSame [7,7,7,7,8,7,7,7]</code>. There will be extra sharing that is too much of a pain to write down. I will also evaluate the <code>head</code> expression earlier than it would be for conciseness (it would have been evaluated anyway, so it's hardly different).</p> <pre class="prettyprint"><code>allTheSame [7,7,7,7,8,7,7,7] allTheSame (7:7:7:7:8:7:7:7:[]) and $ map (== head (7:7:7:7:8:7:7:7:[])) (tail (7:7:7:7:8:7:7:7:[])) and $ map (== 7) (tail (7:7:7:7:8:7:7:7:[])) and $ map (== 7) (7:7:7:8:7:7:7:[]) and $ (== 7) 7 : map (== 7) (7:7:8:7:7:7:[]) (== 7) 7 && and (map (== 7) (7:7:8:7:7:7:[])) True && and (map (== 7) (7:7:8:7:7:7:[])) and (map (== 7) (7:7:8:7:7:7:[])) (== 7) 7 && and (map (== 7) (7:8:7:7:7:[])) True && and (map (== 7) (7:8:7:7:7:[])) and (map (== 7) (7:8:7:7:7:[])) (== 7) 7 && and (map (== 7) (8:7:7:7:[])) True && and (map (== 7) (8:7:7:7:[])) and (map (== 7) (8:7:7:7:[])) (== 7) 8 && and (map (== 7) (7:7:7:[])) False && and (map (== 7) (7:7:7:[])) False </code></pre> <p>See how we didn't even have to check the last 3 7's? This is lazy evaluation making a list work more like a loop. All your other solutions use expensive functions like <code>length</code> (which have to walk all the way to the end of the list to give an answer), so they will be less efficient and also they will not work on infinite lists. Working on infinite lists and being efficient often go together in Haskell.</p>

efficiently checking that all the elements of a (big) list are the same

Problem

Let us suppose that we have a list xs (possibly a very big one), and we want to check that all its elements are the same.

I came up with various ideas:

Solution 0

checking that all elements in tail xs are equal to head xs:

allTheSame :: (Eq a) => [a] -> Bool allTheSame xs = and $ map (== head xs) (tail xs)

Solution 1

checking that length xs is equal to the length of the list obtained by taking elements from xs while they're equal to head xs

allTheSame' :: (Eq a) => [a] -> Bool allTheSame' xs = (length xs) == (length $ takeWhile (== head xs) xs)

Solution 2

recursive solution: allTheSame returns True if the first two elements of xs are equal and allTheSame returns True on the rest of xs

allTheSame'' :: (Eq a) => [a] -> Bool allTheSame'' xs   | n == 0 = False   | n == 1 = True   | n == 2 = xs !! 0 == xs !! 1   | otherwise = (xs !! 0 == xs !! 1) && (allTheSame'' $ snd $ splitAt 2 xs)     where  n = length xs

Solution 3

divide and conquer:

allTheSame''' :: (Eq a) => [a] -> Bool allTheSame''' xs   | n == 0 = False   | n == 1 = True   | n == 2 = xs !! 0 == xs !! 1   | n == 3 = xs !! 0 == xs !! 1 && xs !! 1 == xs !! 2   | otherwise = allTheSame''' (fst split) && allTheSame''' (snd split)     where n = length xs           split = splitAt (n `div` 2) xs

Solution 4

I just thought about this while writing this question:

allTheSame'''' :: (Eq a) => [a] -> Bool allTheSame'''' xs = all (== head xs) (tail xs)

Questions

I think Solution 0 is not very efficient, at least in terms of memory, because map will construct another list before applying and to its elements. Am I right?
Solution 1 is still not very efficient, at least in terms of memory, because takeWhile will again build an additional list. Am I right?
Solution 2 is tail recursive (right?), and it should be pretty efficient, because it will return False as soon as (xs !! 0 == xs !! 1) is False. Am I right?
Solution 3 should be the best one, because it complexity should be O(log n)
Solution 4 looks quite Haskellish to me (is it?), but it's probably the same as Solution 0, because all p = and . map p (from Prelude.hs). Am I right?
Are there other better ways of writing allTheSame? Now, I expect someone will answer this question telling me that there's a build-in function that does this: I've searched with hoogle and I haven't found it. Anyway, since I'm learning Haskell, I believe that this was a good exercise for me :)

Any other comment is welcome. Thank you!

990

asked May 25 '11 07:05

MarcoS

2 Answers

gatoatigrado's answer gives some nice advice for measuring the performance of various solutions. Here is a more symbolic answer.

I think solution 0 (or, exactly equivalently, solution 4) will be the fastest. Remember that Haskell is lazy, so map will not have to construct the whole list before and is applied. A good way to build intuition about this is to play with infinity. So for example:

ghci> and $ map (< 1000) [1..] False

This asks whether all numbers are less than 1,000. If map constructed the entire list before and were applied, then this question could never be answered. The expression will still answer quickly even if you give the list a very large right endpoint (that is, Haskell is not doing any "magic" depending on whether a list is infinite).

To start my example, let's use these definitions:

and [] = True and (x:xs) = x && and xs  map f [] = [] map f (x:xs) = f x : map f xs  True && x = x False && x = False

Here is the evaluation order for allTheSame [7,7,7,7,8,7,7,7]. There will be extra sharing that is too much of a pain to write down. I will also evaluate the head expression earlier than it would be for conciseness (it would have been evaluated anyway, so it's hardly different).

allTheSame [7,7,7,7,8,7,7,7] allTheSame (7:7:7:7:8:7:7:7:[]) and $ map (== head (7:7:7:7:8:7:7:7:[])) (tail (7:7:7:7:8:7:7:7:[])) and $ map (== 7)  (tail (7:7:7:7:8:7:7:7:[])) and $ map (== 7)          (7:7:7:8:7:7:7:[]) and $ (== 7) 7 : map (== 7) (7:7:8:7:7:7:[]) (== 7) 7 && and (map (== 7) (7:7:8:7:7:7:[])) True     && and (map (== 7) (7:7:8:7:7:7:[]))             and (map (== 7) (7:7:8:7:7:7:[])) (== 7) 7 && and (map (== 7)   (7:8:7:7:7:[])) True     && and (map (== 7)   (7:8:7:7:7:[]))             and (map (== 7)   (7:8:7:7:7:[])) (== 7) 7 && and (map (== 7)     (8:7:7:7:[])) True     && and (map (== 7)     (8:7:7:7:[]))             and (map (== 7)     (8:7:7:7:[])) (== 7) 8 && and (map (== 7)       (7:7:7:[])) False    && and (map (== 7)       (7:7:7:[])) False

See how we didn't even have to check the last 3 7's? This is lazy evaluation making a list work more like a loop. All your other solutions use expensive functions like length (which have to walk all the way to the end of the list to give an answer), so they will be less efficient and also they will not work on infinite lists. Working on infinite lists and being efficient often go together in Haskell.

175

answered Oct 14 '22 19:10

luqui

First of all, I don't think you want to be working with lists. A lot of your algorithms rely upon calculating the length, which is bad. You may want to consider the vector package, which will give you O(1) length compared to O(n) for a list. Vectors are also much more memory efficient, particularly if you can use Unboxed or Storable variants.

That being said, you really need to consider traversals and usage patterns in your code. Haskell's lists are very efficient if they can be generated on demand and consumed once. This means that you shouldn't hold on to references to a list. Something like this:

average xs = sum xs / length xs

requires that the entire list be retained in memory (by either sum or length) until both traversals are completed. If you can do your list traversal in one step, it'll be much more efficient.

Of course, you may need to retain the list anyway, such as to check if all the elements are equal, and if they aren't, do something else with the data. In this case, with lists of any size you're probably better off with a more compact data structure (e.g. vector).

Now that this is out of they way, here's a look at each of these functions. Where I show core, it was generated with ghc-7.0.3 -O -ddump-simpl. Also, don't bother judging Haskell code performance when compiled with -O0. Compile it with the flags you would actually use for production code, typically at least -O and maybe other options too.

Solution 0

allTheSame :: (Eq a) => [a] -> Bool allTheSame xs = and $ map (== head xs) (tail xs)

GHC produces this Core:

Test.allTheSame   :: forall a_abG. GHC.Classes.Eq a_abG => [a_abG] -> GHC.Bool.Bool [GblId,  Arity=2,  Str=DmdType LS,  Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,          ConLike=True, Cheap=True, Expandable=True,          Guidance=IF_ARGS [3 3] 16 0}] Test.allTheSame =   \ (@ a_awM)     ($dEq_awN :: GHC.Classes.Eq a_awM)     (xs_abH :: [a_awM]) ->     case xs_abH of _ {       [] ->         GHC.List.tail1         `cast` (CoUnsafe (forall a1_axH. [a1_axH]) GHC.Bool.Bool                 :: (forall a1_axH. [a1_axH]) ~ GHC.Bool.Bool);       : ds1_axJ xs1_axK ->         letrec {           go_sDv [Occ=LoopBreaker] :: [a_awM] -> GHC.Bool.Bool           [LclId, Arity=1, Str=DmdType S]           go_sDv =             \ (ds_azk :: [a_awM]) ->               case ds_azk of _ {                 [] -> GHC.Bool.True;                 : y_azp ys_azq ->                   case GHC.Classes.== @ a_awM $dEq_awN y_azp ds1_axJ of _ {                     GHC.Bool.False -> GHC.Bool.False; GHC.Bool.True -> go_sDv ys_azq                   }               }; } in         go_sDv xs1_axK     }

This looks pretty good, actually. It will produce an error with an empty list, but that's easily fixed. This is the case xs_abH of _ { [] ->. After this GHC performed a worker/wrapper transformation, the recursive worker function is the letrec { go_sDv binding. The worker examines its argument. If [], it's reached the end of the list and returns True. Otherwise it compares the head of the remaining to the first element and either returns False or checks the rest of the list.

Three other features.

The map was entirely fused away and doesn't allocate a temporary list.
Near the top of the definition notice the Cheap=True statement. This means GHC considers the function "cheap", and thus a candidate for inlining. At a call site, if a concrete argument type can be determined, GHC will probably inline allTheSame and produce a very tight inner loop, completely bypassing the Eq dictionary lookup.
The worker function is tail-recursive.

Verdict: Very strong contender.

Solution 1

allTheSame' :: (Eq a) => [a] -> Bool allTheSame' xs = (length xs) == (length $ takeWhile (== head xs) xs)

Even without looking at core I know this won't be as good. The list is traversed more than once, first by length xs then by length $ takeWhile. Not only do you have the extra overhead of multiple traversals, it means that the list must be retained in memory after the first traversal and can't be GC'd. For a big list, this is a serious problem.

Test.allTheSame'   :: forall a_abF. GHC.Classes.Eq a_abF => [a_abF] -> GHC.Bool.Bool [GblId,  Arity=2,  Str=DmdType LS,  Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,          ConLike=True, Cheap=True, Expandable=True,          Guidance=IF_ARGS [3 3] 20 0}] Test.allTheSame' =   \ (@ a_awF)     ($dEq_awG :: GHC.Classes.Eq a_awF)     (xs_abI :: [a_awF]) ->     case GHC.List.$wlen @ a_awF xs_abI 0 of ww_aC6 { __DEFAULT ->     case GHC.List.$wlen            @ a_awF            (GHC.List.takeWhile               @ a_awF               (let {                  ds_sDq :: a_awF                  [LclId, Str=DmdType]                  ds_sDq =                    case xs_abI of _ {                      [] -> GHC.List.badHead @ a_awF; : x_axk ds1_axl -> x_axk                    } } in                \ (ds1_dxa :: a_awF) ->                  GHC.Classes.== @ a_awF $dEq_awG ds1_dxa ds_sDq)               xs_abI)            0     of ww1_XCn { __DEFAULT ->     GHC.Prim.==# ww_aC6 ww1_XCn     }     }

Looking at the core doesn't tell much beyond that. However, note these lines:

case GHC.List.$wlen @ a_awF xs_abI 0 of ww_aC6 { __DEFAULT ->         case GHC.List.$wlen

This is where the list traversals happen. The first gets the length of the outer list and binds it to ww_aC6. The second gets the length of the inner list, but the binding doesn't happen until near the bottom, at

of ww1_XCn { __DEFAULT -> GHC.Prim.==# ww_aC6 ww1_XCn

The lengths (both Ints) can be unboxed and compared by a primop, but that's a small consolation after the overhead that's been introduced.

Verdict: Not good.

Solution 2

allTheSame'' :: (Eq a) => [a] -> Bool allTheSame'' xs   | n == 0 = False   | n == 1 = True   | n == 2 = xs !! 0 == xs !! 1   | otherwise = (xs !! 0 == xs !! 1) && (allTheSame'' $ snd $ splitAt 2 xs)     where  n = length xs

This has the same problem as solution 1. The list is traversed multiple times, and it can't be GC'd. It's worse here though, because now the length is calculated for each sub-list. I'd expect this to have the worst performance of all on lists of any significant size. Also, why are you special-casing lists of 1 and 2 elements when you're expecting the list to be big?

Verdict: Don't even think about it.

Solution 3

allTheSame''' :: (Eq a) => [a] -> Bool allTheSame''' xs   | n == 0 = False   | n == 1 = True   | n == 2 = xs !! 0 == xs !! 1   | n == 3 = xs !! 0 == xs !! 1 && xs !! 1 == xs !! 2   | otherwise = allTheSame''' (fst split) && allTheSame''' (snd split)     where n = length xs           split = splitAt (n `div` 2) xs

This has the same problem as Solution 2. Namely, the list is traversed multiple times by length. I'm not certain a divide-and-conquer approach is a good choice for this problem, it could end up taking longer than a simple scan. It would depend on the data though, and be worth testing.

Verdict: Maybe, if you used a different data structure.

Solution 4

allTheSame'''' :: (Eq a) => [a] -> Bool allTheSame'''' xs = all (== head xs) (tail xs)

This was basically my first thought. Let's check the core again.

Test.allTheSame''''   :: forall a_abC. GHC.Classes.Eq a_abC => [a_abC] -> GHC.Bool.Bool [GblId,  Arity=2,  Str=DmdType LS,  Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,          ConLike=True, Cheap=True, Expandable=True,          Guidance=IF_ARGS [3 3] 10 0}] Test.allTheSame'''' =   \ (@ a_am5)     ($dEq_am6 :: GHC.Classes.Eq a_am5)     (xs_alK :: [a_am5]) ->     case xs_alK of _ {       [] ->         GHC.List.tail1         `cast` (CoUnsafe (forall a1_axH. [a1_axH]) GHC.Bool.Bool                 :: (forall a1_axH. [a1_axH]) ~ GHC.Bool.Bool);       : ds1_axJ xs1_axK ->         GHC.List.all           @ a_am5           (\ (ds_dwU :: a_am5) ->              GHC.Classes.== @ a_am5 $dEq_am6 ds_dwU ds1_axJ)           xs1_axK     }

Ok, not too bad. Like solution 1, this will error on empty lists. The list traversal is hidden in GHC.List.all, but it will probably be expanded to good code at a call site.

Verdict: Another strong contender.

So between all of these, with lists I'd expect that Solutions 0 and 4 are the only ones worth using, and they are pretty much the same. I might consider Option 3 in some cases.

Edit: in both cases, the errors on empty lists can be simply fixed as in @augustss's answer.

The next step would be to do some time profiling with criterion.

answered Oct 14 '22 20:10

John L

Related questions
                            
                                How to check if a specific integer is in a list
                            
                                Can I count on order being preserved in a Python tuple?
                            
                                How do I get a list item by index in elm?
                            
                                Why can't you have a "List<List<String>>" in Java? [duplicate]
                            
                                Merging dictionary value lists in python
                            
                                readonly keyword does not make a List<> ReadOnly?
                            
                                How to shuffle a list in vim?
                            
                                Finding out if a list of Objects contains something with a specified field value?
                            
                                In Scala, is there a way to take convert two lists into a Map?
                            
                                Groovy convert from List to var args for method call
                            
                                How are Kotlin Array's toList and asList different?
                            
                                matching items from two lists (or arrays)
                            
                                Merge 3 arraylist to one
                            
                                C# Casting a List<ObjBase> as List<Obj>
                            
                                Warning ArrayList is a raw type. References to generic type ArrayList<E> should be parameterized [duplicate]
                            
                                pair lists to create tuples in order
                            
                                Declaring a List field with the final keyword
                            
                                Convert Scala List to List with another type
                            
                                How to center ordered list number in HTML
                            
                                Conditional counting in Python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With