Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

groupBy-like function such that the binary predicate holds between consecutive elements of each group instead of any two

On Hackage I see that groupBy's implementation is this:

groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

which means that the predicate eq holds between any two elements of each group. Examples:

> difference_eq_1 = ((==1).) . flip (-)
> first_isnt_newline = ((/= '\n').) . const
>
> Data.List.groupBy difference_eq_1 ([1..10] ++ [11,13..21])
[[1,2],[3,4],[5,6],[7,8],[9,10],[11],[13],[15],[17],[19],[21]]
>
> Data.List.groupBy first_isnt_newline "uno\ndue\ntre"
["uno\ndue\ntre"]

What if instead I want to group elements such that the predicate holds between any pair of consecutive elements, so that the above results would be as follows?

[[1,2,3,4,5,6,7,8,9,10,11],[13],[15],[17],[19],[21]]
["uno\n","due\n","tre"]

I wrote it myself, and it looks a bit ugly

groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' p = foldr step []
  where step elem [] = [[elem]]
        step elem gs'@((g'@(prev:g)):gs)
          | elem `p` prev = (elem:g'):gs
          | otherwise = [elem]:gs'

So I was wandering if such a function exists already and I just don't find it.

As regards the second usage, Data.List.groupBy first_isnt_newline, where the binary predicate basically ignores the second argument and applies a unary predicate to the first, I've just found that Data.List.HT.segmentAfter unary_predicate does the job, where unary_predicate is the negation of the unary predicate in which const's output is forwarded. In other words Data.List.groupBy ((/= '\n').) . const === Data.List.HT.segmentAfter (=='\n').

like image 822
Enlico Avatar asked Oct 07 '20 16:10

Enlico


1 Answers

There is a groupBy package that does exactly that.

But here’s another way of implementing it:

  • Zip the list with its tail to test the predicate on adjacent elements

  • Generate a “group index” by scanning the result and incrementing the group whenever the predicate is false

  • Group by the index

  • Remove the indices

groupByAdjacent :: (a -> a -> Bool) -> [a] -> [[a]]
groupByAdjacent p xs
  = fmap (fmap fst)
  $ groupBy ((==) `on` snd)
  $ zip xs
  $ scanl' (\ g (a, b) -> if p a b then g else succ g) 0
  $ zip xs
  $ drop 1 xs

For an input like [1, 2, 3, 10, 11, 20, 30], the predicate will return [True, True, False, True, False, False] and the resulting group indices will be [0, 0, 0, 1, 1, 2, 3].

The scan can also be written pointfree as scanr (bool succ id . uncurry p) 0, since the scan direction doesn’t matter (although the group indices will be reversed). The group index might be handy or just more readable to keep as an integer, but it could be a Bool instead, because the minimum size of a group is 1: the functional argument of the scan would be bool not id . uncurry p, which can be simplified to (==) . uncurry p. And several of these parts could be factored into reusable functions, like zipNext = zip <*> drop 1, but I’ve inlined them for simplicity’s sake.

like image 77
Jon Purdy Avatar answered Oct 28 '22 13:10

Jon Purdy