Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell groupBy depending on accumulator value

I have a list of pairs of views which represents list of content labels and their widths which I want to group in lines (if the next content label doesn't fit in line then put it into another line). So we have: viewList = [(View1, 45), (View2, 223.5), (View3, 14) (View4, 42)].

I want to write a function groupViews :: [a] -> [[a]] to group this list into a list of sublists where each sublist will contain only views with sum of widths less than the maximum specified width (let's say 250). So for a sorted viewList this function will return : [[(View3, 14), (View4, 42), (View1, 45)],[(View2, 223.5)]]

It looks similar to groupBy. However, groupBy doesn't maintain an accumulator. I tried to use scanl + takeWhile(<250) combination but in this case I was able to receive only first valid sublist. Maybe use iterate + scanl + takeWhile somehow? But this looks very cumbersome and not functional at all. Any help will be much appreciated.

like image 407
MainstreamDeveloper00 Avatar asked Oct 31 '17 23:10

MainstreamDeveloper00


3 Answers

I would start with a recursive definition like this:

groupViews :: Double -> (a -> Double) -> [a] -> [[a]]
groupViews maxWidth width = go (0, [[]])
  where
    go (current, acc : accs) (view : views)
      | current + width view <= maxWidth
      = go (current + width view, (view : acc) : accs) views
      | otherwise = go (width view, [view] : acc : accs) views
    go (_, accs) []
      = reverse $ map reverse accs

Invoked like groupViews 250 snd (sortOn snd viewList). The first thing I notice is that it can be represented as a left fold:

groupViews' maxWidth width
  = reverse . map reverse . snd . foldl' go (0, [[]])
  where
    go (current, acc : accs) view
      | current + width view <= maxWidth
      = (current + width view, (view : acc) : accs)
      | otherwise
      = (width view, [view] : acc : accs)

I think this is fine, though you could factor it further if you like, into one scan to accumulate the widths modulo the max width, and another pass to group the elements into ascending runs. For example, here’s a version that works on integer widths:

groupViews'' maxWidth width views
  = map fst
  $ groupBy ((<) `on` snd)
  $ zip views
  $ drop 1
  $ scanl (\ current view -> (current + width view) `mod` maxWidth) 0 views

And of course you can include the sort in these definitions instead of passing the sorted list from outside.

like image 179
Jon Purdy Avatar answered Sep 30 '22 08:09

Jon Purdy


I don't know a clever way to do this just by combining functions from the standard library, but I do think you can do better than just implementing it from scratch.

This problem fits into a class of problems that I've seen before: "batch up items from this list somehow, and combine its items into batches according to some combination rule and some rule for deciding when a batch is too big". Years ago, when I was writing Clojure, I built a function that abstracted out this idea of batched combinations, just asking you to specify the rules for batching, and was able to use it in a surprising number of places.

Here's how I think it might be reimagined in Haskell:

glue :: Monoid a => (a -> Bool) -> [a] -> [a]
glue tooBig = go mempty
  where go current [] = [current]
        go current (x:xs) | tooBig x' = current : go x xs
                          | otherwise = go x' xs
          where x' = current `mappend` x

If you had such a glue function already, you could build a simple data type with the appropriate Monoid instance (a list of objects and their cumulative sum), and then let glue do the heavy lifting:

import Data.Monoid (Sum(..))

data ViewGroup contents size = ViewGroup {totalSize :: size,
                                          elements :: [(contents, size)]}

instance Monoid b => Monoid (ViewGroup a b) where
  mempty = ViewGroup mempty []
  mappend (ViewGroup lSize lElts) (ViewGroup rSize rElts) = 
    ViewGroup (lSize `mappend` rSize) 
              (lElts ++ rElts)

viewGroups = let views = [("a", 14), ("b", 42), ("c", 45), ("d", 223.5)]
             in glue ((> 250) . totalSize) [ViewGroup (Sum width) [(x, Sum width)] 
                                           | (x, width) <- views]

main = print (viewGroups :: [ViewGroup String (Sum Double)])

[ViewGroup {totalSize = Sum {getSum = 101.0}, 
            elements = [("a",Sum {getSum = 14.0}),
                        ("b",Sum {getSum = 42.0}),
                        ("c",Sum {getSum = 45.0})]},
ViewGroup {totalSize = Sum {getSum = 223.5}, 
           elements = [("d",Sum {getSum = 223.5})]}]

On the one hand this looks like quite a bit of work for a simple function, but on the other it's rather nice to have a type that describes the cumulative summing you're doing, and Monoid instances are nice to have anyway...and after defining the type and the Monoid instance there's almost no work left to do in the calling of glue itself.

Well, I don't know, maybe it's still too much work, especially if you don't believe you can reuse that type. But I do think it's useful to recognize that this is a specific case of a more general problem, and try to solve the more general problem as well.

like image 41
amalloy Avatar answered Sep 30 '22 06:09

amalloy


Given that groupBy and span themselves are defined by manual recursive functions, our modified functions will use the same mechanism.

Let us first define a general function groupAcc which takes an initial value for the accumulator, and then a function which takes an element in the list, the current accumulator state and potentially produces a new accumulated value (Nothing means the element is not accepted):

{-# LANGUAGE LambdaCase #-}

import Data.List (sortOn)
import Control.Arrow (first, second)

spanAcc :: z -> (a -> z -> Maybe z) -> [a] -> ((z, [a]), [a])
spanAcc z0 p = \case
  xs@[]      -> ((z0, xs), xs)
  xs@(x:xs') -> case p x z0 of
    Nothing  -> ((z0, []), xs)
    Just z1  -> first (\(z2, xt) -> (if null xt then z1 else z2, x : xt)) $
                spanAcc z1 p xs'

groupAcc :: z -> (a -> z -> Maybe z) -> [a] -> [(z, [a])]
groupAcc z p = \case
  [] -> [] ;
  xs -> uncurry (:) $ second (groupAcc z p) $ spanAcc z p xs

For our specific problem, we define:

threshold :: (Num a, Ord a) => a -> a -> a -> Maybe a
threshold max a z0 = let z1 = a + z0 in if z1 < max then Just z1 else Nothing

groupViews :: (Ord z, Num z) => [(lab, z)] -> [[(lab, z)]]
groupViews = fmap snd . groupAcc 0 (threshold 250 . snd)

Which finally gives us:

groupFinal :: (Num a, Ord a) => [(lab, a)] -> [[(lab, a)]]
groupFinal = groupViews . sortOn snd

And ghci gives us:

> groupFinal [("a", 45), ("b", 223.5), ("c", 14), ("d", 42)]
[[("c",14.0),("d",42.0),("a",45.0)],[("b",223.5)]]

If we want to, we can simplify groupAcc by assuming that z is a Monoid wherefore mempty may be used, such that:

groupAcc2 :: Monoid z => (a -> z -> Maybe z) -> [a] -> [(z, [a])]
groupAcc2 p = \case
  [] -> [] ;
  xs -> let z = mempty in
        uncurry (:) $ second (groupAcc z p) $ spanAcc z p xs
like image 22
Centril Avatar answered Sep 30 '22 08:09

Centril