Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get only a particular type of elements from a list in Haskell?

Tags:

haskell

I'm studying Haskell Book, and in Chapter 10 (Folding Lists), I'm trying to solve an exercise regarding getting only one particular type of element from a list that includes different types of elements.

The authors give the following code:

import Data.Time

data DatabaseItem = DbString String
                  | DbNumber Integer
                  | DbDate   UTCTime
                  deriving (Eq, Ord, Show)

theDatabase :: [DatabaseItem]
theDatabase = [ DbDate (UTCTime
                        (fromGregorian 1911 5 1)
                        (secondsToDiffTime 34123))
              , DbNumber 9001
              , DbString "Hello, world!"
              , DbDate (UTCTime
                        (fromGregorian 1921 5 1)
                        (secondsToDiffTime 34123))
              ]

and the first question is:

Write a function that filters for DbDate values and returns a list of the UTCTime values inside them.

filterDbDate :: [DatabaseItem] -> [UTCTime]
filterDbDate = undefined

Since the chapter is all about folding lists, I assume it can be done using, for example, foldr.

My initial attempt has been to first write some helper functions and use them in a foldr, such as:

getDbDate1 :: DatabaseItem -> UTCTime
getDbDate1 (DbDate utcTime) = utcTime

isDbDate :: DatabaseItem -> Bool
isDbDate (DbDate _) = True
isDbDate _ = False

filterDbDate1 :: [DatabaseItem] -> [UTCTime]
filterDbDate1 database = foldr ((:) . getDbDate1) [] (filter isDbDate database)

This seems to do the job, because:

λ> filterDbDate1 theDatabase
[1911-05-01 09:28:43 UTC,1921-05-01 09:28:43 UTC]

But I'm not comfortable with this solution, because first of all, it gives the following warning:

/Users/emre/code/haskell/chapter10_folding_lists/database.hs:36:1: Warning: …
    Pattern match(es) are non-exhaustive
    In an equation for ‘getDbDate1’:
        Patterns not matched:
            DbString _
            DbNumber _

And I'm using two helper functions, one to help with filtering out the values that are not DbDate, another to get at the UTCTime component.

So, to get rid of non-exhaustive pattern matching warning and use a single helper function, I've decided to write it like:

getDbDate2 :: DatabaseItem -> Maybe UTCTime
getDbDate2 (DbDate utcTime) = Just utcTime
getDbDate2 _ = Nothing

filterDbDate2 :: [DatabaseItem] -> [UTCTime]
filterDbDate2 database = foldr ((:) . getDbDate2) [] database

But, of course, the above does not compile, because it does not type check, because, for example:

λ> foldr ((:) . getDbDate2) [] theDatabase
[Just 1911-05-01 09:28:43 UTC,Nothing,Nothing,Just 1921-05-01 09:28:43 UTC]

In other words, it can return a list of Just UTCTime values, together with Nothing values, and not only a list of UTCTime values.

So my question is: how can I write a (helper?) function that, in one go (so that I don't have to use the filter), checks if its value is DbNumber, and if so returns the UTCTime component? (And if not... it also has to return something (e.g. Nothing?), and this is where I'm having trouble, that is using Maybe UTCTime, and then getting Just UTCTime values, etc.)

like image 629
Emre Sevinç Avatar asked Aug 06 '16 14:08

Emre Sevinç


4 Answers

There are several other answers here covering good suggestions about other ways to think about the problem: using catMaybes to munge the data a second time after picking out Maybe UTCTimes; using list comprehensions and the handy syntax they have for filtering out non-matching patterns; using the monadic structure of lists to include or skip results; and writing a bespoke recursive function. In this answer, I will address your direct question, showing how to use the program structure you already have without completely rethinking your approach to list manipulations -- calling foldr with a helper function that does everything you need in one go.

First of all, I observe that all your existing attempts send foldr a function which unconditionally calls (:):

foldr ((:) . getDbDate1) [] (filter isDbDate database)
foldr ((:) . getDbDate2) [] database

The thing about this pattern is that this means the list you get out of the foldr will have the same length as the function you pass in -- since every (:) in the input list gets turned into a (:) in the output list. In your first solution, you handled this by removing some entries you didn't care about from the input list; in your second solution, you handled this by having extra uninteresting elements in your output list.

A third solution is to look at the list element before deciding whether to call (:) or not. Here is how one might do that:

conditionalCons :: DatabaseItem -> [UTCTime] -> [UTCTime]
conditionalCons (DbDate t) ts = t:ts
conditionalCons _          ts =   ts

Note in particular that in the second clause, we do not call (:) -- this filters out the non-matching elements of the list. We also have no concern about missing patterns. Now we can write

filterDbDate3 :: [DatabaseItem] -> [UTCTime]
filterDbDate3 = foldr conditionalCons []

Testing it in ghci:

> filterDbDate3 theDatabase
[1911-05-01 09:28:43 UTC,1921-05-01 09:28:43 UTC]

Perfect!

like image 110
Daniel Wagner Avatar answered Oct 22 '22 06:10

Daniel Wagner


A simple list comprehension will do

filterDbDate xs = [ x | DbDate x <- xs ]
like image 25
Franky Avatar answered Oct 22 '22 04:10

Franky


There are some good answers but I would like to add another approach how you can find solutions to such tasks.

First, write the simplest possible solution, i.e. a solution with direct recursion.

filterDbDate :: [DatabaseItem] -> [UTCTime]
filterDbDate ((DbDate time):items) = time:(filterDbDate items)
filterDbDate ( _           :items) =       filterDbDate items

This helps understand the structures involved and allows you to familiarize yourself with the actual steps needed. It's not the most performant version, but it's easy to write and it's often enough for the task at hand.

The next step would be to make the code more performant with tail recursion. It's a simple, almost mechanical transformation.

  1. Determine the accumulator type. This is often also the return type; in this case, a list. That gives you the new first lines

    filterDbDate :: [DatabaseItem] -> [UTCTime]
    filterDbDate = go []
      where ...
    
  2. Now take the original function and turn it into the internal go function by replacing every recursive call with the accumulator and then putting the result into a recursive call to go.

        go acc ((DbDate time):items) = go (time:acc) items
        go acc ( _           :items) = go       acc  items
    
  3. Add handling of the end case. Beware that the order of operations will be reversed.

        go acc  []                   = reverse acc
    
  4. Move the handling of the end case into the original call. This is not necessary if you want to stop here, but it helps on the way to a fold.

    filterDbDate = reverse . go []
      where 
        go acc  [] = acc
        ...
    

Now to turn that into a fold. The accumulator is the same the fold will be using and the transformation, again, is almost mechanical.

  1. Replace the call to go with a call to a fold.

    filterDbDate :: [DatabaseItem] -> [UTCTime]
    filterDbDate = reverse . foldl f []
    
  2. Turn go into f by removing the list-part of the pattern matches, the end case, and the recursive calls.

      where f acc (DbDate time) = time:acc
            f acc  _            =      acc
    
  3. Ponder if it would be better to reverse the direction of recursion.

    filterDbDate :: [DatabaseItem] -> [UTCTime]
    filterDbDate = foldr f []
      where f (DbDate time) = (time:)
            f _             = id
    

Now for final cleanup, extra brownie points and to enrage Haskell teachers make it as generic as you can without breaking stuff.

{-# LANGUAGE NoImplicitPrelude, GADTs #-}
import ClassyPrelude

filterDbDate :: ( MonoFoldable items, Element items ~ DatabaseItem
                , Monoid times, SemiSequence times, Element times ~ UTCTime)
             => items -> times
filterDbDate = foldr f mempty
   where f (DbDate time) = cons time
         f _             = id
like image 40
MarLinn Avatar answered Oct 22 '22 06:10

MarLinn


List is a monad. So we can use the functions of the Monad type-class.

utcTimes :: [UTCTime]
utcTimes =
  theDatabase >>=
  \ item ->
    case item of
      DbDate utcTime -> [utcTime]
      _ -> []

The (>>=) function here is the key. It is essentially the same thing as "flatMap" in other languages, if that rings a bell.

The following is the same as the above expressed in the Do-notation:

utcTimes :: [UTCTime]
utcTimes =
  do
    item <- theDatabase
    case item of
      DbDate utcTime -> [utcTime]
      _ -> []

In fact, we can even generalize this to a function, which will work for any monad over UTCTime (well, MonadPlus, really):

pickUTCTime :: MonadPlus m => DatabaseItem -> m UTCTime
pickUTCTime item =
  case item of
    DbDate utcTime -> return utcTime
    _ -> mzero

utcTimes :: [UTCTime]
utcTimes =
  theDatabase >>= pickUTCTime
like image 40
Nikita Volkov Avatar answered Oct 22 '22 04:10

Nikita Volkov