
Reading long data structure in Haskell

Tags: haskell

I have to read a data structure from a space-separated text file, one data item per line. My first attempt would be:

data Person = Person {name :: String, surname :: String, age :: Int, ... dozens of other fields} deriving (Show,...)

main = do
  string <- readFile "filename.txt"
  let people = readPeople string
  do_something people

readPeople s = map (readPerson.words) (lines s)

readPerson row = Person (read(row!!0)) (read(row!!1)) (read(row!!2)) (read(row!!3)) ... (read(row!!dozens))

This code works, but readPerson is terrible: I have to copy-paste read (row!!n) for every field in my data structure!

So, as a second attempt, I thought I might exploit currying of the Person constructor and pass it the arguments one at a time.

Uhm, there must be something in Hoogle, but I cannot figure out the type signature ... Never mind, it looks simple enough and I can write it myself:

readPerson row = readFields Person row

readFields f [x] = (f x)
readFields f (x:xs) = readFields (f (read x)) xs

Ahh, that looks like much better coding style!

But, it does not compile! Occurs check: cannot construct the infinite type: t ~ String -> t

Indeed, the function f I am passing to readFields has a different type at each recursive call; that's why I could not figure out its type signature ...

So, my question is: what is the simplest and most elegant way to read a data structure with many fields?

Archangel asked Jul 11 '16 09:07


1 Answer

First, it's always good practice to include type signatures for all top-level declarations. It makes the code better structured and much more readable.

One simple way to achieve this is to take advantage of applicative functors. Parsing is an "effectful" computation: the effect is consuming part of the input, and the result is one parsed piece. We can use the State monad to track the remaining input and write a polymorphic function that consumes one element of the input and reads it:

import Control.Applicative
import Control.Monad.State

data Person = Person { name :: String, surname :: String, age :: Int }
    deriving (Eq, Ord, Show, Read)

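-- Consume the next token from the input and parse it with read.
-- Partial: fails if the list is empty or the token does not parse.
readField :: (Read a) => State [String] a
readField = state $ \(x : xs) -> (read x, xs)

To parse many such fields, we use the <$> and <*> combinators, which let us sequence the operations as follows: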

readPerson :: [String] -> Person
readPerson = evalState $ Person <$> readField <*> readField <*> readField

The expression Person <$> ... has type State [String] Person, and we run evalState on the given input to run the stateful computation and extract the result. We still need one readField per field, but we no longer need indices or explicit types.
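To see the pieces working together, here is a minimal end-to-end sketch. One caveat it works around: read at type String expects a quoted Haskell literal, so a bare token such as John would not parse. The sketch therefore assumes the names are unquoted in the file and adds a stringField helper, which is hypothetical and not part of the code above, to take a token verbatim.

```haskell
import Control.Monad.State (State, evalState, state)

data Person = Person { name :: String, surname :: String, age :: Int }
    deriving (Eq, Show)

-- Consume one token and parse it with 'read'.
-- Partial: fails on an empty list or on a token that does not parse.
readField :: Read a => State [String] a
readField = state $ \(x : xs) -> (read x, xs)

-- Hypothetical helper: consume one token and keep it verbatim.
-- Assumption: string fields are stored without quotes in the file.
stringField :: State [String] String
stringField = state $ \(x : xs) -> (x, xs)

-- One combinator per field, in declaration order.
readPerson :: [String] -> Person
readPerson = evalState $ Person <$> stringField <*> stringField <*> readField

-- One person per line, fields separated by whitespace.
readPeople :: String -> [Person]
readPeople = map (readPerson . words) . lines
```

With this in place, readPeople can be applied directly to the result of readFile. If the file does store strings in quotes, readField alone would be enough for every field.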

For a real program you'd probably include some error handling, since read fails with an exception on malformed input, as does the pattern (x : xs) if the input list is too short. Using a full-fledged parser such as parsec or attoparsec lets you use the same notation with proper error handling, customized parsing of individual fields, etc.
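Short of switching to a parser library, one possible way to get basic error handling is to layer the state over Maybe, so that a missing or malformed token yields Nothing instead of an exception. This is only a sketch: the names readFieldSafe, stringFieldSafe and readPersonSafe are made up for illustration, and it assumes string fields are stored unquoted.

```haskell
import Control.Monad.State (StateT (..), evalStateT)
import Text.Read (readMaybe)

data Person = Person { name :: String, surname :: String, age :: Int }
    deriving (Eq, Show)

-- Consume one token and parse it; Nothing if the input has run out
-- or the token does not parse at the requested type.
readFieldSafe :: Read a => StateT [String] Maybe a
readFieldSafe = StateT $ \tokens -> case tokens of
    (x : xs) -> (\v -> (v, xs)) <$> readMaybe x
    []       -> Nothing

-- Consume one token verbatim; Nothing if the input has run out.
stringFieldSafe :: StateT [String] Maybe String
stringFieldSafe = StateT $ \tokens -> case tokens of
    (x : xs) -> Just (x, xs)
    []       -> Nothing

-- Same applicative shape as before, but the result is a Maybe.
-- Note that trailing extra tokens are silently ignored.
readPersonSafe :: [String] -> Maybe Person
readPersonSafe = evalStateT $
    Person <$> stringFieldSafe <*> stringFieldSafe <*> readFieldSafe
```

Swapping Maybe for Either String would allow an error message per failure, at the cost of a little more plumbing.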


An even more general way is to automate wrapping and unwrapping fields into lists using generics. Then you just derive Generic. If you're interested, I can give an example.

Alternatively, you could use an existing serialization package, either a binary one like cereal or binary, or a text-based one such as aeson or yaml. These usually let you do both: automatically derive (de)serialization from Generic, or provide your own.
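As a rough illustration of the generics idea, here is one possible sketch using GHC.Generics. It is not the answerer's code: the Field and GFields classes and the readRecord function are hypothetical names, it only handles single-constructor types (there is deliberately no instance for sums), and it assumes string fields are stored unquoted.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, FlexibleInstances, TypeOperators #-}
import GHC.Generics
import Text.Read (readMaybe)

-- How to parse a single token into a field value (hypothetical class).
class Field a where
    parseField :: String -> Maybe a

instance Field Int    where parseField = readMaybe
instance Field String where parseField = Just  -- assumption: unquoted strings

-- Walk the generic representation, consuming one token per field.
class GFields f where
    gParse :: [String] -> Maybe (f p, [String])

-- A constructor without fields consumes nothing.
instance GFields U1 where
    gParse ts = Just (U1, ts)

-- A product of fields: parse the left part, then the right part.
instance (GFields f, GFields g) => GFields (f :*: g) where
    gParse ts = do
        (l, ts')  <- gParse ts
        (r, ts'') <- gParse ts'
        Just (l :*: r, ts'')

-- A single field: consume exactly one token.
instance Field a => GFields (K1 i a) where
    gParse (t : ts) = (\v -> (K1 v, ts)) <$> parseField t
    gParse []       = Nothing

-- Metadata wrappers (datatype, constructor, selector) are transparent.
instance GFields f => GFields (M1 i c f) where
    gParse ts = (\(x, rest) -> (M1 x, rest)) <$> gParse ts

-- Parse any single-constructor type whose fields all have Field instances.
-- Trailing extra tokens are ignored.
readRecord :: (Generic a, GFields (Rep a)) => [String] -> Maybe a
readRecord ts = to . fst <$> gParse ts

data Person = Person { name :: String, surname :: String, age :: Int }
    deriving (Eq, Show, Generic)

readPerson :: [String] -> Maybe Person
readPerson = readRecord
```

The per-record cost is now just deriving Generic: adding a field to Person needs no change to readPerson, as long as the field's type has a Field instance.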

Petr answered Oct 23 '22 05:10