I have to read a data structure from a text file (space separated), one data item per line. My first attempt would be
data Person = Person {name :: String, surname :: String, age :: Int, ... dozens of other fields} deriving (Show,...)
main = do
string <- readFile "filename.txt"
let people = readPeople string
do_something people
readPeople s = map (readPerson.words) (lines s)
readPerson row = Person (read(row!!0)) (read(row!!1)) (read(row!!2)) (read(row!!3)) ... (read(row!!dozens))
This code works, but the code for readPerson is terrible: I have to copy-paste read (row!!n) for every field in my data structure!
So, as a second attempt, I think that I might exploit currying of the Person constructor and pass it the arguments one at a time.
Uhm, there must be something in Hoogle, but I cannot figure out the type signature ... Never mind, it looks simple enough and I can write it myself:
readPerson row = readFields Person row
readFields f [x] = (f x)
readFields f (x:xs) = readFields (f (read x)) xs
Ahh, looks much better coding style!
But, it does not compile! Occurs check: cannot construct the infinite type: t ~ String -> t
Indeed, the function f I am passing to readFields has a different type in each invocation; that's why I could not figure out its type signature ...
So, my question is: what is the simplest and most elegant way to read a data structure with many fields?
First, it's always good practice to include type signatures for all top-level declarations. It makes code better structured and much more readable.
One simple way to achieve this is to take advantage of applicative functors. During parsing, you have an "effectful" computation where the effect is consuming part of the input and its result is one parsed piece. We can use the State monad to track the remaining input, and create a polymorphic function that consumes one element of the input and reads it:
import Control.Applicative
import Control.Monad.State
data Person = Person { name :: String, surname :: String, age :: Int }
deriving (Eq, Ord, Show, Read)
readField :: (Read a) => State [String] a
readField = state $ \(x : xs) -> (read x, xs)
And in order to parse many such fields we use the <$> and <*> combinators, which allow us to sequence the operations as follows:
readPerson :: [String] -> Person
readPerson = evalState $ Person <$> readField <*> readField <*> readField
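Putting the pieces together, here is a minimal runnable sketch (with made-up sample input; note that read for a String field expects Haskell literal syntax, i.e. quoted tokens):

```haskell
import Control.Monad.State

data Person = Person { name :: String, surname :: String, age :: Int }
  deriving (Eq, Ord, Show, Read)

-- Consume one token of the remaining input and parse it with 'read'.
readField :: Read a => State [String] a
readField = state $ \(x : xs) -> (read x, xs)

readPerson :: [String] -> Person
readPerson = evalState $ Person <$> readField <*> readField <*> readField

main :: IO ()
main = print (readPerson ["\"John\"", "\"Doe\"", "42"])
```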
The expression Person <$> ... has type State [String] Person, and we run evalState on the given input to run the stateful computation and extract the result. We still have to repeat readField once per field, but without having to use indices or explicit types.
For a real program you'd probably include some error handling, as read fails with an exception, as does the pattern (x : xs) if the input list is too short. Using a full-fledged parser such as parsec or attoparsec lets you keep the same notation while getting proper error handling, the ability to customize parsing of individual fields, etc.
An even more universal way is to automate wrapping and unwrapping fields into lists using generics. Then you just derive Generic. If you're interested, I can give an example.
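As a sketch of what such a generic parser might look like (the class and function names GFields and readRecord are made up for illustration, not from an existing library): we walk the generic representation of the record and fill each field with read, so no per-field code is needed at all.

```haskell
{-# LANGUAGE DeriveGeneric, TypeOperators, FlexibleContexts #-}
import GHC.Generics
import Control.Monad.State

-- Walk the generic representation, filling each field from the input list.
class GFields f where
  gFields :: State [String] (f p)

instance GFields U1 where
  gFields = pure U1

instance (GFields a, GFields b) => GFields (a :*: b) where
  gFields = (:*:) <$> gFields <*> gFields

instance GFields f => GFields (M1 i c f) where
  gFields = M1 <$> gFields

-- Each concrete field is consumed from the input and parsed with 'read'.
instance Read a => GFields (K1 i a) where
  gFields = K1 <$> state (\(x : xs) -> (read x, xs))

readRecord :: (Generic a, GFields (Rep a)) => [String] -> a
readRecord = evalState (to <$> gFields)

data Person = Person { name :: String, surname :: String, age :: Int }
  deriving (Show, Generic)

main :: IO ()
main = print (readRecord ["\"John\"", "\"Doe\"", "42"] :: Person)
```

Adding a field to Person then requires no change to the parsing code; note the String tokens must still be quoted because they go through read.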
Or, you could use an existing serialization package, either a binary one like cereal or binary, or a text-based one such as aeson or yaml. These usually let you do both: either derive the (de)serialization automatically from Generic, or provide your own.
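With aeson, for example, the Generic-derived route needs no hand-written parsing code at all (a sketch; the input format is then JSON rather than the space-separated file):

```haskell
{-# LANGUAGE DeriveGeneric, OverloadedStrings #-}
import GHC.Generics
import Data.Aeson

data Person = Person { name :: String, surname :: String, age :: Int }
  deriving (Show, Generic)

-- Default implementations are filled in from the Generic representation.
instance FromJSON Person
instance ToJSON Person

main :: IO ()
main = print (decode "{\"name\":\"John\",\"surname\":\"Doe\",\"age\":42}" :: Maybe Person)
```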