
How to parse a text file (CSV) into Haskell so I can operate on it?

Tags:

haskell

I have a flat text file with the following format:

ID|COUNT|Desc
1|100|Something
2|100|More
1|15|Whatever

I need to load this into Haskell so that I can perform some operations on it (in this case, a GROUP BY on ID with a SUM of COUNT), and I am looking for ways to do it. One constraint: I cannot use any additional modules/packages (this is a school project, so I am trying to figure it out with whatever is built in).

I did some research and found Text.CSV as an option, but I can't really understand how it works (and I can't find any examples either, which is scary). Before I spend too much time there, I am wondering whether that is even the right approach. Any suggestions, ideas, or examples would be much appreciated.

Keep in mind that, however the data gets stored, I will have to process it afterwards somehow.


I am trying this approach now:

main::IO()
main = do
       dbSales <- readFile "la.txt"
       let sales = lines dbSales
       (result, x, y) <- mapify sales
       print result

mapify :: [String] -> Map Int Int
mapify = Prelude.foldr (\s m -> let (id:count:desc) = (splitWhen (=='|') s)
                                    i = read id
                                    c = read count
                                 in insertWith (+) i c m) empty

However, the compiler complains about the line where I call mapify:

Couldn't match type `Map Int' with `IO'
Expected type: IO Int
  Actual type: Map Int Int

I am now trying a new input file and getting errors that I don't understand. With the following input:

ID1|ID2|DATE|SUM
0|0|07/13/2014/100
0|1|07/13/2014/101
0|2|07/13/2014/102
1|0|07/13/2014/100

I am now trying to group on ID2 and sum the SUM column (instead of ID and COUNT from the previous example):

mapify :: [String] -> Map Int Int
mapify = Prelude.foldr (\s m -> let (id1:id2:date:sum) = (splitWhen (=='|') s)
                                    i = read id1
                                    j = read id2
                                    k = read date
                                    c = read sum
                                  in insertWith (+) j c m) empty

But no matter what I try, I keep getting errors like this:

Couldn't match type `[Char]' with `Char'
Expected type: String
  Actual type: [[Char]]
In the first argument of `read', namely `sum'
In the expression: read sum
In an equation for `c': c = read sum

JSchwartz asked Oct 20 '22 02:10


1 Answer

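import Data.List.Split (splitWhen)        -- from the split package
import Data.Map (Map, empty, insertWith)  -- from the containers package

-- Group by the first field (ID) and sum the second field (COUNT).
mapify :: [String] -> Map Int Int
mapify = foldr (\s m -> let (id:count:_) = splitWhen (=='|') s  -- only the first two fields are used
                            i = read id :: Int
                            c = read count :: Int
                        in insertWith (+) i c m) empty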

I think that should be pretty much what you want. It reads the first two fields of each line as Ints; insertWith then adds the ID to the map if it is not there yet, or adds the count to the existing total if it is. As written, it will crash on malformed data (including a header line such as ID|COUNT|Desc, which read cannot parse as an Int), so you might want to fix that. It also needs Data.List.Split (from the split package) and Data.Map (from the containers package).
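Since the question rules out extra packages and the version above crashes on malformed data, here is a minimal sketch of one way to handle both, not a definitive implementation. It assumes that Data.Map is acceptable (the containers package is bundled with GHC) and that Text.Read.readMaybe from base is available. The names splitOn, parseLine and sumById are hypothetical helpers introduced only for this illustration; splitOn stands in for splitWhen from the split package.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Split a string on a delimiter character using only Prelude functions.
-- (Hypothetical helper replacing splitWhen from the split package.)
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn d rest

-- Parse one "ID|COUNT|Desc" line into (ID, COUNT).
-- Returns Nothing for the header line or any row whose first two
-- fields are not integers, instead of crashing like read would.
parseLine :: String -> Maybe (Int, Int)
parseLine line = case splitOn '|' line of
  (i : c : _) -> (,) <$> readMaybe i <*> readMaybe c
  _           -> Nothing

-- Group by ID and sum COUNT, silently skipping unparseable lines.
sumById :: [String] -> Map Int Int
sumById = Map.fromListWith (+) . mapMaybe parseLine
```

Because sumById is a pure function, in main it would be bound with let rather than <-, for example: let result = sumById (lines dbSales). Using <- on a Map value is what produces the "Couldn't match type `Map Int' with `IO'" error in the question. The [[Char]] error has a similar flavour: in a pattern like (id1:id2:date:sum), the last name binds the rest of the list rather than a single field, so it cannot be passed to read.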

genisage answered Oct 27 '22 01:10