
Haskell csv-conduit in GHCi

Tags: csv, haskell

csv-conduit was suggested to me as a good Haskell package for working with CSV files. I want to learn how it works, but the documentation is too terse for a newbie Haskell programmer.

Is there a way for me to figure out how it works by trial-and-error in GHCi?

More specifically, should I load modules and files from GHCi or should I write a simple HS file to load them and then move around interactively?


I mentioned csv-conduit, but I'm open to using any CSV package. I just need to get my hands on one and fool around with it until I feel at ease (much like I would do in IDLE).

CHM asked Jul 18 '12 17:07


2 Answers

Take a look at the following function:

readCSVFile :: (MonadResource m, CSV ByteString a) => CSVSettings -> FilePath -> m [a]

It's relatively simple to call, as we just need a CSVSettings, such as defCSVSettings, and a FilePath (aka String), "file.csv" or something.

Thus, after the call, we get a value of type m [a], subject to the constraints (MonadResource m, CSV ByteString a). We can resolve these one at a time to settle on a concrete type. We are performing IO in this operation, so for MonadResource m, m should just be ResourceT IO; its base monad IO is an instance of MonadBaseControl IO, as required by runResourceT. This is a conduit-specific thing.

For the CSV ByteString a, we need to find which instances of CSV exist. To do so, go to http://hackage.haskell.org/packages/archive/csv-conduit/0.2.1.1/doc/html/Data-CSV-Conduit.html#t:CSV (where the documentation for the package is, in my opinion, somewhat obnoxiously all stuffed into the typeclass...) and click on Instances to see which instances of the form CSV ByteString a are available. The two options are CSV ByteString (Row ByteString) and CSV ByteString (Row Text).

Of these two, Row Text is preferable because Text handles Unicode and a CSV file is unlikely to contain binary data. ByteString is more or less a [Word8], while Text is closer to [Char], which is probably what you want. Hence, a should be Row Text (although Row ByteString will still work).

This means the result of the function call is ResourceT IO [Row Text]. We can't do much with this, but because ResourceT is a monad transformer, we can easily "pop" off the monad transformation layer with the function runResourceT. Thus,

-- Named readCSV rather than readFile to avoid clashing with Prelude's readFile.
readCSV :: FilePath -> IO [Row Text]
readCSV = runResourceT . readCSVFile defCSVSettings

which is easily usable within, say, main to get at the [Row Text] which you can then iterate over with a map or a fold to get your hands on the individual rows.
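As a rough sketch of what that might look like as a complete program: the following assumes csv-conduit 0.2.x (the version linked above, where readCSVFile returns a list inside a MonadResource monad) together with the resourcet and text packages. The names loadRows and "file.csv" are made up for illustration, and later versions of the package may have a different signature.

```haskell
import Control.Monad.Trans.Resource (runResourceT)
import Data.CSV.Conduit (Row, defCSVSettings, readCSVFile)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical helper: read a whole CSV file into memory as rows of Text.
-- The type signature is what picks the CSV ByteString (Row Text) instance.
loadRows :: FilePath -> IO [Row Text]
loadRows = runResourceT . readCSVFile defCSVSettings

main :: IO ()
main = do
  rows <- loadRows "file.csv"  -- assumed to exist in the working directory
  putStrLn ("rows read: " ++ show (length rows))
  -- A Row Text is just a list of Text cells, so ordinary list functions apply.
  mapM_ (TIO.putStrLn . T.intercalate (T.pack " | ")) rows
```

Loading a file like this into GHCi (:load) and then calling loadRows interactively is one way to poke at the result.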

To run this sort of thing in GHCi, you have to state the type explicitly. The reason is that the result type is not determined by any of the parameters; for any CSVSettings and FilePath, readCSVFile could return any number of different types, as long as the constraints MonadResource m and CSV ByteString a hold. Thus, we have to tell GHCi explicitly which type we want.
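The same kind of ambiguity shows up with read from base, which may be a more familiar illustration: its result type is also chosen by the caller rather than by the argument. This is only an analogy (asInt and asDouble are names made up here), but the fix is the same, a type annotation.

```haskell
-- The argument "42" says nothing about the result type;
-- the annotation on each binding is what selects the Read instance.
asInt :: Int
asInt = read "42"

asDouble :: Double
asDouble = read "42"

main :: IO ()
main = do
  print asInt
  print asDouble
  -- An inline annotation works as well, which is the form you would use in GHCi.
  print (read "[1,2,3]" :: [Int])
  -- By analogy, for csv-conduit in GHCi (assuming the 0.2.x API discussed above):
  --   runResourceT (readCSVFile defCSVSettings "file.csv") :: IO [Row Text]
```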

alternative answered Nov 02 '22 08:11


Have you tried Text.CSV (from the csv package)? It might be more appropriate if you're just starting out with Haskell, as it's much simpler. As for exploring new modules, you can just load them into GHCi; there's no need to write an additional file.
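For a feel of how little is involved, here is a small sketch using parseCSV from Text.CSV, assuming the csv package is installed. The helper parseText and the sample data are invented for this example; parseCSV itself takes a source name (used only in error messages) and the CSV text.

```haskell
import Text.CSV (CSV, parseCSV)

-- Made-up sample data; no trailing newline.
sampleText :: String
sampleText = "name,qty\napple,3\npear,5"

-- Hypothetical helper: parse CSV text held in memory.
-- Nothing signals a parse error; CSV is a list of records, each a list of String fields.
parseText :: String -> Maybe CSV
parseText = either (const Nothing) Just . parseCSV "inline"

main :: IO ()
main =
  case parseText sampleText of
    Nothing      -> putStrLn "could not parse the sample"
    Just records -> mapM_ print records
```

The same package also provides parseCSVFromFile for reading from disk, which can be called directly at the GHCi prompt.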

ibab answered Nov 02 '22 08:11