Cassava parsing error in Haskell

I'm trying to read a CSV file into a vector using cassava. The file is the Fisher iris data set, commonly used in machine learning; each row consists of four doubles and one string. My code is the following:

{-# LANGUAGE OverloadedStrings #-}

module Main where
import Data.Csv
import qualified Data.ByteString.Lazy as BS
import qualified Data.Vector as V

data Iris = Iris
  { sepal_length  :: !Double
  , sepal_width   :: !Double
  , petal_length  :: !Double
  , petal_width   :: !Double
  , iris_type     :: !String
 } deriving (Show, Eq, Read)

instance FromNamedRecord Iris where
  parseNamedRecord r =
    Iris
      <$> r .: "sepal_length"
      <*> r .: "sepal_width"
      <*> r .: "petal_length"
      <*> r .: "petal_width"
      <*> r .: "iris_type"

printIris :: Iris -> IO ()
printIris r  = putStrLn $  show (sepal_length r) ++ show (sepal_width r)
   ++ show (petal_length r) ++ show (petal_width r) ++ "hola"

main :: IO ()
main = do
  csvData <- BS.readFile "./iris/test-iris"
  print csvData
  case decodeByName csvData of
    Left err -> putStrLn err
    -- forM_ : O(n) Apply the monadic action to all elements of the vector,
    -- ignoring the results.
    Right (h, v) -> V.forM_ v $ printIris

When I run this, the contents of csvData look correctly formatted. The first lines printed by print csvData are the following:

"5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n4.7,3.2,1.3,0.2,Iris-setosa\n4.6,3.1,1.5,0.2,Iris-setosa\n5.0,3.6,1.4,0.2,Iris-setosa\n5.4,3.9,1.7,0.4,Iris-setosa\n4.6,3.4,1.4,0.3,Iris-setosa\n5.0,3.4,1.5,0.2,Iris-setosa\n4.4,2.9,1.4,0.2,Iris-setosa\n4.9,3.1,1.5,0.1,Iris-setosa\n5.4,3.7,1.5,0.2,Iris-setosa\n4.8,3.4,1.6,0.2,Iris-setosa\n4.8,3.0,1.4,0.1,Iris-setosa\n4.3,3.0,1.1,0.1,Iris-setosa\n5.8,4.0,1.2,0.2,Iris-setosa\n5.7,4.4,1.5,0.4,Iris-set

But I get the following error:

parse error (Failed reading: conversion error: no field named "sepal_length")  at 
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4 (truncated)

Does anybody know why I am getting this error? The CSV has no missing values, and if I swap the row at which the error is reported for another row, I get the same error.

nat asked Jul 02 '18 18:07


1 Answer

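It appears your data does not have a header, which decodeByName requires. Its documentation states: "The data is assumed to be preceded by a header."

Because of this, decodeByName treats your first row (5.1,3.5,1.4,0.2,Iris-setosa) as the header, and then fails to find a column named "sepal_length" in it. That is why the error stays the same no matter which row you swap in.

Either add a header row to the file, or use decode NoHeader together with the FromRecord type class, which reads fields by position rather than by name.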
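As a rough illustration of the second option, here is a minimal sketch that decodes the header-less file by position. It reuses the Iris record and file path from the question; the helper name parseIris and the five-field check are assumptions added for this example, not part of the original code.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Data.Csv
import qualified Data.ByteString.Lazy as BL
import qualified Data.Vector as V

data Iris = Iris
  { sepal_length  :: !Double
  , sepal_width   :: !Double
  , petal_length  :: !Double
  , petal_width   :: !Double
  , iris_type     :: !String
  } deriving (Show, Eq)

-- Positional instance: fields are looked up by column index with (.!),
-- so the input does not need a header row.
instance FromRecord Iris where
  parseRecord r
    | length r == 5 =
        Iris <$> r .! 0 <*> r .! 1 <*> r .! 2 <*> r .! 3 <*> r .! 4
    | otherwise = fail "expected exactly 5 fields"

-- Hypothetical helper: pins the result type so that decode knows
-- which FromRecord instance to use.
parseIris :: BL.ByteString -> Either String (V.Vector Iris)
parseIris = decode NoHeader

main :: IO ()
main = do
  csvData <- BL.readFile "./iris/test-iris"
  case parseIris csvData of
    Left err -> putStrLn err
    Right v  -> V.forM_ v print
```

If you would rather keep the FromNamedRecord instance, the other option is to make sure the input starts with a line such as sepal_length,sepal_width,petal_length,petal_width,iris_type before passing it to decodeByName.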

Li-yao Xia answered Oct 15 '22 10:10