I'm currently working with CSV files, parsing them into a [[String]]. The first [String] in that list is the header row, e.g.:
["Code","Address","Town"]
and the rest are rows of data:
["ABA","12,east road","London"]
I would like to create a query system where the input and result look something like this:
>count "Town"="*London*" @1="A*"
14 rows
The column name can be given either as a string or as @ followed by the column index. I have a case switch to recognise the first word of the input, since I'm going to extend my CSV reader with different functions; when it sees the word count, it goes to a function that returns a count of rows.
I'm not sure how to start parsing the query. My first thought was to split the string after the word count into a list of individual conditions, apply the first one, and use the rows that satisfy it as the input for checking the next, ending up with a list of rows for which all conditions are satisfied, then counting the entries and returning the result. There would also be a case switch to recognise whether a condition starts with a string or an @ symbol. The * represents zero or more arbitrary characters.
I'm not sure how to start implementing this, or whether there is a problem with my approach that I haven't spotted. I'd be grateful for any help getting me started. I'm not very advanced with Haskell (I'm just starting), so I would also appreciate keeping it simple. Thank you.
Here's one possible approach.
First, let us move away from your list-of-list-of-strings representation a bit and instead represent records as key/value pairs, so that a database is just a list of records:
type Field = (String, String) -- key, value
type Record = [Field]
type Db = [Record]
Reading in CSV data in your representation then becomes:
type Csv = [[String]]
fromCsv :: Csv -> Db
fromCsv [] = []
fromCsv (ks : vss) = map (zip ks) vss
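To see what fromCsv produces, here is a small self-contained sketch (the example rows are made up for illustration):

```haskell
type Field  = (String, String)  -- key, value
type Record = [Field]
type Db     = [Record]
type Csv    = [[String]]

-- Pair each data row with the header row, turning it into a Record.
fromCsv :: Csv -> Db
fromCsv []         = []
fromCsv (ks : vss) = map (zip ks) vss

main :: IO ()
main = print (fromCsv [ ["Code", "Town"  ]
                      , ["ABA" , "London"]
                      , ["BBC" , "Leeds" ] ])
-- prints [[("Code","ABA"),("Town","London")],[("Code","BBC"),("Town","Leeds")]]
```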
Now, let us talk about queries. In your setting, a query is essentially a list of filters, where each filter identifies a field and matches a set of values:
type Query = [Filter]
type Filter = (Selector, ValueFilter)
Fields are selected either by name or by a one-based (!) index:
data Selector = FieldName String | FieldIndex Int
Values are matched by applying a sequence of simple parsers, where a parser either recognises a single character or otherwise a sequence of zero or more arbitrary characters:
type ValueFilter = [Parser]
data Parser = Char Char | Wildcard
Parsing can be implemented using the list-of-successes method, where each success denotes the remaining input, i.e., the part of the input that was not consumed by the parser. An empty list of remaining inputs denotes failure. (So, note the difference between [] and [[]] in the produced results in the cases below.)
parse :: Parser -> String -> [String]
parse (Char c) (c' : cs') | c == c' = [cs']
parse Wildcard [] = [[]]
parse Wildcard cs@(_ : cs') = cs : parse Wildcard cs'
parse _ _ = []
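A few sample runs make the list-of-successes idea concrete (a self-contained sketch reusing the definitions above):

```haskell
data Parser = Char Char | Wildcard

-- Each success in the result list is the remaining, unconsumed input.
parse :: Parser -> String -> [String]
parse (Char c) (c' : cs') | c == c' = [cs']
parse Wildcard []                   = [[]]
parse Wildcard cs@(_ : cs')         = cs : parse Wildcard cs'
parse _        _                    = []

main :: IO ()
main = do
  print (parse (Char 'L') "London")  -- ["ondon"]       (one success)
  print (parse (Char 'L') "Paris")   -- []              (failure)
  print (parse Wildcard   "ab")      -- ["ab","b",""]   (consumes 0, 1, or 2 chars)
```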
Filtering values then becomes a backtracking search: a match succeeds if some combination of choices consumes the entire input.
filterValue :: ValueFilter -> String -> Bool
filterValue ps cs = any null (go ps cs)
where
go [] cs = [cs]
go (p : ps) cs = concatMap (go ps) (parse p cs)
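For instance, the pattern "*don" corresponds to the filter [Wildcard, Char 'd', Char 'o', Char 'n']; a self-contained sketch exercising it:

```haskell
data Parser = Char Char | Wildcard
type ValueFilter = [Parser]

parse :: Parser -> String -> [String]
parse (Char c) (c' : cs') | c == c' = [cs']
parse Wildcard []                   = [[]]
parse Wildcard cs@(_ : cs')         = cs : parse Wildcard cs'
parse _        _                    = []

-- A value matches if some path through the parsers consumes all input.
filterValue :: ValueFilter -> String -> Bool
filterValue ps cs = any null (go ps cs)
  where
    go []       cs = [cs]
    go (p : ps) cs = concatMap (go ps) (parse p cs)

main :: IO ()
main = do
  print (filterValue [Wildcard, Char 'd', Char 'o', Char 'n'] "London")  -- True
  print (filterValue [Wildcard, Char 'd', Char 'o', Char 'n'] "Leeds")   -- False
```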
Value selection is straightforward:
select :: Selector -> Record -> Maybe String
select (FieldName s) r = lookup s r
select (FieldIndex n) r | n > 0 && n <= length r = Just (snd (r !! (n - 1)))
| otherwise = Nothing
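As a quick check, select can be exercised on a single record (a self-contained sketch reusing the definitions above):

```haskell
type Field  = (String, String)
type Record = [Field]

data Selector = FieldName String | FieldIndex Int

-- Look a field up by name, or by one-based index with a bounds check.
select :: Selector -> Record -> Maybe String
select (FieldName s)  r = lookup s r
select (FieldIndex n) r | n > 0 && n <= length r = Just (snd (r !! (n - 1)))
                        | otherwise              = Nothing

main :: IO ()
main = do
  let r = [("Code", "ABA"), ("Town", "London")]
  print (select (FieldName "Town") r)  -- Just "London"
  print (select (FieldIndex 1)    r)   -- Just "ABA"
  print (select (FieldIndex 3)    r)   -- Nothing (out of bounds)
```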
Applying a record filter now amounts to constructing a predicate over records:
apply :: Filter -> Record -> Bool
apply (s, vf) r = case select s r of
Nothing -> False
Just v -> filterValue vf v
Finally, for executing a complete query, we have
exec :: Query -> Db -> [Record]
exec = (flip . foldl . flip) (filter . apply)
-- equivalently: exec q db = foldl (\db' f -> filter (apply f) db') db q
(I leave the parsing of queries themselves as an exercise:
readQuery :: String -> Maybe Query
readQuery = ...
but I recommend using a parser-combinator library such as parsec or uulib.)
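While a full readQuery is best written with a combinator library, the value-filter part alone is a one-liner you can start from. The helper name readPattern below is made up, not part of the answer above:

```haskell
data Parser = Char Char | Wildcard deriving (Show, Eq)
type ValueFilter = [Parser]

-- A hand-rolled sketch: translate a pattern string such as "*i*" into
-- a ValueFilter, mapping '*' to Wildcard and every other character to
-- a literal match. (readPattern is a hypothetical helper name.)
readPattern :: String -> ValueFilter
readPattern = map toParser
  where
    toParser '*' = Wildcard
    toParser c   = Char c

main :: IO ()
main = print (readPattern "*i*")  -- [Wildcard,Char 'i',Wildcard]
```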
Now, let's test. First, we introduce a small database in CSV-format:
csv :: Csv
csv =
[ ["Name" , "City" ]
------- ------------
, ["Will" , "London" ]
, ["John" , "London" ]
, ["Chris", "Manchester"]
, ["Colin", "Liverpool" ]
, ["Nick" , "London" ]
]
Then, we construct a simple query:
-- "Name"="*i*" @2="London"
query :: Query
query =
[ (FieldName "Name", [Wildcard, Char 'i', Wildcard])
, (FieldIndex 2,
[Char 'L', Char 'o', Char 'n', Char 'd', Char 'o', Char 'n'])
]
And, indeed, running our query against the database yields:
> exec query (fromCsv csv)
[[("Name","Will"),("City","London")],[("Name","Nick"),("City","London")]]
Or, if you are only after counting the results of your query:
> length $ exec query (fromCsv csv)
2
Of course, this is just one approach and for sure one can think of many alternatives. A nice aspect of breaking the problem down in small functions, as we have done above, is that you can easily test and experiment with small chunks of the solution in isolation.
I'm not very proficient in Haskell either, but I would approach it this way: what you want is essentially
f $ filter g list
where 'f' can be something like 'count' (which would actually just be length), and 'g' is the filtering function corresponding to your query. First, you would split the input into its head and tail (the tail being the list of rows); then you could use Parsec to parse the query. Your Parsec parser would simply return a pair: the first component would be a function 'f' (which could be 'length' if it encounters 'count'); the second would return True/False for each row. You would have these types:
f :: [[String]] -> Int
g :: [String] -> Bool
Building 'f' and 'g' is quite easy with Parsec. I think if you play a little with the examples on the linked page, you'll figure it out yourself.
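Leaving the Parsec part aside, the overall shape can be sketched with a hard-wired 'f' and 'g' (both stand-ins for what the query parser would produce; the sample rows are made up):

```haskell
-- Sketch of the `f $ filter g list` shape with a hard-wired query:
-- f counts the surviving rows, g keeps rows whose second column is
-- "London". A real query parser would construct f and g instead.
f :: [[String]] -> Int
f = length

g :: [String] -> Bool
g row = case drop 1 row of
          (town : _) -> town == "London"
          []         -> False

main :: IO ()
main = print (f (filter g rows))  -- 2
  where
    rows = [ ["ABA", "London"]
           , ["BBC", "Leeds" ]
           , ["CCA", "London"] ]
```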