This is my FIRST haskell program! "wordCount" takes in a list of words and returns a tuple with with each case-insensitive word paired with its usage count. Any suggestions for improvement on either code readability or performance?
import List;
import Char;
uniqueCountIn ns xs = map (\x -> length (filter (==x) xs)) ns
nubl (xs) = nub (map (map toLower) xs) -- to lowercase
wordCount ws = zip ns (uniqueCountIn ns ws)
where ns = nubl ws
Congrats on your first program!
For cleanliness: lose the semicolons. Use the new hierarchical module names instead (Data.List
, Data.Char
). Add type signatures. As you get more comfortable with function composition, eta contract your function definitions (remove rightmost arguments). e.g.
nubl :: [String] -> [String]
nubl = nub . map (map toLower)
If you want to be really rigorous, use explicit import lists:
import Data.List (nub)
import Data.Char (toLower)
For performance: use a Data.Map
to store the associations instead of nub
and filter
. In particular, see fromListWith
and toList
. Using those functions you can simplify your implementation and improve performance at the same time.
One of the ways to improve readibility is to try to get used to the standard functions. Hoogle is one of the tools that sets Haskell apart from the rest of the world ;)
import Data.Char (toLower)
import Data.List (sort, group)
import Control.Arrow ((&&&))
wordCount :: String -> [(String, Int)]
wordCount = map (head &&& length) . group . sort . words . map toLower
EDIT: Explanation: So you think of it as a chain of mappings:
(map toLower) :: String -> String
lowercases the entire text, for the purpose of case
insensitivitywords :: String -> [String]
splits a piece of text into wordssort :: Ord a => [a] -> [a]
sortsgroup :: Eq a => [a] -> [[a]]
gathers identicial elements in a list, for example, group
[1,1,2,3,3]
-> [[1,1],[2],[3,3]]
&&& :: (a -> b) -> (a -> c) -> (a -> (b, c))
applies two functions on the same piece of data, then returns
the tuple of results. For example: (head &&& length) ["word","word","word"]
-> ("word", 3)
(actually &&&
is a little more general, but the simplified explanation works for this example)EDIT: Or actually, look for the "multiset" package on Hackage.
It is always good to ask more experienced developers for feedback. Nevertheless you could use hlint to get feedback on some small scale issues. It'll tell you about hierarchical imports, unncessary parenthesis, alternative higher-order functions, etc.
Regarding the function, nub1
. If you don't follow luqui's advice to remove the parameter altogether yet, I would at least remove the parenthesis around xs
on the right side of the equation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With