Simple word count in haskell

Question

This is my FIRST haskell program! "wordCount" takes in a list of words and returns a tuple with with each case-insensitive word paired with its usage count. Any suggestions for improvement on either code readability or performance?

import List;
import Char;
uniqueCountIn ns xs = map (\x -> length (filter (==x) xs)) ns
nubl (xs) = nub (map (map toLower) xs) -- to lowercase
wordCount ws =  zip ns (uniqueCountIn ns ws)
   where ns = nubl ws

luqui · Accepted Answer

Congrats on your first program!

For cleanliness: lose the semicolons. Use the new hierarchical module names instead (Data.List, Data.Char). Add type signatures. As you get more comfortable with function composition, eta contract your function definitions (remove rightmost arguments). e.g.

nubl :: [String] -> [String]
nubl = nub . map (map toLower)

If you want to be really rigorous, use explicit import lists:

import Data.List (nub) 
import Data.Char (toLower)

For performance: use a Data.Map to store the associations instead of nub and filter. In particular, see fromListWith and toList. Using those functions you can simplify your implementation and improve performance at the same time.

Phil · Answer

One of the ways to improve readibility is to try to get used to the standard functions. Hoogle is one of the tools that sets Haskell apart from the rest of the world ;)

import Data.Char (toLower)
import Data.List (sort, group)
import Control.Arrow ((&&&))

wordCount :: String -> [(String, Int)]
wordCount = map (head &&& length) . group . sort . words . map toLower

EDIT: Explanation: So you think of it as a chain of mappings:

(map toLower) :: String -> String lowercases the entire text, for the purpose of case insensitivity
words :: String -> [String] splits a piece of text into words
sort :: Ord a => [a] -> [a] sorts
group :: Eq a => [a] -> [[a]] gathers identicial elements in a list, for example, group [1,1,2,3,3] -> [[1,1],[2],[3,3]]
&&& :: (a -> b) -> (a -> c) -> (a -> (b, c)) applies two functions on the same piece of data, then returns the tuple of results. For example: (head &&& length) ["word","word","word"] -> ("word", 3) (actually &&& is a little more general, but the simplified explanation works for this example)

EDIT: Or actually, look for the "multiset" package on Hackage.

jmg · Answer

It is always good to ask more experienced developers for feedback. Nevertheless you could use hlint to get feedback on some small scale issues. It'll tell you about hierarchical imports, unncessary parenthesis, alternative higher-order functions, etc.

Regarding the function, nub1. If you don't follow luqui's advice to remove the parameter altogether yet, I would at least remove the parenthesis around xs on the right side of the equation.

Simple word count in haskell

Tags:

haskell

Jonathan Dunlap

3 Answers

luqui

Phil

jmg

Recent Activity

Donate For Us

Simple word count in haskell

Tags:

haskell

Jonathan Dunlap

3 Answers

luqui

Phil

jmg

Related questions

Recent Activity

Donate For Us