Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simple word count in haskell

Tags:

haskell

This is my FIRST haskell program! "wordCount" takes in a list of words and returns a tuple with with each case-insensitive word paired with its usage count. Any suggestions for improvement on either code readability or performance?

import List;
import Char;
uniqueCountIn ns xs = map (\x -> length (filter (==x) xs)) ns
nubl (xs) = nub (map (map toLower) xs) -- to lowercase
wordCount ws =  zip ns (uniqueCountIn ns ws)
   where ns = nubl ws
like image 355
Jonathan Dunlap Avatar asked Oct 18 '11 06:10

Jonathan Dunlap


3 Answers

Congrats on your first program!

For cleanliness: lose the semicolons. Use the new hierarchical module names instead (Data.List, Data.Char). Add type signatures. As you get more comfortable with function composition, eta contract your function definitions (remove rightmost arguments). e.g.

nubl :: [String] -> [String]
nubl = nub . map (map toLower)

If you want to be really rigorous, use explicit import lists:

import Data.List (nub) 
import Data.Char (toLower)

For performance: use a Data.Map to store the associations instead of nub and filter. In particular, see fromListWith and toList. Using those functions you can simplify your implementation and improve performance at the same time.

like image 114
luqui Avatar answered Nov 05 '22 04:11

luqui


One of the ways to improve readibility is to try to get used to the standard functions. Hoogle is one of the tools that sets Haskell apart from the rest of the world ;)

import Data.Char (toLower)
import Data.List (sort, group)
import Control.Arrow ((&&&))

wordCount :: String -> [(String, Int)]
wordCount = map (head &&& length) . group . sort . words . map toLower

EDIT: Explanation: So you think of it as a chain of mappings:

  • (map toLower) :: String -> String lowercases the entire text, for the purpose of case insensitivity
  • words :: String -> [String] splits a piece of text into words
  • sort :: Ord a => [a] -> [a] sorts
  • group :: Eq a => [a] -> [[a]] gathers identicial elements in a list, for example, group [1,1,2,3,3] -> [[1,1],[2],[3,3]]
  • &&& :: (a -> b) -> (a -> c) -> (a -> (b, c)) applies two functions on the same piece of data, then returns the tuple of results. For example: (head &&& length) ["word","word","word"] -> ("word", 3) (actually &&& is a little more general, but the simplified explanation works for this example)

EDIT: Or actually, look for the "multiset" package on Hackage.

like image 39
Phil Avatar answered Nov 05 '22 03:11

Phil


It is always good to ask more experienced developers for feedback. Nevertheless you could use hlint to get feedback on some small scale issues. It'll tell you about hierarchical imports, unncessary parenthesis, alternative higher-order functions, etc.

Regarding the function, nub1. If you don't follow luqui's advice to remove the parameter altogether yet, I would at least remove the parenthesis around xs on the right side of the equation.

like image 6
jmg Avatar answered Nov 05 '22 03:11

jmg