What approach for simple text processing in Haskell?

Question

I am trying to do some simple text processing in Haskell, and I am wondering what might be the best way to go about this in an FP language. I looked at the parsec module, but it seems much more sophisticated than what I am looking for as a new Haskeller. What would be the best way to strip all the punctuation from a corpus of text? My naive approach was to write a function like this:

removePunc str = [c | c <- str, c /= '.',
                                 c /= '?',
                                 c /= ',',
                                 c /= '!',
                                 c /= '-',
                                 c /= ';',
                                 c /= '\'',
                                 c /= '\"']

huon · Accepted Answer

A possibly more efficient method (O(log n) per lookup rather than O(n), where n is the number of punctuation characters) is to use a Set (from Data.Set):

import qualified Data.Set as S

punctuation = S.fromList ".?!,-;'\""

removePunc = filter (`S.notMember` punctuation)

You must construct the set outside the function, so that it is computed only once and shared across all calls: building the set costs much more than the simple linear-time notElem test others have suggested.

Note: this is such a small case that the extra overhead of a Set might outweigh its asymptotic benefit over a list, so if absolute performance matters, profile both versions.
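
As a minimal sketch of the point above, here are the Set-based and list-based filters side by side with explicit type signatures. The names removePuncSet, removePuncList and agree are hypothetical, and the exact punctuation set is an assumption; adjust it to your corpus.

```haskell
import qualified Data.Set as S

-- Assumed punctuation set, built once at the top level and shared by every call.
punctuation :: S.Set Char
punctuation = S.fromList ".?!,-;'\""

-- Set-based version: one O(log n) membership test per input character.
removePuncSet :: String -> String
removePuncSet = filter (`S.notMember` punctuation)

-- List-based version: one O(n) scan of the literal per input character.
removePuncList :: String -> String
removePuncList = filter (`notElem` ".?!,-;'\"")

-- Hypothetical check that both versions give the same result on an input.
agree :: String -> Bool
agree s = removePuncSet s == removePuncList s
```

In GHCi, removePuncSet "Hello, world! It's a test-case." evaluates to "Hello world Its a testcase". Which version is faster with only eight characters is something to measure rather than assume.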

Ronson · Answer

You can simply write your code as:

removePunc = filter (`notElem` ".?!-;\'\"")

or

removePunc = filter (flip notElem ".?!-;\'\"")
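
Because removePunc is an ordinary String -> String function, it composes with other list functions. The following is only a sketch of how it might slot into a small text-processing pipeline; the tokens helper is hypothetical and not part of the answer above.

```haskell
import Data.Char (toLower)

-- Same filter as above, with an explicit type signature.
removePunc :: String -> String
removePunc = filter (`notElem` ".?!-;'\"")

-- Hypothetical helper: lower-case the text, strip punctuation,
-- then split on whitespace.
tokens :: String -> [String]
tokens = words . removePunc . map toLower
```

For example, tokens "Wait! Is this -- really -- it?" evaluates to ["wait","is","this","really","it"]. Note that this particular character list does not include the comma, so commas would survive.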

Daniel · Answer

You can group your characters in a String and use notElem:

[c | c <- str, c `notElem` ".?!,-;"]

or in a more functional style:

filter (\c -> c `notElem` ".?!,-;") str
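
For a whole corpus, listing every punctuation mark by hand gets tedious. A different technique from the one above, shown here only as a sketch, is to use the isPunctuation predicate from Data.Char, which tests for the Unicode punctuation categories. The name removeAllPunc is hypothetical.

```haskell
import Data.Char (isPunctuation)

-- Drop every character that Unicode classifies as punctuation.
-- This is broader than a hand-written list: it also removes
-- brackets, underscores and non-ASCII marks, but it keeps
-- symbols such as '$' and '+', which are not punctuation.
removeAllPunc :: String -> String
removeAllPunc = filter (not . isPunctuation)
```

For example, removeAllPunc "Hello, world! It's a test-case." evaluates to "Hello world Its a testcase". Whether this broader definition is what you want depends on the corpus.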

Tags: haskell, nlp, turtle
