Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Primitive but efficient grep clone in haskell?

Whenever I consider learning a new language -- haskell in this case -- I try to hack together a primitive grep clone to see how good the language implementation and/or its libraries are at text processing, because that's a major use case for me.

Inspired by code on the haskell wiki, I came up with the following naive attempt:

{-# LANGUAGE FlexibleContexts, ExistentialQuantification #-}

import Text.Regex.PCRE
import System.Environment

io :: ([String] -> [String]) -> IO ()
io f = interact (unlines . f . lines)

regexBool :: forall r l .
  (RegexMaker Regex CompOption ExecOption r,
   RegexLike Regex l) =>
  r -> l -> Bool
regexBool r l = l =~ r :: Bool

grep :: forall r l .
  (RegexMaker Regex CompOption ExecOption r, RegexLike Regex l) =>
  r -> [l] -> [l]
grep r = filter (regexBool r)

main :: IO ()
main = do
  argv <- getArgs
  io $ grep $ argv !! 0

This appears to be doing what I want it to, but unfortunately, it's really slow -- about 10 times slower than a python script doing the same thing. I assume it's not the regex library that's at fault here, because it's calling into PCRE which should be plenty fast (switching to Text.Regex.Posix slows things down quite a bit further). So it must be the String implementation, which is instructive from a theoretical point of view but inefficient according to what I've read.

Is there an alternative to Strings in haskell that's both efficient and convenient (i.e. there's little or no friction when switching to using that instead of Strings) and that fully and correctly handles UTF-8-encoded Unicode, as well as other encodings without too much hassle if possible? Something that everybody uses when doing text processing in haskell but that I just don't know about because I'm a complete beginner?

like image 478
dlukes Avatar asked Jul 30 '16 15:07

dlukes


1 Answers

It's possible that the slow speed is caused by using the standard library's list type. I've often run into performance problems with it in the past.

It would be a good idea to profile your executable, to see where it spends its time: Tools for analyzing performance of a Haskell program. Profiling Haskell programs is really easy (compile with a switch and execute your program with an added argument, and the report is written to a text file in the current working directory).

As a side note, I use exactly the same approach as you when learning a new language: create something that works. My experience doing this with Haskell is that I can easily gain an order of magnitude or two in performance by profiling and making relatively simple changes (usually a couple of lines).

like image 156
runeks Avatar answered Sep 27 '22 23:09

runeks