findSubstrings and breakSubstring in Data.ByteString

Question

In the source of Data/ByteString.hs it says that the function findSubstrings has been deprecated in favor of breakSubstring. However I think the findSubstrings which was implemented using the KMP algorithm is much more efficient than the algorithm used in breakSubstring which is a naive one. Anybody has any idea why this has been done ?

Here's the old implementation:

{-# DEPRECATED findSubstrings "findSubstrings is deprecated in favour of breakSubstring." #-}

{-
{- This function uses the Knuth-Morris-Pratt string matching algorithm.  -}

findSubstrings pat@(PS _ _ m) str@(PS _ _ n) = search 0 0
where
  patc x = pat `unsafeIndex` x
  strc x = str `unsafeIndex` x

  -- maybe we should make kmpNext a UArray before using it in search?
  kmpNext = listArray (0,m) (-1:kmpNextL pat (-1))
  kmpNextL p _ | null p = []
  kmpNextL p j = let j' = next (unsafeHead p) j + 1
                     ps = unsafeTail p
                     x = if not (null ps) && unsafeHead ps == patc j'
                            then kmpNext Array.! j' else j'
                    in x:kmpNextL ps j'
  search i j = match ++ rest -- i: position in string, j: position in pattern
    where match = if j == m then [(i - j)] else []
          rest = if i == n then [] else search (i+1) (next (strc i) j + 1)
  next c j | j >= 0 && (j == m || c /= patc j) = next c (kmpNext Array.! j)
           | otherwise = j
-}

And here's the new naive one:

findSubstrings :: ByteString -- ^ String to search for.
           -> ByteString -- ^ String to seach in.
           -> [Int]
findSubstrings pat str
    | null pat         = [0 .. length str]
    | otherwise        = search 0 str
where
    STRICT2(search)
    search n s
        | null s             = []
        | pat `isPrefixOf` s = n : search (n+1) (unsafeTail s)
        | otherwise          =     search (n+1) (unsafeTail s)

Don Stewart · Accepted Answer

The reason for the change was that the implementation of KMP was actually more inefficient than the naive version, which is implemented with an eye on performance.

findSubstrings and breakSubstring in Data.ByteString

Tags:

performance

string-matching

haskell

bytestring

sob7y

1 Answers

Don Stewart

Recent Activity

Donate For Us

findSubstrings and breakSubstring in Data.ByteString

Tags:

performance

string-matching

haskell

bytestring

sob7y

1 Answers

Don Stewart

Related questions

Recent Activity

Donate For Us