Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replacing strings with strings from a list

I am trying to write a function that takes a list of searchwords, a list of replacementwords and a string on which those will be used.

listReplace :: [String] -> [String] -> String -> String

The tricky part is that if the fitting searchword is the n'th, then the n'th replacementword should be used. Also, when a replacementword has been used it should not be replaced by a different replacementword if it is actually a searchword itself. I've already written these kind of functions for

replace :: String -> String -> String -> String:
replace x y [] = []
replace x y (z:zs) = if isPrefixOf x (z:zs) then y ++ replace x y (drop (length x) (z:zs)) else z: (replace x y zs)

and

replace' :: String -> [String] -> String -> String:
replace' x y [] = []
replace' x [] (z:zs) = []
replace' x y (z:zs) = if isPrefixOf x (z:zs) then concat (take 1 y) ++ replace' x  (drop 1 y) (drop (length x) (z:zs)) else z: (replace' x y zs)

I just dont know how to begin with this replaceList function tho, the only thing that might actually be useful that I've found so far is a function that replaces the n'th element in a list. But I cant seem to figure out how to put it to use in this case:

replace :: Int -> a -> [a] -> [a]
replace n a  []  = []  
replace 0 a (x:xs) = a : xs
replace n a (x:xs) = x : replace (n-1) a xs

well hopefully one of you can help me out! Thanks in advance :)

like image 639
user2299050 Avatar asked Dec 20 '22 06:12

user2299050


2 Answers

I would suggest a different type than

listReplace :: [String] -> [String] -> String -> String

What would happen if one called

listReplace ["one", "two"] ["een"] "I have two problems"

the substring "two" would be found, but there's no replacement for it provided.

Rather use

listReplace :: [(String, String)] -> String -> String

so that you are guaranteed that there are always exactly as many replacement strings as patterns you search for.

A simple implementation could then use

find :: (a -> Bool) -> [a] -> Maybe a

from Data.List to check if any of the provided patterns is a prefix of the remaining input,

listReplace _ "" = ""
listReplace replacements string@(c:cs)
    = case find ((`isPrefixOf` string) . fst) replacements of
        Just (pat,rep) -> rep ++ listReplace replacements (drop (length pat) string)
        Nothing -> c : listReplace replacements cs

This easy solution is not very efficient - that would require a more complicated algorithm - and it doesn't detect if one of the patterns to be replaced is a prefix of another, so that if the shorter pattern comes before the longer in the list, the longer pattern would never be used, even if it should be. That can be dealt with by sorting the list of replacements, for example descending by lexicographical order, before calling the replace function.

like image 65
Daniel Fischer Avatar answered Jan 17 '23 18:01

Daniel Fischer


My suggestion would be to use a somewhat different intermediate datastructure when processing the string that you want edited. Here is a solution that uses tries.

Preliminaries

import Data.Map (Map)
import qualified Data.Map as M

Tries

Here is a simple datatype of tries:

data Trie = Trie (Maybe String) (Map Char Trie)

Tries are constructed from the empty trie and a function for inserting key/value bindings into an existing trie:

empty :: Trie
empty =  Trie Nothing M.empty

insert :: String -> String -> Trie               -> Trie
insert    []        val       (Trie _ tries)     =  Trie (Just val) tries
insert    (c : cs)  val       (Trie mbVal tries) =  case M.lookup c tries of
  Nothing   -> Trie mbVal (M.insert c (insert cs val empty) tries)
  Just trie -> Trie mbVal (M.insert c (insert cs val trie)  tries)

Matching

With tries, matching reduces to recursing over the input string while traversing the trie. When a match is found, the corresponding replacement value is returned together with the remaining part of the input string (so that it can be subjected to further replacements):

match :: Trie ->                 String   -> Maybe (String, String)
match    (Trie (Just val) _    ) s        =  Just (val, s)
match    (Trie Nothing    _    ) []       =  Nothing
match    (Trie Nothing    tries) (c : cs) =  case M.lookup c tries of
  Nothing   -> Nothing
  Just trie -> match trie cs

Note that this function is greedy in the sense that it gives preference to the shortest match if multiple matches are possible. Adapting it so that it picks the longest match instead (and, hence, implements the "maximal-munch" principle) is not too hard.

Replacement

Replacing occurrences of search words by their corresponding replacements can be implemented by looking for a match in the input string: if a match is found, the replacement is put into the output string and we continue processing with the unmatched part of the string. If no match is found, we keep the head of the input string and proceed with the tail.

replace :: Trie -> String -> String
replace    trie =  go
  where
    go []         = []
    go s@(c : cs) = case match trie s of
      Nothing        -> c : go cs
      Just (s', s'') -> s' ++ go s''

Bringing it all together

Your required function listReplace is now almost trivial:

listReplace :: [String] -> [String] -> String -> String
listReplace    keys        vals     =  replace trie
  where
    trie = foldr ($) empty (zipWith insert keys vals)

As you see, the part that you refer to as "tricky" is easily realised by "zipping" the two list arguments.

Example

Here is a simple example (inspired by L. Peter Deutsch):

> let s = "to err is human; to forgive, divine"
> listReplace ["err", "forgive"] ["iterate", "recurse"] s

"to iterate is human; to recurse, divine"
like image 23
Stefan Holdermans Avatar answered Jan 17 '23 18:01

Stefan Holdermans