The following is what I've hacked together so far using "matrix memory" in implementing Левенште́йн. I'm trying to use Haskell for just about everything now so that I really learn it. Concepts I haven't really grasped yet include monad transformers, the state monad (working on it), and lenses.
import Data.Matrix
import Control.Monad.State
import Control.Applicative
type RecState = Int
-- Set up the first row
setLeftCol :: String -> Matrix Int -> Maybe (Matrix Int)
setLeftCol str mat = let strLength = length str + 1
in foldr helper (Just mat) [1..strLength]
where
helper :: Int -> Maybe (Matrix Int) -> Maybe (Matrix Int)
helper value matrixMon = (\m -> safeSet (value-1) (value,1) m) =<< matrixMon
-- Encapsulate a transposition in a Maybe context
transposeM :: Matrix a -> Maybe (Matrix a)
transposeM mat = Just (transpose mat)
-- Set up the first column
setTopRow :: String -> Matrix Int -> Maybe (Matrix Int)
setTopRow str mat = let mat' = return mat
in mat' >>= transposeM >>= (setLeftCol str) >>= transposeM
-- Generate coordinates
coords :: Int -> Int -> [(Int,Int)]
coords width height = [(x,y) | x <- [1..(width+1)], y <- [1..(height+1)]]
safeFst :: Maybe (Int,Int) -> Maybe Int
safeFst tuple = case tuple of
Just (x,y) -> Just x
Nothing -> Nothing
safeSnd :: Maybe (Int,Int) -> Maybe Int
safeSnd tuple = case tuple of
Just (x,y) -> Just y
Nothing -> Nothing
distance :: Matrix Int -> State RecState (Matrix Int)
distance matrix = do
index <- get
let coordinate = coordinates !! index
i = fst coordinate
j = snd coordinate
if index == size then
put matrix
return $ getElem i j matrix
else do
put (index + 1)
let ch1 = w1 !! (i - 1)
ch2 = w2 !! (j - 1)
cost = if ch1 /= ch2 then 1 else 0
entry1 = (getElem (i - 1) j matrix) + 1
entry2 = (getElem i (j - 1) matrix) + 1
entry3 = (getElem (i - 1) (j - 1) matrix) + cost
return $ distance $ setElem (minimum [entry1,entry2,entry3]) coordinate matrix
-- Compute the Levenshtein distance on two strings.
levenshtein :: String -> String -> Int
levenshtein "" "" = 0
levenshtein "" w2 = length w2
levenshtein w1 "" = length w1
levenshtein w1 w2 = let lenW1 = length w1
lenW2 = length w2
size = lenW1 * lenW2
matrix = Just $ zero (lenW1 + 1) (lenW2 + 1)
matrix' = matrix >>= setLeftCol w1 >>= setTopRow w2
coordinates = coords lenW1 lenW2
in execState (distance <$> matrix') (lenW1 + 2)
showResults :: Show r => r -> IO ()
showResults = putStrLn . show
showLevenshtein :: String -> String -> IO ()
showLevenshtein = showResults . levenshtein
My first question is how do I organize the distance
function with levenshtein
? I first put it in a where
clause following the line beginning with in execState...
. However, I found that neither size
nor coordinates
were accessible within this function, as they're defined in the original let
statement of levenshtein
.
Also feel free to comment on any other ideas I've tried here.
There's a formula for solving dynamic programming problems like this in Haskell.
a -> b
as (a -> b) -> (a -> b)
with no recursive calls.For levenshtien distance an array is appropriate.
Start by writing the levenshtien distance formula recursively in terms of itself
lev :: Eq a => [a] -> [a] -> (Int, Int) -> Int
lev a b (0, 0) = 0
lev a b (0, j) = j
lev a b (i, 0) = i
lev a b (i, j) = (lev a b (i-1, j) + 1) `min` (lev a b (i, j-1) + 1) `min` (lev a b (i-1, j-1) + if a !! (i - 1) == b !! (j - 1) then 0 else 1)
The levenshtien distance for two strings is the distance calculated all the way to the last characters
levenshtien :: Eq a => [a] -> [a] -> Int
levenshtien a b = lev a b upperBound
where
upperBound = (length a, length b)
Then replace the recursive calls to calls to some other function f
that somehow implements the rest of levenshtien distance.
lev' :: Eq a => [a] -> [a] -> ((Int, Int) -> Int) -> (Int, Int) -> Int
lev' a b f (0, 0) = 0
lev' a b f (0, j) = j
lev' a b f (i, 0) = i
lev' a b f (i, j) = (f (i-1, j) + 1) `min` (f (i, j-1) + 1) `min` (f (i-1, j-1) + if a !! (i - 1) == b !! (j - 1) then 0 else 1)
You can recover lev
from lev'
by using fix
, which is defined as fix f = let x = f x in x
import Data.Function
lev :: Eq a => [a] -> [a] -> (Int, Int) -> Int
lev a b = fix (lev' a b)
Finally you'll need a way to memoize the intermediate results in an array. I find the following to be an easier way to build an array than the functions in Data.Array.
import Data.Array
buildArray :: Ix i => (i, i) -> (i -> e) -> Array i e
buildArray bounds f = listArray bounds (f <$> range bounds)
We can memoize a function in an array by building an array holding some of the results and use the stored value from the array if the argument is in the array, and using the original function if it's not.
memoArray :: Ix i => (i, i) -> (i -> e) -> (i -> e)
memoArray bounds f = \i -> if inRange bounds i then arr ! i else f i
where
arr = buildArray bounds f
We can fix a function with some of its values memoized in an array by fixing the function composed with memoing some of its values.
fixArray :: Ix i => (i, i) -> ((i -> e) -> i -> e) -> (i -> e)
fixArray bounds f = fix (memoArray bounds . f)
Finally we can rewrite levenshtien in terms of lev'
and fixArray
, memorizing all the important bits that would be re-used along the way.
levenshtien :: Eq a => [a] -> [a] -> Int
levenshtien a b = fixArray ((1, 1), upperBound) (lev' a b) upperBound
where
upperBound = (length a, length b)
Further improvements
!!
by replacing the lists with arraysIf you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With