This is a follow up to my previous question.
I noticed that if I manipulate two strings (e.g. to get the longest common prefix, calculate the difference between two strings, etc.) I tend to use zip
(like in (s1 zip s2)...
).
It looks nice but probably inefficient (in comparison to imperative code). Is it correct ? Perhaps, if performance does matter I should use an imperative algorithm.
Now I am trying to find if two strings differ. Would you recommend using zip
to do it?
It is slightly more efficient to use (s1,s2).zipped
than (s1 zip s2)
since you don't need to create the tuples; instead you create functions that take two arguments.
Better yet for efficiency but not ease of use is to define your own custom specialized string folder:
abstract class StrFold[@specialized T](var result: T) {
def apply(c1: Char, c2: Char): Unit
def done: Boolean
}
def strfold[@specialized T](s1: String, s2: String)(folder: StrFold[T]): T = {
var i = 0
val L = math.min(s1.length, s2.length)
while (i < L) {
folder(s1.charAt(i), s2.charAt(i))
if (folder.done) return folder.result
i += 1
}
folder.result
}
Now if you want to find the length of the longest common prefix, you
class LcpFind extends StrFold(0) {
var done = false
def apply(c1: Char, c2: Char) { if (c1 == c2) result += 1 else done = true }
}
def lcp(s1: String, s2: String) = strfold(s1,s2)(new LcpFind)
It's a fair bit of work--arguably almost as much as just writing out the imperative or recursive functional code to calculate LCP without a fold-with-escape-clause.
But it should be almost as fast as writing the low-level string stuff by hand each time. (The only penalty should be creating the (tiny) LcpFind
object each time, and if you really wanted to you could reuse it by resetting result
to zero between calls, or you could modify StrFold
and strfold
to implement and call a reset
method, changing the input parameter to be named zero
and having a separate result
method.)
Yes, it's going to be less efficient for two reasons.
You are creating a list of pairs of characters the same length as the shorter string. This will take substantially more place than the original two strings. Usual ways to find longest common prefix or check if strings differ in one character don't require any additional memory.
When finding LCP, you don't necessarily need to iterate over entire strings. Since zip
does need to do it, it will obviously be less efficient to zip them first.
But this doesn't mean you have to use imperative algorithms: a normal tail-recursive functional algorithm will perform just as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With