Assume I have a mathematical expression in a "tree" form in OCaml. It's represented as an algebraic type like this:
type expr =
Number of int
|Plus of expr*expr
Well, this is a very simplified definition, but it's enough to describe the problem.
I want to convert it to a string in a reverse polish notation, so that Plus (Number i, Number j)
becomes (+ i j)
. A straightforward implementation would be
let rec convert = function
Number i -> string_of_int i
|Plus (a,b) -> (let s = convert a in let p = convert b in "(+"^s^" "^p^")")
But the thing is that it's incredibly slow on some input (that have a big tree depth). For example, this input works 5 seconds on my machine:
let rec make_pain so_far = function
0 -> so_far |i -> make_pain (Plus (Number 1,so_far)) (i-1)
let pain = make_pain (Number 1) 20000
let converted = convert pain
It seems that string concatenation x^y
, where y
is a long string, is the performance problem. Indeed, if I replace the "(+"^s^" "^p^")"
expression with mere s^p
, it becomes much faster.
Using printf
instead of concatenation doesn't make it any faster. Converting to C might help, but isn't there an OCaml solution?
Use a string Buffer.
^
is defined as,
let (^) s1 s2 =
let l1 = string_length s1 and l2 = string_length s2 in
let s = string_create (l1 + l2) in
string_blit s1 0 s 0 l1;
string_blit s2 0 s l1 l2;
s
What you are doing is creating a new string each time and copying the old values in. Pretty much standard in any language where strings are represented as character arrays. The hangup happens because you are doing this FOUR times for each node (there isn't an optimization for multiple ^
calls)! As for a buffer, it will create a giant string and continually fill it in as managed by the data structure,
type t =
{mutable buffer : string;
mutable position : int;
mutable length : int;
initial_buffer : string}
Even if you decide to create the initial buffer size to 1
, the resize
function will adjust the size in a way that will limit the number of re-allocations. For example, the add_string
function will increase the size of the array by len*2^(n+p-len)
, where n
is the length of the new string, p
is the position and len
is the length of the original buffer --only if the buffer cannot support the string, of course. So, the size of the buffer grows exponentially and there will be few re-allocations as things progress. Of course, it's best to set the initial buffer to something reasonable, but it isn't necessary.
The new convert
function wouldn't look much more verbose:
let rec convert buf ex =
let addb = Buffer.add_string buf in
match ex with
Number i -> addb (string_of_int i)
|Plus (a,b) -> (addb "(+ "; convert buf a; addb " "; convert buf b; addb ")")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With