I recently enquired about the fastest/most efficient way to count newlines in Rebol—I now need to ascertain which approach is best in a given situation.
Some example scenarios: Short Text, Fewer Newlines; Short Text, Many Newlines; Medium/Long Text, Many Newlines (Code); Medium/Long Text, Fewer Newlines (Article).
Some uncertainties I have are: if I run one after the other, will the second test be tainted by the first? How much of a difference will there have to be to require different functions for different scenarios (optimization) vs. one function fits all (convenience)? I need benchmarks for BOTH Rebol 2.7.8 and Rebol 3.
Here is the particular set of functions I'd like to test, although more generic answers are also appreciated:
reduce [
"@rebolek"
func [text [string!] /local i][
parse text [(i: 1) any [thru newline (++ i)]] i
]
"@darius"
func [text [string!] /local tmp][
tmp: sort join text "^/"
1 + subtract index? find/last tmp "^/" index? find tmp "^/"
]
"@endo64"
func [text [string!]][
(length? text) - (length? remove-each v text [v = #"^/"])
]
"@BrianH"
function [text [string!]] [
i: 1
find-all text newline [++ i]
i
]
"@MaxV"
func [text [string!]][
write %.mytext text
text: read/lines %.mytext
delete %.mytext
length? text
]
]
The challenge with profiling in REBOL (R2 & R3) is three-fold. timing, looping & Memory use.
on some OSes the default timing is not precise (like windows). this can be largely alleviated by creating bigger loops which basically scale the test to acceptable timing margins. You can also create better timers, like the chrono lib I built for windows (chrono-lib.r).
it also includes a time lapse function to make testing blocks easy (basically a more precise 'DT for windows).
This is an obvious thing, when you time stuff, to remove some of the OS/multi-tasking overhead, you run code several times and divide the repetitions up to get an average. This will often be enough to get a good idea.
BUT, in rebol, looping is rather costly in and of itself. The Iteration functions have their own, non-negligible overhead. So before looping some code, make sure the loop you are using is optimal, make especially sure that you use a native iterator function, because the loop may end-up being slower than the code you are trying to profile. In R2, FOREACH and LOOP are the faster loop in most situations. FOREACH has a little binding overhead before it starts looping, but in my extensive tests this had immeasurable impact, as it only occurs once and is irrelevant on a run of a few seconds.
In R3, iterator funcs where given a speed boost since some of the mezz loops in R2 have become native, so you'd have to check them to see.
This is where things get less predictable. REBOLs GC is quite intrusive when profiling. Not only is it slow, it's also unpredictable, can't be practically optimised, and it slows down with any increase in memory footprint of REBOL (in fact, a lot more than people realize).
by how much? lets see:
this is a script I have to benchmark memory use and speed of execution. it generates a graphic which can show me a relation of one with the other. It has one option to turn off the Garbage collector... and you will see how it can affect profiling... the graphic with and without is quite telling:
rebol [
title: "profiling tests with graphics"
author: "Maxim Olivier-Adlhoch"
purpose: "given a few metrics and code blocks, will plot out the time and memory use for each repetition."
]
;----
; test metrics
loops: 2000
repetitions: 500
;----
; test options
turn-off-garbage-collector?: false
;----
; output gfx prefs
mem-color: red
time-color: sky
bg-color: black
gfx-size: 600x400
margins: 100x100
lw: 2 ; line-width
label-color: white * .85
border-color: gray * 0.5
slices: 10 ; how many labels on the edges of the graphics.
;----
; globals
plot-data: []
;----
;- functions
platform: does [
select [
1 AMIGA
2 OSX
3 WIN32
4 LINUX
] system/version/4
]
;----
; make sure timer resolution is decent on all platforms
;----
either ( platform = 'WIN32 ) [
either exists? %libs/windows-chrono.r [
do %libs/windows-chrono.r
][
ask "download and save windows-chrono from rebol.org ? ^/^/ * press ENTER to confirm^/ * press ESCAPE to halt."
make-dir %libs/
write %libs/windows-chrono.r read http://www.rebol.org/download-a-script.r?script-name=windows-chrono.r
print "windows chrono downloaded"
]
][
; on other platforms, the OS timer resolution tends to be better, we can just use delta-time
time-lapse: :delta-time
]
; time-lapse: :delta-time
;----
; TEST CODE INITIALISATIONS
blk: make block! repetitions
print "===================="
print "running tests"
;----
; SETUP TEST OPTION(S)
if turn-off-garbage-collector? [
recycle/off
]
;--------------------------------------------
; PERFORM AND PLOT TESTS
;--------------------------------------------
repeat i repetitions [
;--------------------------------------------
; PUT YOUR LOOP INIT CODE HERE:
;--------------------------------------------
tmp: last append/only blk copy []
time: time-lapse [
loop loops [
;--------------------------------------------
; PUT YOUR TEST CODE HERE:
;--------------------------------------------
; here we just accumulate RAM...
append tmp make string! 1000
]
]
memory: stats
append plot-data reduce [ time memory]
prin "."
]
;-------------------------
; extract plot data scale
;-------------------------
time-x: 0:00
stat-y: 0
foreach [time stat] plot-data [
time-x: max time-x time
stat-y: max stat-y stat
]
time-scale: (gfx-size/y / to-decimal time-x )
mem-scale: gfx-size/y / stat-y
print ""
?? time-scale
?? mem-scale
;-------------------------
; build gfx
;-------------------------
;-------
; lines
i: 0
mem-line: compose [line-width lw pen (mem-color ) line () ]
time-line: compose [line-width lw pen (time-color ) line () ]
foreach [time ram] plot-data [
time: to-decimal time
;?? time
;print time * time-scale
append mem-line margins + to-pair reduce [ x: to-integer (i / (repetitions - 1) * gfx-size/x) to-integer ( ram * mem-scale )]
append time-line margins + to-pair reduce [ x to-integer ( time * time-scale )]
i: i + 1
]
;------
;scales
scale-drw: compose [
line-width 1
pen (border-color) box (margins) (margins + gfx-size)
]
repeat i (slices + 1) [
ii: i - 1
append scale-drw reduce [
'pen mem-color
'text margins + to-pair reduce [ -50 (gfx-size/y - (ii / slices * gfx-size/y ) ) - 5 ]
rejoin [ to-string round/to (ii / slices * stat-y / 1'000'000) 0.01 " MB" ]
'pen time-color
'text margins + to-pair reduce [ gfx-size/x (gfx-size/y - (ii / slices * gfx-size/y ) ) - 5 ]
rejoin [ to-string round/to (1000 * ii / slices * to-decimal time-x) 0.1 "ms" ]
'pen border-color
'text margins + to-pair reduce [ ((ii / slices * gfx-size/x ) ) gfx-size/y + 10 ]
rejoin [ to-string to-integer( ii / slices * repetitions) ]
]
]
view layout compose/deep [
box (margins * 2 + gfx-size) bg-color effect [draw [
translate (0x1 * (margins * 2 + gfx-size)) scale 1 -1.0 (mem-line) (time-line)
reset-matrix
(scale-drw)
]]
]
Feel free to copy and edit this script as much as you want, its a really simple plot mechanism... I'm sure you can add additional curves (like one for the output value of a function you're testing).
at the top of this script you can see there is an option to disable the GC... in this test code which is in the loops, we just allocate RAM, to see how the GC impacts overall performance.
here is a run with GC enabled:
you can see that execution is hampered since the GC keeps monitoring and interrupting the process.
as the RAM increases (the red line) the execution constantly slows down. note that all blocks are pre-allocated, so this is not due to internal memory copying.
here is a run with GC disabled:
you can clearly see how linear the operation is.
the irregular jumps you have in both runs is due to normal OS multi-tasking.
notice that the best speed in both tests is the same. around 5milliseconds... but the average, just keeps getting worse and worse in the GC enabled run.
For this reason, when I test any code, I always turn the GC OFF, unless the immediate memory generated by the code quickly runs the task out of memory. alternatively, you can recycle/off recycle/on at key points in your code to alleviate this part of the problem out of your equation. when recycle/on is called, it will immediately do a cleanup, if its beyond its threshold.
Note that the memory allocation complexity has an impact on the GC. Its not only the amount of RAM, but the number of values being built which will have large consequences. using the above profiler, you can build exponentially slow algorithms which become almost unusable at just 300MB when they where real-time to begin with.
binding and creating many, complex objects, is one thing which will completely destroy any pretense of speed in rebol. if you are going to create large datasets, you really are better off using block hierarchies rather than objects hierarchies.
I hope the above script will help you profile some of the stuff you want to test.
In Changes to high resolution time in R3, Carl Sassenrath described the improved (over Rebol 2) timer in Rebol 3. You can obtain a timestamp through stats/timer
that has a resolution of a few microseconds. To measure a time delta, use delta-time
as shown below, but see the blog post about accuracy in taking deltas cross CPU frequency changes.
A couple of the important points found therein:
and some examples:
Script that accurately shows total boot time:
Rebol [file: %boot-time.r]
print stats/timer
Simple example timing the SINE function:
delta-time [loop 100 [sine 123]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With