I have a Lisp file which does lots of sampling, file I/O and arithmetic in a loop (I do particle filtering in Common Lisp). I compile the file with the `compile-file` command. I also put `(declaim (optimize (speed 3) (debug 0) (safety 0)))` at the beginning of the file, because I want my results as fast as possible.
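A file-wide `(safety 0)` removes run-time checks everywhere, so a wrong argument anywhere can fail silently or corrupt memory, and on its own it does not remove generic arithmetic. A common alternative is a safer file default plus local `optimize` and type declarations only in the hot spots the profiler points at. A minimal sketch, using a hypothetical helper `count-above` that is not from the question:

```lisp
;; Safer file-wide default: still optimized for speed, but keeps run-time checks.
(declaim (optimize (speed 3) (safety 1) (debug 0)))

;; Hypothetical hot-spot helper: count particle weights above a threshold.
;; Only this function drops run-time checks, via a local OPTIMIZE declaration,
;; and its type declarations let the compiler use raw double-float comparisons.
(defun count-above (weights threshold)
  (declare (type (simple-array double-float (*)) weights)
           (type double-float threshold)
           (optimize (speed 3) (safety 0)))
  (let ((n 0))
    (declare (type fixnum n))
    (loop for w across weights
          when (> w threshold)
            do (incf n))
    n))
```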
I use `(time (load "/....../myfile.lisp"))`
and `(time (load "/......./myfile.dx64fsl"))`
to measure speed. The issue is that compiling brings me no advantage: there is no speed improvement at all. Am I doing something wrong? Is there a way to improve things? Speed is the most important criterion, so I can sacrifice a lot to get a fast response. I have no experience with these kinds of issues, so any help would be appreciated.
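Since the file presumably starts the filter from a top-level form, timing `load` mixes loading with the actual computation. A sketch of timing only the computation, assuming the filter can be started from a single function; `run-filter` and its toy body are hypothetical placeholders, not code from the question:

```lisp
;; Hypothetical placeholder for the real particle-filter entry point:
;; it just sums N-PARTICLES * N-ITERATIONS uniform random doubles.
(defun run-filter (n-particles n-iterations)
  (let ((sum 0d0))
    (dotimes (iter n-iterations sum)
      (declare (ignorable iter))
      (dotimes (p n-particles)
        (declare (ignorable p))
        (incf sum (random 1d0))))))

;; Intended workflow (paths left elided as in the question):
;;   (compile-file "/....../myfile.lisp")  ; once; writes the fasl
;;   (load "/....../myfile.dx64fsl")       ; only defines the functions
;;   (time (run-filter 1000 50))           ; times just the computation
(time (run-filter 1000 50))
```

In Clozure CL (the implementation the `.dx64fsl` extension points to), definitions are, as far as I know, compiled even when the source file is loaded directly, so loading the `.lisp` and the `.dx64fsl` file should run about equally fast. `compile-file` mainly saves the compilation step; a real speedup has to come from changes to the code itself.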
Moreover, when I increase the number of particles (each particle is a vector of about 40 elements) to around 10000, the code gets really slow, so there may be some memory issues too.
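One way to reduce memory pressure with many particles is to keep them all in a single specialized 2-D array, so the floats are stored unboxed in one block (10000 × 40 doubles is about 3.2 MB) instead of, potentially, one heap-allocated float object per element in general vectors. A minimal sketch; the names `particle-matrix`, `make-particles`, `copy-particle` and `particle-sum` are hypothetical and not from the question:

```lisp
;; Row = particle, column = state component; elements stored unboxed.
(deftype particle-matrix () '(simple-array double-float (* *)))

(defun make-particles (n-particles dim)
  "Allocate an N-PARTICLES x DIM matrix of unboxed double-floats."
  (make-array (list n-particles dim)
              :element-type 'double-float
              :initial-element 0d0))

(defun copy-particle (from i to j)
  "Copy particle (row) I of FROM into row J of TO, reusing TO's storage."
  (declare (type particle-matrix from to)
           (type fixnum i j))
  (dotimes (k (array-dimension from 1) to)
    (setf (aref to j k) (aref from i k))))

(defun particle-sum (particles i)
  "Sum the components of particle I; the declarations let the compiler
keep SUM unboxed inside the loop."
  (declare (type particle-matrix particles)
           (type fixnum i)
           (optimize (speed 3)))
  (let ((sum 0d0))
    (declare (type double-float sum))
    (dotimes (k (array-dimension particles 1) sum)
      (incf sum (aref particles i k)))))
```

A resampling step could then fill a second, preallocated matrix with `copy-particle` and swap the two matrices between iterations, rather than allocating fresh particle vectors each time.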
Thank you very much in advance.
Edit: These are the profiling results with 1000 particles and 50 iterations:

    (LOAD "/.../myfile.dx64fsl") took 77,488,810 microseconds (77.488810 seconds) to run
    with 8 available CPU cores.
    During that period, 44,925,468 microseconds (44.925470 seconds) were spent in user mode
    32,005,440 microseconds (32.005440 seconds) were spent in system mode
    2,475,291 microseconds (2.475291 seconds) was spent in GC.
    1,701,028,429 bytes of memory allocated.
    1,974 minor page faults, 0 major page faults, 0 swaps.
    ; Warning: Function CREATE-MY-DBN has been redefined, so times may be inaccurate.
    ; MONITOR it again to record calls to the new definition.
    ; While executing: MONITOR::MONITOR-INFO-VALUES, in process repl-thread(10).
                                                                          Cons
                                        %      %                           Per     Total        Total
    Function                     Time   Cons      Calls   Sec/Call    Call      Time         Cons
    ---------------------------------------------------------------------------------------------
    SAMPLE:                     25.61  26.14    2550000   0.000005     174    13.526    443040000
    DISCRETE-PARENTS:           19.66   3.12    4896000   0.000002      11    10.384     52800000
    LINEAR-GAUSSIAN-MEAN:        8.86   3.12    1632000   0.000003      32     4.679     52800000
    DISCRETE-PARENT-VALUES:      7.47  12.33    3264000   0.000001      64     3.946    208896000
    LIST-DIFFERENCE:             6.41  25.69    6528000   0.000001      67     3.384    435392000
    CONTINUOUS-PARENTS:          6.33   0.00    1632000   0.000002       0     3.343            0
    PF-STEP:                     5.17   0.23         48   0.056851   80080     2.729      3843840
    CONTINUOUS-PARENT-VALUES:    4.13   7.20    1632000   0.000001      75     2.184    122048000
    TABLE-LOOKUP:                3.85   8.39    2197000   0.000001      65     2.035    142128000
    PHI-INVERSE:                 3.36   0.00    1479000   0.000001       0     1.777            0
    PHI-INTEGRAL:                3.32   1.38    2958000   0.000001       8     1.755     23344000
    PARENT-VALUES:               2.38  10.65    1122000   0.000001     161     1.259    180528016
    CONDITIONAL-PROBABILITY:     1.41   0.00     255000   0.000003       0     0.746            0
    ---------------------------------------------------------------------------------------------
    TOTAL:                      97.96  98.24   30145048                       51.746   1664819856
    Estimated monitoring overhead: 21.11 seconds
    Estimated total monitoring overhead: 23.93 seconds

With 10000 particles and 50 iterations:

    (LOAD "/.../myfile.dx64fsl") took 809,931,702 microseconds (809.931700 seconds) to run
    with 8 available CPU cores.
    During that period, 476,627,937 microseconds (476.627930 seconds) were spent in user mode
    328,716,555 microseconds (328.716550 seconds) were spent in system mode
    54,274,625 microseconds (54.274624 seconds) was spent in GC.
    16,973,590,588 bytes of memory allocated.
    10,447 minor page faults, 417 major page faults, 0 swaps.
    ; Warning: Function CREATE-MY-DBN has been redefined, so times may be inaccurate.
    ; MONITOR it again to record calls to the new definition.
    ; While executing: MONITOR::MONITOR-INFO-VALUES, in process repl-thread(10).
                                                                          Cons
                                        %      %                           Per     Total        Total
    Function                     Time   Cons      Calls   Sec/Call    Call      Time         Cons
    ---------------------------------------------------------------------------------------------
    SAMPLE:                     25.48  26.11   25500000   0.000006     174   144.211   4430400000
    DISCRETE-PARENTS:           18.41   3.11   48960000   0.000002      11   104.179    528000000
    LINEAR-GAUSSIAN-MEAN:        8.61   3.11   16320000   0.000003      32    48.751    528000000
    LIST-DIFFERENCE:             7.57  25.66   65280000   0.000001      67    42.823   4353920000
    DISCRETE-PARENT-VALUES:      7.50  12.31   32640000   0.000001      64    42.456   2088960000
    CONTINUOUS-PARENTS:          5.83   0.00   16320000   0.000002       0    32.980            0
    PF-STEP:                     5.05   0.23         48   0.595564  800080    28.587     38403840
    TABLE-LOOKUP:                4.52   8.38   21970000   0.000001      65    25.608   1421280000
    CONTINUOUS-PARENT-VALUES:    4.25   7.19   16320000   0.000001      75    24.041   1220480000
    PHI-INTEGRAL:                3.15   1.38   29580000   0.000001       8    17.849    233440000
    PHI-INVERSE:                 3.12   0.00   14790000   0.000001       0    17.641            0
    PARENT-VALUES:               2.87  10.64   11220000   0.000001     161    16.246   1805280000
    CONDITIONAL-PROBABILITY:     1.36   0.00    2550000   0.000003       0     7.682            0
    ---------------------------------------------------------------------------------------------
    TOTAL:                      97.71  98.12  301450048                      553.053  16648163840
    Estimated monitoring overhead: 211.08 seconds
    Estimated total monitoring overhead: 239.13 seconds

Typical arithmetic code in Common Lisp can be slow. Improving it is possible, but it needs a bit of knowledge.
Reasons:
One thing you can see from the profiling output is that you generate 1.7 GB of garbage. This is a typical hint that your number operations cons, i.e. allocate heap objects such as boxed double-floats. Getting rid of that is often not easy. That these allocations come from number operations is just a guess on my part.
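To illustrate the usual cure, here is a minimal sketch contrasting an undeclared function with one whose argument, array and accumulator types are declared, so the compiler can use raw double-float arithmetic in the loop instead of generic, possibly consing, arithmetic. `linear-mean` is hypothetical: it only mimics the kind of arithmetic `LINEAR-GAUSSIAN-MEAN` from the profile might do, since its real definition is not shown:

```lisp
;; Generic version: no declarations, so + and * dispatch on number types at
;; run time and intermediate double-floats are typically boxed (consed).
(defun linear-mean-generic (bias weights values)
  (let ((acc bias))
    (dotimes (i (length weights) acc)
      (incf acc (* (elt weights i) (elt values i))))))

;; Declared version: every operand is known to be a double-float and both
;; vectors are specialized, so the loop can work on unboxed floats.
(declaim (ftype (function (double-float
                           (simple-array double-float (*))
                           (simple-array double-float (*)))
                          double-float)
                linear-mean))
(defun linear-mean (bias weights values)
  ;; Assumes WEIGHTS and VALUES have the same length: (safety 0) removes
  ;; the bounds checks that would otherwise catch a mismatch.
  (declare (type double-float bias)
           (type (simple-array double-float (*)) weights values)
           (optimize (speed 3) (safety 0) (debug 0)))
  (let ((acc bias))
    (declare (type double-float acc))
    (dotimes (i (length weights) acc)
      (incf acc (* (aref weights i) (aref values i))))))
```

The returned double-float may still be boxed once per call unless the function is inlined; the savings come from the loop body.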
Ken Anderson (who unfortunately died a few years ago) has some advice on improving numeric software on his website: http://openmap.bbn.com/~kanderso/performance/
A usual solution is to give the code to an experienced Lisp developer who knows a bit about the compiler being used and/or about optimization.