I am writing the program in CL (with SBCL 1.2.15) that uses linear algebra. During the course of execution, it often multiplies a matrix by a vector.
Profiler showed that most of the time (80%) the program spends doing exactly that, multiplying matrix by vector. It also shows that this function does lots of consing (80,000,000 for ~100 calls for 100x100 matrices), which triggers GC as well. Having done something similar in F# (that has a benefit of static type-checking, but otherwise does not usually outperform SBCL), F# program runs 10 times faster.
Am I doing it wrong?
(defun matrix-mul (matrix vector dest)
"Multiply MATRIX by VECTOR putting the result into DEST.
Optimized for DOUBLE-FLOAT vectors and matrices"
(declare (type (array double-float (* *)) matrix)
(type (vector double-float *) vector dest)
(optimize (speed 3) (debug 0) (safety 0)))
(destructuring-bind (rows cols) (array-dimensions matrix)
(dotimes (i rows)
(setf (aref dest i)
(loop for j below cols sum (the double-float
(* (aref matrix i j) (aref vector j))))))))
PS. I have also tried using DOTIMES instead of inner LOOP - no difference
PPS. Next option can be using BLAS from CL, but (i) I am not looking forward to making it work on Windows, (ii) will need marshaling matrices/vectors back and forth.
The usual problem is that float arithmetic, here with double-floats, (independent of the surrounding code like matrix multiplication) conses.
Generally the first thing to do with SBCL in such cases:
Put the code into a file and compile it
The compiler will then output a lot of optimization problems, given that one compiles for speed. You would then need to examine the notes and see what you can do.
Here for example the LOOP
sum lacks type information.
There is actually a LOOP
syntax to declare the type of the sum variable. I don't know if SBCL takes advantage of that:
(loop repeat 10
sum 1.0d0 of-type double-float)
SBCL 1.3.0 on 32bit ARM for your code:
* (compile-file "/tmp/test.lisp")
; compiling file "/tmp/test.lisp" (written 13 DEC 2015 11:34:26 AM):
; compiling (DEFUN MATRIX-MUL ...)
; file: /tmp/test.lisp
1)
; in: DEFUN MATRIX-MUL
; (SETF (AREF DEST I)
; (LOOP FOR J BELOW COLS
; SUM (THE DOUBLE-FLOAT (* # #))))
; --> LET* FUNCALL SB-C::%FUNCALL (SETF AREF)
; ==>
; (SB-KERNEL:HAIRY-DATA-VECTOR-SET ARRAY SB-INT:INDEX SB-C::NEW-VALUE)
;
; note: unable to
; avoid runtime dispatch on array element type
; due to type uncertainty:
; The first argument is a (VECTOR DOUBLE-FLOAT), not a SIMPLE-ARRAY.
2)
; (AREF MATRIX I J)
; --> LET*
; ==>
; (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
; avoid runtime dispatch on array element type
; due to type uncertainty:
; The first argument is a (ARRAY DOUBLE-FLOAT (* *)), not a SIMPLE-ARRAY.
3)
; (AREF VECTOR J)
; ==>
; (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
; avoid runtime dispatch on array element type
; due to type uncertainty:
; The first argument is a (VECTOR DOUBLE-FLOAT), not a SIMPLE-ARRAY.
4)
; (LOOP FOR J BELOW COLS
; SUM (THE DOUBLE-FLOAT (* (AREF MATRIX I J) (AREF VECTOR J))))
; --> BLOCK LET SB-LOOP::WITH-SUM-COUNT LET SB-LOOP::LOOP-BODY TAGBODY SETQ THE
; ==>
; (+ #:LOOP-SUM-8 (THE DOUBLE-FLOAT (* (AREF MATRIX I J) (AREF VECTOR J))))
;
; note: unable to
; optimize
; due to type uncertainty:
; The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
; optimize
; due to type uncertainty:
; The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
5)
; --> BLOCK LET SB-LOOP::WITH-SUM-COUNT LET SB-LOOP::LOOP-BODY TAGBODY WHEN IF
; --> >= OR LET IF OR THE = IF
; ==>
; (= SB-C::X SB-C::Y)
;
; note: unable to open code because: The operands might not be the same type.
6)
; (DOTIMES (I ROWS)
; (SETF (AREF DEST I)
; (LOOP FOR J BELOW COLS
; SUM (THE DOUBLE-FLOAT #))))
; --> DO BLOCK LET TAGBODY UNLESS IF >= IF
; ==>
; (< SB-C::X SB-C::Y)
;
; note: forced to do static-fun Two-arg-< (cost 53)
; unable to do inline fixnum comparison (cost 4) because:
; The second argument is a INTEGER, not a FIXNUM.
; unable to do inline (signed-byte 32) comparison (cost 6) because:
; The second argument is a INTEGER, not a (SIGNED-BYTE 32).
; etc.
7)
; (LOOP FOR J BELOW COLS
; SUM (THE DOUBLE-FLOAT (* (AREF MATRIX I J) (AREF VECTOR J))))
; --> BLOCK LET SB-LOOP::WITH-SUM-COUNT LET SB-LOOP::LOOP-BODY TAGBODY WHEN IF
; --> >= OR LET > IF
; ==>
; (> SB-C::X SB-C::Y)
;
; note: forced to do static-fun Two-arg-> (cost 53)
; unable to do inline fixnum comparison (cost 4) because:
; The second argument is a REAL, not a FIXNUM.
; unable to do inline (signed-byte 32) comparison (cost 6) because:
; The second argument is a REAL, not a (SIGNED-BYTE 32).
; etc.
8)
; --> BLOCK LET SB-LOOP::WITH-SUM-COUNT LET SB-LOOP::LOOP-BODY TAGBODY SETQ THE
; ==>
; (+ #:LOOP-SUM-8 (THE DOUBLE-FLOAT (* (AREF MATRIX I J) (AREF VECTOR J))))
;
; note: forced to do static-fun Two-arg-+ (cost 53)
; unable to do inline float arithmetic (cost 2) because:
; The first argument is a NUMBER, not a DOUBLE-FLOAT.
; The result is a (VALUES NUMBER &OPTIONAL), not a (VALUES DOUBLE-FLOAT
; &REST T).
;
; note: doing float to pointer coercion (cost 13), for:
; the second argument of static-fun Two-arg-+
;
; compilation unit finished
; printed 10 notes
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With