Here is an excerpt from the Mastering Dyalog APL book, from the chapter on Inner Products:
HMS is a variable which contains duration in Hours, Minutes, and Seconds: HMS ← 3 44 29 Chapter J – Operators 397
We would like to convert it into seconds. We shall see 3 methods just now, and a 4th
method
will be given in another chapter.
A horrible solution (3600×HMS[1]) + (60×HMS[2]) + HMS[3]
A good APL solution +/ 3600 60 1 × HMS
An excellent solution with Inner Product 3600 60 1 +.× HMS
It then says that The second and third solutions are equivalent in terms of number of characters typed and performance.
As I understand it, APL programmers should generally use Inner Product, as well as Outer Product, as much as possible. Is that correct?
Can you give an example when using Inner Product would lead to performance gains? What exactly happens when I use Inner Product (on a lower level)? Is the first solution presented below horrible just because it doesn't use APL syntax in a proper way or does it actually have worse performance?
I know there are few questions but want I am asking about in general is how the Inner/Outer Products work and when exactly should an APL programmer use them.
We’ve done work to optimize both the +/ and the +.×.
MBaas is right in that it happens that the +/ in this instance is slightly better than the +.×
Our general advice is: use the constructs in the language best suited for the job, and eventually the implementation will catch up.
The "horrible" solution is considered bad as it does not use array thinking.
Regards,
Vince, Dyalog Support
APL programmers should generally use Inner Product, as well as Outer Product, as much as possible. Is that correct?
It is really up to the APL programmer and the task at hand, but if something makes APL code more concise and efficient, I don't see why a programmer wouldn't opt for it.
In this particular case 60⊥HMS
is even more concise and efficient than the inner product.
Can you give an example when using Inner Product would lead to performance gains?
As typical in array-oriented programming, performance gains are achieved by doing things in one go. Most APL functions are implicit loops---their implementation uses a counter, a limit for it, and an increment step. The shorter your code is, the better, because not only it's easier to hold in one's head, it's also more efficient as the interpreter has to do fewer passes over the data. Some implementations do loop fusion in an attempt to reduce this overhead. Some have idiom recognition---certain combinations of squiggles are special-cased in the interpreter. Doing things in one go also allows the interpreter to do clever optimisations like using the SSE instruction set or GPUs.
Coming back to inner product, let's take the example of A f.g B
where A
and B
are vectors and see how f
and g
are applied (in Dyalog):
f←{⎕←(⍕⍺),' f ',⍕⍵ ⋄ ⍺+⍵}
g←{⎕←(⍕⍺),' g ',⍕⍵ ⋄ ⍺×⍵}
0 1 2 3 4 f.g 5 6 7 8 9
4 g 9
3 g 8
24 f 36
2 g 7
14 f 60
1 g 6
6 f 74
0 g 5
0 f 80
80
You can see from the above that calls to f
and g
are interleaved. The interpreter apples f
and reduces on g
simultaneously, in one pass, avoiding the creation of a temporary array, like f/ A g B
would do.
Another example: http://archive.vector.org.uk/art10500200
You can test the performance of different solutions for yourself and see which one works best:
)copy dfns.dws cmpx
⍝ or: ")copy dfns cmpx" if you are using Windows
HMS ← 3 44 29
cmpx '(3600×HMS[1]) + (60×HMS[2]) + HMS[3]' '+/ 3600 60 1 × HMS' '3600 60 1 +.× HMS' '60⊥HMS'
(3600×HMS[1]) + (60×HMS[2]) + HMS[3] → 2.7E¯6 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
+/ 3600 60 1 × HMS → 9.3E¯7 | -66% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
3600 60 1 +.× HMS → 8.9E¯7 | -68% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
60⊥HMS → 4.8E¯7 | -83% ⎕⎕⎕⎕⎕⎕⎕
The problem with generalization is that they might be incorrect, but as rule of thumb I'd say using the inner & outer products will benefit readability as well as performance ;-)
Now, looking at the thing in practice:
` ]performance.RunTime (3600×HMS[1])+(60×HMS[2])+HMS[3] -repeat=100000
Benchmarking "(3600×HMS[1])+(60×HMS[2])+HMS[3]", repeat=100000 Exp CPU (avg): 0.001558503836 Elapsed: 0.001618446292
]performance.RunTime '+/ 3600 60 1 × HMS' -repeat=100000
Benchmarking "+/ 3600 60 1 × HMS", repeat=100000 Exp CPU (avg): 0.0004698496481 Elapsed: 0.0004698496481 `
That is quite a difference - if you repeat it enough times to be measureable ;-) But of course with larger dataset the advantage gets more visible! Let's also look at the 3variant:
` ]performance.RunTime '3600 60 1 +.× HMS' -repeat=100000
No difference here, but again - with "real data" (larger array) you should see a much clearer difference. I think a simple explanation is that inner product is like one 'statement' for the interpreter, whereas the first variant has 3 single multiplications, indexing and needs to consider priorities (brackets) and then sum up that vector, which sounds like a lot of sweat ;-) The 2nd statement has one multiplication only (for a vector), so it elimates several steps already, and the inner product enables the interpreter to possibly combine some of its internal working to do his job even faster.
BUT now here's a surprise: v1←(10000/3600 60 1) ⋄v2← 10000/HMS ]performance.RunTime '+/v1 × v2' 'v1 +.× v2' -repeat=100000 -compare
+/v1 × v2 → 6.0E¯5 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
v1 +.× v2 → 6.3E¯5 | +5% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
I expected that the bigger arguments would help to make the last expression's performance-advantage more visible - but actually #2 won. Maybe Dyalog optimized case #2 more than #3... ;-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With