I want to write a function which takes a matrix as an input. This is a frequent low-level call in a complicated project, so making this function as fast as possible has potentially serious performance implications. Because speed is so important to me, I'm using the types in <code>FixedSizeArrays</code> as I know that this will save on memory usage. But I often know certain properties of the input matrix, and I'm not certain that I'm making optimal use of that. Here's a simple example. Imagine that the function I want to make the following as fast as possible: <pre class="prettyprint"><code>using FixedSizeArrays function foo( input::Mat ) # NB: Mat is the FixedSizeArrays matrix type return 2 * input end </code></pre> Obviously this is a trivial example, but that's not the point. The point is that I know something about the dimensions of the matrix <code>input</code>: it always has only two columns and I can always specify the number of rows at run-time. This seems like information that could be passed to the compiler to make my code faster. Could I pass it as an argument that defines the size of <code>input</code> somehow? Here's an example that doesn't work, but should give you some idea of what I'm trying to do. <pre class="prettyprint"><code>function bar( int::N, thismat::Mat{N,2,Float64} ) return 2 * thismat end </code></pre> Is there something like this that I can do? Would this even work if I could? Maybe FixedSizeArrays already does everything that can be done. Thanks for you thoughts!

Fixed-size arrays are already specialized on size. These arrays are not appropriate for when the number of rows, <code>N</code> in your case, can vary. Any performance problems you notice are probably because of overspecialization. Let me be a little more specific. The Julia compiler is able to achieve zero-cost abstraction through aggressive specialization on the types of arguments. So in general (that is, in all cases except for a few where specialization would be too expensive, or is explicitly disabled), if a function is called with two different type signatures, two versions of this function will be compiled. Since the size of a <code>Mat</code> is part of its type, that means a version will be compiled for each possible size of the <code>Mat</code>. So the specialization you seek is done already. Specialization, however, is not free. There are two costs associated with it: <ul> <li>The first time a function is called on a particular signature, memory will be allocated and the compiler will have to run.</li> <li>When an parameter whose type cannot be inferred is passed to a function, there is a "type instability", and dynamic dispatch is required. Dynamic dispatch involves runtime lookups.</li> </ul> Thus if your matrices are of the size <code>(2, N)</code>, where <code>N</code> varies and is not known at compile time, the performance cost of dynamic dispatch will be incurred. This performance cost can be restricted by using the function barrier technique: we only incur that cost once for each type-unstable call, so limiting the number of such calls improves performance. But what would increase performance even more would be to avoid this dynamic dispatch entirely. It is possible to construct an array type that only encodes the number of columns in the type, and has the number of rows as a field at runtime. That is, it's possible your performance problem is due to overspecialization, and you need to create your types to reduce the amount of specialization. Finding the right balance is central to squeezing as much performance as possible out of an application. Specializing on the size of an array is in fact useful quite rarely—even C and C++ code, for instance, tends to pass array sizes as runtime parameters, instead of specializing on a particular array size. This is not that expensive. In more cases that not, <code>FixedSizeArrays.jl</code> will not improve performance, but rather hurt it. There are certainly situations where it will help—but yours may not be one of them. <hr> In your case, for maximal performance, I suspect that a type like this would be fastest: <pre class="prettyprint"><code>immutable TwoColumnMatrix{T, BaseType} <: AbstractArray{T, 2} height::Int base::BaseType end function TwoColumnMatrix(A::Matrix) size(A, 2) == 2 || throw(ArgumentError("must be two columns")) TwoColumnMatrix{eltype(A), typeof(A)}(size(A, 1), A) end Base.@propagate_inbounds function getindex(M::TwoColumnMatrix, n::Int) M.base[n] end size(M::TwoColumnMatrix) = (M.height, 2) </code></pre> You might need to define additional methods for maximum performance, and as always, benchmark. It's possible the overhead of the wrapper is not worth the compiler knowing about the dimensions.

Optimally passing dimensions of fixed size array in julia

Tags:

performance

julia

fixed-size-types

I want to write a function which takes a matrix as an input. This is a frequent low-level call in a complicated project, so making this function as fast as possible has potentially serious performance implications. Because speed is so important to me, I'm using the types in FixedSizeArrays as I know that this will save on memory usage. But I often know certain properties of the input matrix, and I'm not certain that I'm making optimal use of that.

Here's a simple example. Imagine that the function I want to make the following as fast as possible:

using FixedSizeArrays

function foo( input::Mat )
# NB: Mat is the FixedSizeArrays matrix type
  return 2 * input
end

Obviously this is a trivial example, but that's not the point. The point is that I know something about the dimensions of the matrix input: it always has only two columns and I can always specify the number of rows at run-time. This seems like information that could be passed to the compiler to make my code faster. Could I pass it as an argument that defines the size of input somehow? Here's an example that doesn't work, but should give you some idea of what I'm trying to do.

function bar( int::N, thismat::Mat{N,2,Float64} )
  return 2 * thismat
end

Is there something like this that I can do? Would this even work if I could? Maybe FixedSizeArrays already does everything that can be done. Thanks for you thoughts!

394

asked Jul 18 '16 19:07

squipbar

1 Answers

Fixed-size arrays are already specialized on size. These arrays are not appropriate for when the number of rows, N in your case, can vary. Any performance problems you notice are probably because of overspecialization.

Let me be a little more specific.

The Julia compiler is able to achieve zero-cost abstraction through aggressive specialization on the types of arguments. So in general (that is, in all cases except for a few where specialization would be too expensive, or is explicitly disabled), if a function is called with two different type signatures, two versions of this function will be compiled.

Since the size of a Mat is part of its type, that means a version will be compiled for each possible size of the Mat. So the specialization you seek is done already.

Specialization, however, is not free. There are two costs associated with it:

The first time a function is called on a particular signature, memory will be allocated and the compiler will have to run.
When an parameter whose type cannot be inferred is passed to a function, there is a "type instability", and dynamic dispatch is required. Dynamic dispatch involves runtime lookups.

Thus if your matrices are of the size (2, N), where N varies and is not known at compile time, the performance cost of dynamic dispatch will be incurred. This performance cost can be restricted by using the function barrier technique: we only incur that cost once for each type-unstable call, so limiting the number of such calls improves performance.

But what would increase performance even more would be to avoid this dynamic dispatch entirely. It is possible to construct an array type that only encodes the number of columns in the type, and has the number of rows as a field at runtime. That is, it's possible your performance problem is due to overspecialization, and you need to create your types to reduce the amount of specialization.

Finding the right balance is central to squeezing as much performance as possible out of an application. Specializing on the size of an array is in fact useful quite rarely—even C and C++ code, for instance, tends to pass array sizes as runtime parameters, instead of specializing on a particular array size. This is not that expensive. In more cases that not, FixedSizeArrays.jl will not improve performance, but rather hurt it. There are certainly situations where it will help—but yours may not be one of them.

In your case, for maximal performance, I suspect that a type like this would be fastest:

immutable TwoColumnMatrix{T, BaseType} <: AbstractArray{T, 2}
    height::Int
    base::BaseType
end

function TwoColumnMatrix(A::Matrix)
    size(A, 2) == 2 || throw(ArgumentError("must be two columns"))
    TwoColumnMatrix{eltype(A), typeof(A)}(size(A, 1), A)
end

Base.@propagate_inbounds function getindex(M::TwoColumnMatrix, n::Int)
    M.base[n]
end

size(M::TwoColumnMatrix) = (M.height, 2)

You might need to define additional methods for maximum performance, and as always, benchmark. It's possible the overhead of the wrapper is not worth the compiler knowing about the dimensions.

106

answered Sep 30 '22 18:09

Fengyang Wang

Related questions
                            
                                Browser gzip decompression overhead / speed
                            
                                numpy np.array versus np.matrix (performance)
                            
                                Exceptionally slow Javascript loop
                            
                                What is the fastest way to check a type?
                            
                                Is there a performance advantage in using an object literal over a self instantiated constructor?
                            
                                Pattern Matcher Vs String Split, which should I use?
                            
                                Javascript foreach with condition
                            
                                Hibernate out of memory exception while processing large collection of elements
                            
                                Improving OpenCV performance Android - fast object tracking
                            
                                A big loop within a small loop always faster than a small loop within a big one?
                            
                                Why is the short primitive type significantly slower than long or int?
                            
                                Swift: how add offset to memcpy(...)
                            
                                Plotting large number of time series using ggplot. Is it possible to speed up?
                            
                                CLOS make-instance is really slow and causes heap exhaustion in SBCL
                            
                                How to prevent Elasticsearch from index throttling?
                            
                                How to use Python 3.4's enums without significant slowdown?
                            
                                Appending to slice bad performance.. why?
                            
                                Explain COUNT query with ActiveRecord
                            
                                How to narrow down perf.data to a time sub interval
                            
                                keras predict is very slow

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With