When should I use a linear algebra library like Math.NET

I am not certain there is one correct answer to this question, but here we go. While numerous numerical problems can be stated in linear algebra form, my limited experience suggests that, for simple operations, using Math.NET carries a performance overhead compared to writing the equivalent operations on raw arrays.

As a test case, I wrote code to compute the distance between a vector and the closest vector in a list, with 3 versions: operating on arrays, operating on dense vectors, and operating on dense vectors with the MKL provider. Working on arrays ran about 4x faster than on vectors, and 3x faster than using the MKL provider.

The downside is that I had to write the distance computation by hand, instead of leveraging the built-in Norm function; the upside is that it's much faster. Note: I didn't post my actual benchmark code (I'll be happy to do so if needed), and I might also be using Math.NET improperly.
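To give an idea of the shape of the comparison, here is a simplified sketch, not the actual benchmark; it assumes a recent Math.NET Numerics package where Vector<double> lives in MathNet.Numerics.LinearAlgebra, and uses the built-in Norm function mentioned above:

```csharp
using System;
using System.Collections.Generic;
using MathNet.Numerics.LinearAlgebra;

static class DistanceSketch
{
    // Raw-array version: hand-rolled Euclidean distance.
    public static double Distance(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.Sqrt(sum);
    }

    // Math.NET version: the 2-norm of the difference vector.
    public static double Distance(Vector<double> a, Vector<double> b)
    {
        return (a - b).Norm(2.0);
    }

    // Distance to the closest vector in a list (raw-array version).
    public static double ClosestDistance(double[] v, IEnumerable<double[]> candidates)
    {
        double best = double.MaxValue;
        foreach (var c in candidates)
            best = Math.Min(best, Distance(v, c));
        return best;
    }
}
```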

So my question is as follows: it seems to me that using higher-level abstractions comes at a performance cost. Is that generally the case, or are there situations (like sparse matrices, for instance) where using Math.NET would be expected to outperform manually written operations on arrays?

If that is the case, I would tend to think that the linear algebra part of Math.NET is mostly useful for "real" algebra involving matrices, to avoid re-implementing more complex calculations/algorithms, and potentially for code readability, but that for simpler vector-by-vector operations it might be a better idea to work on raw arrays.

Any light on when it's a good idea to use the library vs. when you should roll your own would be appreciated!

Asked Mar 22 '13 by Mathias

1 Answer

Disclaimer: I'm maintaining Math.NET Numerics.

The primary value a toolkit like Math.NET Numerics tries to offer is developer productivity, especially for those without a PhD in the subject, who would have a hard time or waste a lot of time implementing these sometimes quite involved algorithms themselves, possibly badly, instead of spending that time on their actual problem.

Then, there is some chance that the functionality you need has already been used by others before. Some of them may have already discovered and pointed out issues and contributed their improvements back. More users help improve code quality and robustness. Unfortunately, this also brings us to the major drawback: it tends to make code more general, which often makes it less efficient than a highly specialized implementation doing exactly what you need.

This is all along the lines of Cody Gray's comment: Use it if it works and is fast enough, else either help fix it and make it work (and fast), choose another toolkit that works, or implement exactly what you need yourself. Luckily for Math.NET Numerics there are some more options, see below.

As such, I agree with your conclusion: if you don't actually need any complicated operations and don't work with very large data, but performance is important, there's nothing wrong with using arrays or another data structure directly (especially in F#, where I personally would consider raw native data structures more often than in C#). Of course this comes at the cost of losing some convenience, and the risk that once you start needing more operations after all, you may end up re-implementing the toolkit. In the end it also depends on how critical this is to your project, and whether you can spend the resources and time to maintain your own math code.

Nevertheless, in my own experience it's often an advantage to own the code (so you can make changes, effective immediately) and to keep it simple and focused (so it does exactly what you need it to do, and only that).

Specific to Math.NET Numerics

  • A very specific managed implementation can always outperform a general managed implementation. However, our managed implementation should not be orders of magnitude slower than any managed alternative. After all, our algorithms operate directly on arrays internally as well (if properly optimized). If an alternative algorithm is much faster, we'd better replace our implementation with that alternative, so please do let us know about it (or, even better, contribute the changes).
  • If you happen to hit a path where we can and do leverage a native provider like MKL, and you deal with large data, I'd expect Math.NET to be orders of magnitude faster, despite the higher level of abstraction.
  • Not all code paths in Math.NET Numerics are optimized equally yet, or leverage native providers. A lot of work has been put into linear algebra over the last few minor versions, so we're getting better, but slowly; there's still a lot of work ahead (especially for sparse types). It's possible that you've hit a barely optimized path in your case, so I'd actually be very interested in your code sample so we can work on this specific case.

Math.NET Numerics Perf Tips

  • Use a native provider such as MKL (a short sketch follows this list)
  • Experiment a bit with the parallelization settings in the Control class (but note that we've realized the parallelization implementation up to v2.4 was actually quite bad and plan to replace it completely in v2.5; first benchmarks are promising)
  • Avoid going through At or the indexer when implementing your own operations; access the raw array directly instead (see .Storage)
  • A lot of operations allow specifying a result vector/matrix, which can sometimes even be the same as one of the operands (in-place). This avoids creating a new array in every operation and thus lowers memory pressure if you're dealing with very large data; unfortunately it also makes the code uglier.
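To make a few of these tips concrete, here is a rough sketch in C#. Treat the exact member names as illustrative: the provider and Control APIs have changed between versions, and this assumes a recent package with the MKL provider installed.

```csharp
using System;
using MathNet.Numerics;
using MathNet.Numerics.LinearAlgebra;
using MathNet.Numerics.LinearAlgebra.Storage;

class PerfTipsSketch
{
    static void Run()
    {
        // Tip 1: switch to a native provider if one is installed.
        // (Recent packages expose Control.UseNativeMKL(); older 2.x versions
        // set Control.LinearAlgebraProvider instead.)
        Control.UseNativeMKL();

        // Tip 2: experiment with the parallelization settings on Control.
        Control.MaxDegreeOfParallelism = Environment.ProcessorCount;

        var a = Vector<double>.Build.Dense(100000, i => (double)i);
        var b = Vector<double>.Build.Dense(100000, 1.0);

        // Tip 3: in your own tight loops, work on the raw storage array
        // rather than going through At/the indexer for every element.
        var data = ((DenseVectorStorage<double>)a.Storage).Data;
        double sum = 0.0;
        for (int i = 0; i < data.Length; i++) sum += data[i];

        // Tip 4: reuse a preallocated result vector (in-place style) to
        // avoid allocating a new vector on every operation.
        var result = Vector<double>.Build.Dense(a.Count);
        a.Subtract(b, result); // result = a - b, no new allocation
    }
}
```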
Answered Oct 16 '22 by Christoph Rüegg