Large Matrix Inversion

I need to invert large matrices. The typical size is 1000 x 1000, but some exceed 100000 x 100000, and those currently fail because of time and memory limits. I know the usual advice is 'don't take the inverse, find some other way to do it', but that is not possible at the moment: we rely on existing software that expects to receive the matrix inverse. (Note: I am looking into ways of changing this, but that will take a long time.)

At the moment we are using an LU decomposition method from Numerical Recipes, and I am in the process of testing the Eigen library. Eigen seems to be more stable and a bit faster, but I am still testing its accuracy. I have taken a quick look at other libraries such as ATLAS and LAPACK but have not done any substantial testing with them yet.

It seems that Eigen does not use concurrent methods to compute the inverse (though it does for the LU factorization part of the inverse), and as far as I can tell ATLAS and LAPACK share this limitation. (I am currently testing the speed difference for Eigen with and without OpenMP.)

First question: can anyone explain how matrix inversion can be optimized by parallelization? I found an article here that talks about parallel algorithms for matrix inversion, but I did not understand it. It seems the article talks about another method? I am also not sure whether ScaLAPACK or PETSc are useful.

Second question: I read this article on using GPUs to increase performance, but I have never coded for GPUs and so have no idea what it is trying to convey, though the charts at the bottom looked rather alarming. How is this even possible, and where do I start if I want to implement something like this, assuming it is true?

I also found this article. I have not yet had the time to read through it and understand it, but it seems promising, as memory is a current issue with our software.

Any information about these articles or the problems in general would be of great help. And again I apologize if this question seems vague, I will try to expand more if necessary.

asked Jun 27 '12 15:06

Onekuo

4 Answers

First question is can anyone explain how it would be possible to optimize matrix inversion by parallelization.

I'd hazard a guess that this, and related topics in linear algebra, is one of the most studied topics in parallel computing. If you're stuck looking for somewhere to start reading, well, good old Golub and Van Loan have a chapter on the topic. As to whether ScaLAPACK and PETSc are likely to be useful: certainly the former, probably the latter. Of course, they both depend on MPI, but that's kind of taken for granted in this field.

Second question ...

Use GPUs if you've got them and you can afford to translate your code into the programming model supported by your GPUs. If you've never coded for GPUs and have access to a cluster of commodity-type CPUs you'll get up to speed quicker by using the cluster than by wrestling with a novel technology.

As for the last article you refer to, it's now 10 years old in a field that changes very quickly (try finding a 10-year-old research paper on using GPUs for matrix inversion). I can't comment on its excellence or other attributes, but the problem sizes you mention seem to me to be well within the capabilities of modern clusters for in-core (to use an old term) computation. If your matrices are very big, are they also sparse?

Finally, I strongly support your apparent intention to use existing off-the-shelf codes rather than to try to develop your own.

answered Sep 30 '22 04:09

High Performance Mark

100000 x 100000 is 80GB at double precision. You need a library that supports memory-mapped matrices on disk. I can't recommend a particular library and I didn't find anything with quick Google searches. But code from Numerical Recipes certainly isn't going to be adequate.
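The 80 GB figure follows from simple arithmetic. A minimal sketch, assuming dense storage at 8 bytes per double; the helper name `dense_matrix_bytes` is made up for illustration:

```python
def dense_matrix_bytes(n, bytes_per_element=8):
    """Bytes needed to store a dense n-by-n matrix (8 bytes per double)."""
    return n * n * bytes_per_element


# Compare the typical size with the failing size from the question.
for n in (1_000, 100_000):
    size = dense_matrix_bytes(n)
    print(f"{n} x {n}: {size / 1e9:g} GB")
```

Note that this counts one copy only. If the inverse is written to a separate buffer rather than computed in place, the requirement doubles.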

answered Sep 30 '22 04:09

japreiss

Regarding the first question (how to parallelize computing the inverse):

I assume you are computing the inverse by doing an LU decomposition of your matrix and then using the decomposition to solve A*B = I where A is your original matrix, B is the matrix you solve for, and I is the identity matrix. Then B is the inverse.

The last step is easy to parallelize. Divide your identity matrix along the columns. If you have p CPUs and your matrix is n-by-n, then every part has n/p columns and n rows. Let's call the parts I1, I2, etc. On every CPU, solve a system of the form A*B1 = I1 (each CPU reuses the same LU factors). This gives you the parts B1, B2, etc., and you can combine them to form B, which is the inverse.
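The column-splitting idea can be sketched in pure Python as follows. This is only an illustration of the structure, not a tuned implementation: the function names (`lu_factor`, `lu_solve`, `solve_block`, `invert`) are made up for this example, and Python threads will not give a real speedup for this kind of arithmetic. In practice each block would go to a separate process, MPI rank, or a threaded LAPACK call.

```python
from concurrent.futures import ThreadPoolExecutor


def lu_factor(a):
    """LU factorization with partial pivoting. Returns (lu, perm)."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Choose the row with the largest entry in column k as pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in place of L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]  # apply the row permutation
    for i in range(n):  # forward substitution (L has unit diagonal)
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def solve_block(lu, perm, columns):
    """Solve A * B_k = I_k for one block of identity columns."""
    n = len(lu)
    block = []
    for c in columns:
        e = [0.0] * n
        e[c] = 1.0
        block.append((c, lu_solve(lu, perm, e)))
    return block


def invert(a, workers=2):
    """Invert a non-empty square matrix by splitting I into column blocks."""
    n = len(a)
    lu, perm = lu_factor(a)  # factor once, reuse for every block
    size = -(-n // workers)  # ceil(n / workers) columns per block
    chunks = [range(s, min(s + size, n)) for s in range(0, n, size)]
    inv = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for block in pool.map(lambda cols: solve_block(lu, perm, cols), chunks):
            for c, x in block:
                for i in range(n):
                    inv[i][c] = x[i]  # place the solved column into B
    return inv
```

The point of the design is that the blocks are independent: once the LU factors exist, no worker needs anything from another worker, so the only shared data is the read-only factorization.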

answered Sep 30 '22 04:09

Jitse Niesen

An LU decomposition on a GPU can be ~10x faster than on a CPU. Although this is now changing, GPUs have traditionally been designed around single-precision arithmetic, so on older hardware single precision is generally much faster than double precision. Also, storage requirements and performance will be greatly affected by the structure of your matrices. A sparse 100,000 x 100,000 matrix LU decomposition is a reasonable problem to solve and will not require much memory.
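To give a rough sense of why sparsity matters for memory, here is a back-of-the-envelope sketch. It assumes compressed sparse row (CSR) storage with 8-byte values and 4-byte indices, and an average of 10 nonzeros per row; both the format and the density are assumptions chosen for illustration, not properties of the asker's matrices.

```python
def csr_bytes(n, nnz_per_row, value_bytes=8, index_bytes=4):
    """Approximate bytes for an n-by-n matrix stored in CSR format."""
    nnz = n * nnz_per_row
    values = nnz * value_bytes            # the nonzero entries
    col_indices = nnz * index_bytes       # one column index per entry
    row_pointers = (n + 1) * index_bytes  # start offset of each row
    return values + col_indices + row_pointers


# Assumed density: 10 nonzeros per row on average.
print(f"{csr_bytes(100_000, 10) / 1e6:g} MB")
```

This only counts the matrix itself. The LU factors of a sparse matrix usually contain additional nonzeros (fill-in), so the factorization can need noticeably more memory than this estimate, depending on the structure and the ordering used.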

Unless you want to become a specialist and spend a lot of time tuning for hardware updates, I would strongly recommend using a commercial library. I would suggest CULA tools. They have both sparse and dense GPU libraries and in fact their free library offers SGETRF - a single precision (dense) LU decomp routine. You'll have to pay for their double precision libraries.

answered Sep 30 '22 04:09

Paul

Tags: concurrency, parallel-processing, linear-algebra, matrix-inverse, eigen
