
Parallel linear algebra for multicore system [closed]

I'm developing a program that needs to do heavy linear algebra calculations.

I'm currently using LAPACK/BLAS routines, but I need to exploit all the cores of my machine (24 core Xeon X5690).

I've found projects like PBLAS and ScaLAPACK, but they all seem to focus on distributed computing and on using MPI.

I have no cluster available; all computations will be done on a single server, and using MPI looks like overkill.

Does anyone have any suggestions?

Patrik asked Apr 05 '12 09:04



2 Answers

As @larsmans mentioned (suggesting, say, MKL), you keep using the LAPACK + BLAS interfaces; you just link against a tuned, multithreaded implementation for your platform. MKL is great, but expensive. Open-source options include:

  • OpenBLAS / GotoBLAS: the Nehalem support should work OK, but there is no tuned support for Westmere yet. Does multithreading very well.
  • ATLAS: automatically tunes to your architecture at installation time. Probably slower for "typical" matrices (e.g., square SGEMM) but can be faster for odd cases, and for Westmere it may even beat OpenBLAS/GotoBLAS, though I haven't tested this myself. Mostly optimized for the serial case, but it does include multithreaded routines.
  • PLASMA - a LAPACK implementation designed specifically for multicore.
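
Because the calling code stays the same with any of these libraries, the thread count is usually chosen at run time rather than in the source. The sketch below is only an illustration of that idea, not part of any library: the helper names `blas_thread_env` and `run_with_threads` are made up here. It sets the environment variables that OpenMP-based builds, MKL, OpenBLAS and GotoBLAS commonly read, then launches a program with them. Which variable is actually honored depends on the library and how it was built, and some libraries fix the thread count at build time instead, so treat the list as an assumption to check against your installation.

```python
import os
import subprocess
import sys

# Environment variables commonly read at startup by threaded BLAS builds.
# Which ones take effect depends on the library and its build options.
THREAD_VARS = (
    "OMP_NUM_THREADS",       # OpenMP-based builds
    "MKL_NUM_THREADS",       # Intel MKL
    "OPENBLAS_NUM_THREADS",  # OpenBLAS
    "GOTO_NUM_THREADS",      # GotoBLAS / GotoBLAS2
)


def blas_thread_env(num_threads, base=None):
    """Return a copy of the environment with the BLAS thread count set."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env = dict(os.environ if base is None else base)
    for name in THREAD_VARS:
        env[name] = str(num_threads)
    return env


def run_with_threads(command, num_threads):
    """Run `command` (a list of arguments) with the thread count applied."""
    return subprocess.run(
        command,
        env=blas_thread_env(num_threads),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for a real LAPACK program: a child process that echoes the setting.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OPENBLAS_NUM_THREADS'])"]
    print(run_with_threads(child, 24).stdout.strip())
```

In practice the child command would be your own LAPACK-based executable (for example a hypothetical `./my_solver`), linked against whichever of the libraries above you picked.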

I'd also agree with Mark's comment; depending on which LAPACK routines you're using, the distributed-memory versions with MPI might actually be faster than the multithreaded ones. That's unlikely to be the case for BLAS routines, but for something more complicated (say, the eigenvalue/eigenvector routines in LAPACK) it's worth testing. While it's true that MPI function calls add overhead, working in a distributed-memory mode means you don't have to worry so much about false sharing, synchronizing access to shared variables, etc.
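
Since the advice here is to measure rather than guess, a small timing harness is often enough to compare a threaded build against an MPI build on the same machine. The following is a minimal sketch under stated assumptions: `best_wall_time` and `compare` are hypothetical helper names, and the executables in the usage note are placeholders for your own programs. It runs each command a few times and keeps the fastest wall-clock time, which reduces noise from other activity on the server.

```python
import subprocess
import sys
import time


def best_wall_time(command, repeats=3):
    """Run `command` `repeats` times; return the fastest wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def compare(candidates, repeats=3):
    """Time each labelled command and return (label, seconds) pairs, fastest first."""
    timings = {label: best_wall_time(cmd, repeats)
               for label, cmd in candidates.items()}
    return sorted(timings.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; replace with real programs.
    demo = {
        "candidate_a": [sys.executable, "-c", "pass"],
        "candidate_b": [sys.executable, "-c", "sum(range(100000))"],
    }
    for label, seconds in compare(demo):
        print(f"{label}: {seconds:.3f} s")
```

For the comparison discussed above, the candidates might look like `{"threaded": ["./eig_threaded"], "mpi": ["mpirun", "-np", "24", "./eig_mpi"]}`, where both executable names are hypothetical. Use a problem size close to your real workload, since the ranking can change with matrix size.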

Jonathan Dursi answered Sep 28 '22 23:09



Consider using Intel MKL. OpenBLAS can also be quite fast, though I haven't run it on machines with more than four cores yet.

Fred Foo answered Sep 28 '22 22:09
