I am currently exploring some aspects of Unified Parallel C (UPC) as an alternative to standard parallelization approaches in HPC (like MPI, OpenMP, or hybrid approaches).
My question is: does anyone have experience with UPC performance in large-scale applications (on the order of 10,000 cores or more)? I am mainly interested in the access speed of shared memory. Obviously this depends on the underlying hardware, network interconnect, operating system, compilers, etc., but I am generally interested in any kind of "real-world" problem solving with UPC.
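For concreteness, the kind of access I have in mind looks roughly like the sketch below: the cost I am asking about is the difference between touching elements that have affinity to the calling thread and elements owned by a remote thread. The array name and sizes are made up for illustration.

```c
#include <upc.h>
#include <stdio.h>

#define NLOCAL 1024

/* Cyclically distributed shared array: element i has affinity to thread i % THREADS. */
shared double data[NLOCAL * THREADS];

int main(void) {
    long i;

    /* Local accesses: each thread writes only the elements it owns. */
    upc_forall (i = 0; i < NLOCAL * THREADS; i++; &data[i])
        data[i] = (double)i;

    upc_barrier;

    /* Remote access: this element normally has affinity to a neighbouring
       thread, so the read may have to cross the network. */
    double remote = data[(MYTHREAD + 1) % THREADS];
    printf("Thread %d of %d read %f\n", MYTHREAD, THREADS, remote);
    return 0;
}
```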
Furthermore, what is your general impression of UPC? Do you think it has the potential to become more widely used in the future? Is it worth switching to?
Any comments are welcome!
Thanks a lot, Mark
There are pros and cons either way.
The advantages of UPC are that it is likely easier to get something working, and with decent performance, than MPI or MPI+OpenMP. And because the (say) Berkeley UPC compiler is open source, you should be able to compile your program 5 years from now regardless. On top of that, supporting languages like UPC was a requirement for IBM to win the Blue Waters contract, so there should be a professionally-maintained UPC compiler out there for at least the life of that system, which should help the UPC ecosystem remain active.
I personally haven't written anything really big (in terms of code size, or in terms of scaling to >1k procs) in UPC, but in the worst case you could run it using the MPI runtime, and it should scale like the corresponding MPI code. On smaller problems, there's lots of anecdotal evidence that the performance of codes written in UPC (and other PGAS languages) is certainly competitive with, and sometimes better than, the MPI program written in a similar way, and the reasons for that are fairly well understood.
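To make that concrete: the explanation usually given is that UPC accesses are one-sided, so fine-grained or irregular reads of remote data don't need a matching receive (or explicit message choreography) on the owning thread. A hedged sketch of that kind of access pattern follows; the table size and number of lookups are invented for illustration.

```c
#include <upc.h>
#include <stdlib.h>
#include <stdio.h>

#define ENTRIES_PER_THREAD 4096
#define LOOKUPS 1000

/* Distributed table, cyclic layout: entry i lives on thread i % THREADS. */
shared long table[ENTRIES_PER_THREAD * THREADS];

int main(void) {
    long i, k, sum = 0;

    /* Each thread fills in the entries it owns. */
    upc_forall (i = 0; i < ENTRIES_PER_THREAD * THREADS; i++; &table[i])
        table[i] = i;

    upc_barrier;

    /* One-sided gathers from pseudo-random locations: the reading thread
       issues the remote load directly, and the owner never has to post a
       matching receive, as it would in a two-sided MPI formulation. */
    srand(MYTHREAD + 1);
    for (k = 0; k < LOOKUPS; k++)
        sum += table[rand() % (ENTRIES_PER_THREAD * THREADS)];

    printf("Thread %d: checksum = %ld\n", MYTHREAD, sum);
    return 0;
}
```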
The downsides are that, because it's new, the tool support is not as strong. There are many quite sophisticated tools out there, free and commercial, for performance tuning of a large-scale MPI application, whereas the PGAS/GASNet/UPC tools are more research-grade, in the bad way. IBM is likely working on stuff for Blue Waters, but unless you're running on a P7 system, that may not help you in particular. Similarly, parallel I/O libraries/tools don't seem to really exist in UPC in any solid form.
In addition, with a new language, there's always a worry about how active it will remain N years from now. The compilers should work, but will new runtimes continue to be developed and improved for new architectures? Note that this has always been the catch-22 for new scientific programming languages. Scientific developers tend to be very conservative, wanting to know that what they're working on will continue to work (and work well) 10+ years into the future, so they tend to be skeptical of the longevity of new languages -- and that turns into a self-fulfilling prophecy, as people stay away from the new languages, so those languages languish and become abandonware.
I don't think that's a huge worry with UPC, because I think there's enough institutional support behind these PGAS languages that they'll be around for a while. Coarray Fortran is part of the Fortran 2008 standard, so compiler vendors will have to support PGAS-like runtimes regardless. DARPA and others are strongly behind PGAS-style languages and things like X10/Chapel. So I think these languages are more likely to get a fair shot at success, and I think 5-10 years out your code will still compile and run at least passably well.
I am curious about the software architecture issues around UPC; I don't know if the new shared arrays end up being good or bad for developing really large pieces of software. With something like Coarray Fortran, which is less ambitious, it's a little easier to see how that plays out in a big package.
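For readers who haven't seen them, the "shared arrays" I mean are the ones where the data distribution is baked into the type through a layout qualifier; whether scattering such declarations through a big code base stays manageable is exactly the open question above. A small illustrative sketch (names and sizes invented):

```c
#include <upc.h>

#define NLOCAL 256

/* The layout qualifier in the type decides how the array is spread out: */
shared          double cyclic_v [NLOCAL * THREADS]; /* block size 1: round-robin    */
shared [NLOCAL] double blocked_v[NLOCAL * THREADS]; /* NLOCAL contiguous per thread */

int main(void) {
    long i;

    /* The affinity expression lets each thread update only the elements it
       owns, whichever layout was chosen above. */
    upc_forall (i = 0; i < NLOCAL * THREADS; i++; &blocked_v[i])
        blocked_v[i] = (double)MYTHREAD;

    upc_barrier;
    return 0;
}
```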
So after all those paragraphs, I'm afraid the answer is "it depends", and it's probably going to come down to your personal style and risk tolerance. If you like being on the first-adopter, leading edge of things, with all the advantages (being the first to take advantage of new, high-productivity tools, leapfrogging others, becoming an expert in new stuff) and disadvantages (lack of strong tool support, higher degree of risk, fewer books to turn to, etc.) that implies, I think UPC is likely a pretty solid choice. The basic programming model is going to be around for a good while, and this language in particular has a good amount of backing. On the other hand, if you prefer to "play it safe" and go the MPI+OpenMP route, that's going to be a pretty defensible choice, too. But in the end, we need some developers to try these new languages for real projects, or we as a community are going to be stuck with C/Fortran+MPI+OpenMP forever.
Hard to top Jonathan Dursi's answer, but I did want to add that your choice does not have to be either/or. You can have both. Jim Dinan at Argonne National Laboratory has demonstrated good results using MPI as the "off-node" messaging method and UPC for the on-node (shared memory) pieces.
See "Hybrid Parallel Programming with MPI and Unified Parallel C" James Dinan, Pavan Balaji, Ewing Lusk, P. Sadayappan, Rajeev Thakur. Proc. 7th ACM Conf. on Computing Frontiers (CF). Bertinoro, Italy. May 17-19, 2010.