 

Which C++ libraries should I use for a large parallel computing number-crunching project exploiting third-party applications

Introduction

I want to request a lot of advice on a new programming project I am going to start on my own. I am going to be very precise in what I would like to accomplish and in what my basic requirements are. Therefore this is going to be a long question. Please bear with me.

I am going to split the question into five sections:

  1. Real-world problem
  2. Simulation problem
  3. Requirements and preferences
  4. Additional information
  5. List of advice requests

1. Real-world problem

Skyscrapers and large bridges suffer from dynamic wind loading. This means that, when designed incorrectly, they can collapse due to wind-induced vibrations (this actually happened in 1940: http://www.youtube.com/watch?v=3mclp9QmCGs). To design such structures correctly, efficient number-crunching software is required for analysis and simulation.

2. Simulation problem

There exists a multitude of software capable of simulating either fluid flows or structural mechanics. Many packages have been under development for over 30 years and are proven, mature technologies. Writing a multi-physics program from scratch that simulates both fluid flows and structural mechanics simultaneously is therefore unwise. First of all, it would take years of development to reach maturity, and it is very hard to enter a world which has depended on specific software for over 30 years. But more importantly: why recreate when you can reuse? Instead of pursuing a monolithic approach, I prefer a partitioned approach where I can reuse existing simulation software.

In the partitioned approach I will use software X to simulate flows and software Y to simulate structures. Then I will write my own coupling algorithm, which establishes communication between X and Y and uses them to simulate the multi-physics problem (e.g. wind-induced vibrations of skyscrapers or bridges). The reason I use X and Y rather than actual software names is that X and Y are supposed to be black boxes. In no way is my coupling algorithm to depend on the implementation of X and Y. The algorithm will only depend on the output of X and Y. This way an end-user can select whichever X or Y is available to them, or whichever X or Y is capable of doing what the end-user wants to achieve.

Because I want to use a black-box partitioned approach, software X knows nothing of Y and vice versa. But how do I simulate the deformation of a bridge without knowing anything about the surrounding air flow, and how do I know in which way the surrounding air flow is perturbed by the structure without knowing anything about its deformation? The answer is simple: start with a guess and use an iterative approach to converge to the correct solution. This approach is, however, computationally very expensive. To reduce the computational cost, the coupling algorithm can be written in a clever way using very efficient techniques, which I will not discuss here. All I would like to say is that some heavy linear algebra number-crunching is required.
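To make the "start with a guess and iterate" idea concrete, here is a minimal sketch of such a coupling loop. It is not the efficient algorithm alluded to above, only a plain under-relaxed fixed-point iteration. The two solver functions, their linear formulas and all parameter values are hypothetical stand-ins for software X and Y, chosen only so that the example runs.

```python
"""Partitioned coupling sketch: two black-box solvers iterated to a fixed point."""


def flow_solver(displacement):
    # Hypothetical stand-in for software X: load on the structure as a
    # function of its displacement (a made-up linear relation).
    return 10.0 - 2.0 * displacement


def structure_solver(load):
    # Hypothetical stand-in for software Y: displacement caused by a load
    # (a spring with an assumed stiffness of 4.0).
    return load / 4.0


def couple(flow, structure, guess=0.0, relaxation=0.5, tol=1e-10, max_iter=200):
    """Iterate X and Y until the interface displacement stops changing."""
    d = guess
    for iteration in range(1, max_iter + 1):
        d_new = structure(flow(d))      # one pass through both black boxes
        residual = d_new - d
        if abs(residual) < tol:
            return d_new, iteration
        d += relaxation * residual      # under-relaxed update of the guess
    raise RuntimeError("coupling iteration did not converge")


displacement, iterations = couple(flow_solver, structure_solver)
print(f"converged displacement: {displacement:.6f} after {iterations} iterations")
```

The coupling function only ever calls the two solvers and looks at their outputs, which is the black-box property described above. In a real setting each call would be a full flow or structural simulation, which is why the number of iterations matters so much.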

3. Requirements and preferences

What I need to do is:

  • establish communication between third-party open-source or proprietary software
  • perform some heavy number-crunching (linear algebra)
  • visualise results (2D / 3D plotting and animating)
  • deliver an interactive analysis and development environment
  • create an intuitive graphical user interface

What I want my software to be:

  • open-source
  • cross-platform
  • extendable through scripts and/or shared libraries

What I am going to use:

  • C++ for heavy number-crunching
  • CPython for programming logic
  • NumPy / SciPy for some number-crunching in CPython
  • Matplotlib for results visualisation in CPython

4. Additional information

Facts:

  • one-man project at start, grow to company if successful
  • primary OS is a KDE-based Linux distribution

Business model:

  • Free software and basic documentation.
  • Paid services and elaborate documentation.

5. List of advice requests

I want to do all number-crunching in C++ by writing many functions which individually perform just a tiny task. The program logic is to be contained in a CPython package which executes the entire simulation while relying on the C++ functions to perform the number-crunching. The C++ / CPython algorithm is to be extended with scripts written in CPython (using NumPy, SciPy, SymPy and Matplotlib) to generate and visualise results from raw numerical data. I want to be able to do parallel computing, and I need to communicate with several third-party open-source AND proprietary software packages.

To accomplish all that I am going to need a whole bunch of existing libraries, packages and technologies. For each relevant issue I know what I can use; however, I do not know what I should use. The best solution is, as always, to try everything out and see what works best. However, if any experienced user can weed out some of the more unlikely candidates, I would gladly receive his or her advice, suggestions or pro / con lists on:

  1. Glueing C++ and CPython (e.g. CTypes, SIP, SWIG etc.)
  2. C++ linear algebra number-crunching library (e.g. Armadillo, Eigen, PETSc etc.)
  3. Graphical interface development library (e.g. Qt, GTK, wxWidgets etc.)
  4. Software communication and parallel computing (e.g. MPICH, OpenMPI, OpenMP etc.)
  5. CPython 2.7.x or CPython 3.x

NOTE: I have listed some options above, but these are only examples and not a limitation. I am open to everything as long as it is written in C, C++, Fortran or Python. Also, I do not expect an answer in all five categories listed above from one individual. Let the collective knowledge of the community take care of that.
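For item 1, the lightest of the listed options is ctypes, which needs no compilation step on the Python side. The sketch below calls `cos` from the system C math library as a stand-in for a tiny number-crunching function. It assumes a Unix-like system where that library can be located or is already loaded into the Python process; a project-specific C++ library would have to export its functions with `extern "C"` to be callable this way.

```python
import ctypes
import ctypes.util

# Locate the C math library. If find_library returns None, CDLL(None) falls
# back to the symbols already loaded into the running process (this fallback
# assumes a Unix-like system).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double). Without this, ctypes would
# assume int arguments and an int return value.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Thin Python wrapper around the C function."""
    return libm.cos(x)


print(c_cos(0.0))
```

SIP and SWIG generate wrapper code instead, which costs a build step but handles C++ classes that ctypes cannot see directly.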

I thank all contributors and wish you all the best of luck in your own endeavors.

Aeronaelius asked May 18 '13 20:05



2 Answers

You mention parallelism but not how you intend to make this project parallel. This is a much more complex issue than simply choosing a couple of libraries. There are several major considerations to work through before moving forward.

You mention the intended platform briefly, but you also have to consider whether the simulation will run on a single computer/node or on multiple nodes. Considering that you are doing an iterative simulation of a building, you will probably require far more compute power than any one computer can provide. This means that, unless you want to go with a hybrid multiple-process, multiple-thread approach, you are limited to a multiple-process model of parallelism. OpenCL and MPI are then both options for your implementation. (Note: MPICH and OpenMPI are each just implementations of MPI, and your code should be agnostic of them.) Message passing with MPI is a good general model of parallelism; however, it can be quite difficult for those not used to working with parallel code. My personal experience is with MPI and some hybrid programming, so I cannot say much else regarding your choice of parallel model.

A problem that follows from the choice of parallel model is that it directly impacts the simulation software. I am not entirely certain how separate you are planning to keep your algorithm from the simulations. If you plan to have your code fork separate processes to run the simulations, you will have issues with cross-platform support, as you may not have the luxury of being able to run arbitrary simulations in this manner. If you instead intend to run the simulations within your software, the parallel model has to be consistent throughout. Although this puts limitations on the black-box strategy, it may make the entire thing that much more feasible.
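As a rough illustration of the separate-process option, the sketch below runs one solver step in a child process and exchanges data as JSON over stdin/stdout. The "solver" is a hypothetical one-line Python script standing in for a real executable, and the JSON request/response format is an assumption made for the example; a real third-party package may only accept input files or its own protocol.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for an external solver executable: it reads a JSON
# request on stdin and writes a JSON result on stdout.
FAKE_SOLVER = (
    "import json, sys; "
    "req = json.load(sys.stdin); "
    "json.dump({'load': [2.0 * d for d in req['displacement']]}, sys.stdout)"
)


def run_black_box(command, request, timeout=30.0):
    """Run one solver step in a child process and return its parsed output."""
    completed = subprocess.run(
        command,
        input=json.dumps(request),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout)


result = run_black_box(
    [sys.executable, "-c", FAKE_SOLVER],
    {"displacement": [0.0, 0.5, 1.0]},
)
print(result["load"])
```

Starting a new process per step is simple but adds start-up cost on every coupling iteration, and it says nothing about how the solver itself is parallelised, which is the consistency problem described above.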

A good deal has already been said about applicable libraries. I don't have any more to say about specific libraries that hasn't already been said. Just keep in mind that many of the same issues have to be addressed with these as when running the simulations.

TLDR: Parallelism should not be overlooked. You need to know which parallel model you will be using before making decisions on libraries.

corahm answered Oct 26 '22 00:10



Graphical interface development library (e.g. Qt, GTK, wxWidgets etc.)
If your "primary OS is a KDE-based Linux distribution", then Qt wins this one hands down.
Logic behind this:

  • KDE is written in Qt. A Qt app running in KDE is an eagle in the air! It's in its element. With KDE being target number 1, your Qt GUI will most likely work out of the box without users having to download additional GUI libs. Your GUI will also look super native.
  • Qt is the most portable of the three (you mentioned "primary OS", hinting that other platforms might follow). With Qt you can therefore port to Windows, OSX, GNOME, Embedded Linux, Android, Symbian, HAIKU, Solaris, etc.
  • Qt has arguably the best RAD tools of the top three cross-platform GUI libs (IMHO). Think Qt Creator vs wxSmith vs Anjuta/Glade.
  • wxWidgets on Linux is basically a wrapper for GTK+ (v1 to v3) plus additional helpers, though I prefer it over GTK. It also wraps around X11 and Motif but trust me chief, you will not like those ports.
  • wxWidgets portability is not as seamless as one would think. Each port is a totally different implementation, each wrapping totally different backends! I once ported a small app that uses wxDataViewCtrl with a custom tree model. SIGSEGVs became the order of the day. So I eventually decided to go with the generic wxDataViewCtrl (that looks funny in GTK+3). I still like wxWidgets though.

NB: Consider also using the latest web technologies for the C&V part of the MVC (model view controller).
HTML5+CSS3+JS can be run by a web-view widget in a desktop app. All three of the GUI libs above sport this control (for wx, it's wx2.9.3 and above).
Web technologies:

  • Have (arguably) the fastest time-to-market of any GUI lib.
  • Have (arguably) the most available and affordable developers of any GUI technology today.
  • Produce the most stunning UIs of any GUI lib.
  • Produce the least "RIGID" UIs of any GUI lib... you can rotate a GUI element, e.g. an HTML table, around any axis with fancy animation just on mouse-over, without any programming overhead! No JS, no C++, nada!

CPython 2.7.x or CPython 3.x
CPython might not be well suited to your project's requirements (I think), chiefly because of the mutex monster that is the Global Interpreter Lock (GIL) bottleneck.
Maybe PyPy would be a better Python implementation for your project?

By the way, have you also considered JavaScript on V8 vs. Python (PyPy, CPython et al.)? JavaScript run by V8 can interact with native code (C++), sort of like ctypes with Python.

I also came across this interesting blog (JS on V8 vs Py).

mrmoje answered Oct 26 '22 01:10
