I am assuming a dual-core (2 cores per processors) machine with 2 processors for the questions that follow; so a total of 4 "cores". So some natural questions arose: <ol> <li> Suppose I wrote a simple serial program and built it in, say, Visual Studio.. and ran the same program twice, say, with distinct input data in each run. Would they be running on the same processor? Or distinct processors? How much RAM memory would be assigned to each? Would it be the RAM memory on 1 processor (2 cores) or the total RAM? I believe the two programs would run on distinct processors and should each have RAM memory of 1 processor (2 cores); but I am not 100% certain. Would the behavior be any different on Linux? </li> <li> Now suppose my program was written using a distributed memory parallel interface such as MPI and that I ran it once with 2 processors in the np argument (say). Would the program use both processors (and in effect all 4 cores)? Is this the optimal value for the argument -np? In other words, if I did the same with -np 3 or -np 4; is it correct to assume there would be no added advantage? Again, I think so, but I am not 100% certain. I assume also that I could go higher than 4 (-np 5, -np 6, etc). In such cases, how do the processes compete for memory at values of np > 4? Would the performance get worse for np > 4. I think yes, and perhaps this partly depends on problem size, but again not 100% sure. Next, suppose I ran two instances of my MPI-built parallel program, both with -np 2, each with, say, different input data. First off, is this possible? I assume it is and that they each run on both processors? How are the two programs synchronized and how do they individually compete for memory sequentially? This should atleast in part, be based on the order of launching the programs, presumably? </li> <li> Lastly, suppose my program was written using a shared memory parallel interface such as OpenMP and that I ran it once. How many "threads" can I run it on to make full use of shared memory parallelism - is it 2 or 4? (since I have 2 processors with 2 cores each). My guess is it is 4; since all 4 cores are part of the a single shared memory unit? Is that correct? If the answer is 4; does it make sense to run on greater than 4 threads? I am not sure this even works (unlike MPI, where I believe we can do -np 5, -np 6 and so on). </li> </ol> Finally, suppose I run 2 instances of the shared memory parallel program, each with, say, different input data. I assume this is possible and that the individual processes would somehow compete for memory, presumably in the order the programs were launched?

Which processor they run on is entirely up to the OS and depends on many factors, including whatever else is happening on the same machine. The common case, though, is that they will tend to sit on one core each, occasionally swapping to different cores ("occasionally" may mean several times a second or even more frequently). Çores don't have their own RAM on normal PC hardware, and the processes will be given however much RAM they ask for. For MPI processes, yes, your parallelism should match the core count (assuming a CPU-heavy workload). If two MPI processes run with -np 2, they will simply consume all four cores. Increase anything and they'll start to contend. As explained above, RAM has nothing to do with any of this, though cache will suffer in the presence of contention. This "question" is way too long, so I'm going to stop now.

Multiple instances of program on multi-core machine

Tags:

c++

linux

visual-studio-2008

mpi

openmp

I am assuming a dual-core (2 cores per processors) machine with 2 processors for the questions that follow; so a total of 4 "cores". So some natural questions arose:

Suppose I wrote a simple serial program and built it in, say, Visual Studio.. and ran the same program twice, say, with distinct input data in each run. Would they be running on the same processor? Or distinct processors? How much RAM memory would be assigned to each? Would it be the RAM memory on 1 processor (2 cores) or the total RAM? I believe the two programs would run on distinct processors and should each have RAM memory of 1 processor (2 cores); but I am not 100% certain. Would the behavior be any different on Linux?
Now suppose my program was written using a distributed memory parallel interface such as MPI and that I ran it once with 2 processors in the np argument (say). Would the program use both processors (and in effect all 4 cores)? Is this the optimal value for the argument -np? In other words, if I did the same with -np 3 or -np 4; is it correct to assume there would be no added advantage? Again, I think so, but I am not 100% certain. I assume also that I could go higher than 4 (-np 5, -np 6, etc). In such cases, how do the processes compete for memory at values of np > 4? Would the performance get worse for np > 4. I think yes, and perhaps this partly depends on problem size, but again not 100% sure.

Next, suppose I ran two instances of my MPI-built parallel program, both with -np 2, each with, say, different input data. First off, is this possible? I assume it is and that they each run on both processors? How are the two programs synchronized and how do they individually compete for memory sequentially? This should atleast in part, be based on the order of launching the programs, presumably?
Lastly, suppose my program was written using a shared memory parallel interface such as OpenMP and that I ran it once. How many "threads" can I run it on to make full use of shared memory parallelism - is it 2 or 4? (since I have 2 processors with 2 cores each). My guess is it is 4; since all 4 cores are part of the a single shared memory unit? Is that correct? If the answer is 4; does it make sense to run on greater than 4 threads? I am not sure this even works (unlike MPI, where I believe we can do -np 5, -np 6 and so on).

Finally, suppose I run 2 instances of the shared memory parallel program, each with, say, different input data. I assume this is possible and that the individual processes would somehow compete for memory, presumably in the order the programs were launched?

644

asked Aug 08 '11 23:08

squashed.bugaboo

2 Answers

Which processor they run on is entirely up to the OS and depends on many factors, including whatever else is happening on the same machine. The common case, though, is that they will tend to sit on one core each, occasionally swapping to different cores ("occasionally" may mean several times a second or even more frequently).

Çores don't have their own RAM on normal PC hardware, and the processes will be given however much RAM they ask for.

For MPI processes, yes, your parallelism should match the core count (assuming a CPU-heavy workload). If two MPI processes run with -np 2, they will simply consume all four cores. Increase anything and they'll start to contend. As explained above, RAM has nothing to do with any of this, though cache will suffer in the presence of contention.

This "question" is way too long, so I'm going to stop now.

answered Oct 06 '22 00:10

Marcelo Cantos

@Marcelo is absolutely right and I'd like to just expand on his answer a little bit.

The OS will determine where and when the threads the comprise the application execution depending on what else is going on in the system and the available resources. Each application will run in it's own process and that process can have hundereds or thousands of threads. The OS (Windows, Linux, Mac whatever) will switch the execution context of the processing cores to ensure that all applications and services get a slice of the pie.

As for I/O access to such things as RAM that is physically controlled by the NorthBridge Controller that sits on your motherboard. Each process (not processor!) will have an allocated amount of RAM that it can deal with that can expand or contract over the lifetime of the application... this of course is limited to the amount of resources available on the system, and also worth noting the OS will take care of swapping RAM requests beyond it's physically availability to disk (i.e. Virtual RAM). On the other hand though you will need to coordinate access to memory within your application through the use of critical sections and other thread synchronising mechanisms.

OpenMP is a library that helps you write multithreaded parellel applications and makes the syntax of keeping threads in sync easier.... I would comment more, but it's been quite a while since I've used it and I'm sure someone could give a better explaination.

answered Oct 05 '22 22:10

Paul Carroll

Related questions
                            
                                What is the advantage of using vector<char> as input buffer over char array?
                            
                                How/What to return in this case?
                            
                                Problem with ostream/ofstream inheritance
                            
                                The efficiency of using a pthread_rwlock when there are a lot of readers
                            
                                Torrent Library for C++, Windows [closed]
                            
                                How to catch run time error in C and C++?
                            
                                C++ importing and using ADO
                            
                                Casting from unknown pointer to class; how to check validity?
                            
                                c++ convert from LPCTSTR to const char *
                            
                                printf using stack? [duplicate]
                            
                                C++: vector memory corruption when modifiying an object member from outside the copy constructor but not when modifying from within
                            
                                How do I debug a native code project from inside a managed code project? C++/C#
                            
                                What does double not (!!) used on a non-boolean variable do? [duplicate]
                            
                                what does default constructor do when it's empty?
                            
                                QStringList remove spaces from strings
                            
                                Accessing outside of class's namespace inside class method?
                            
                                Conditional variable declaration
                            
                                C++ socket programming: maximize throughput/bandwidth on localhost (I only get 3 Gbit/s instead of 23GBit/s)
                            
                                Any way to speed up/reduce CPU usage when drawing with Cairo?
                            
                                C++ refuses to run properly unless cout is used

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With