Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Thread Affinity

I have a multithreaded program which consist of a C# interop layer over C++ code. I am setting threads affinity (like in this post) and it works on part of my code, however on second part it doesn't work. Can Intel Compiler / IPP / MKL libs / inline assembly interfere with external affinity setting?

UPDATE: I can't post code as it is whole environment with many many dlls. I set environment values: OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 IPP_NUM_THREADS=1. When it runs in single thread, it runs ok, but when I use number of C# threads and set affinity per thread (on a quad core machine), the initialization is going fine on separate cores, but during processing all threads start using the same core. Hope I am clear enough.

Thanks.

like image 429
pavelk Avatar asked Oct 11 '13 01:10

pavelk


1 Answers

We've had this exact problem; we'd set our thread affinity to what we wanted, and the IPP/MKL functions would blow that away! The answer to your question is 'yes'.

Auto Parallelism

The issue is that, by default, the Intel libraries like to automatically execute multi-threaded versions of the routines. So, a single FFT gets computed by a number of threads setup by the library specifically for this purpose.

Intel's intent is that the programmer could get on with the job of writing a single threaded application, and the library would allow that single thread to benefit from a multicore processor by creating a number of threads for the maths work. A noble intent (your source code then need know nothing about the runtime hardware to extract the best achievable performance - handy sometimes), but a right bloody nuisance when one is doing one's own threading for one's own reasons.

Controlling the Library's Behaviour

Take a look at these Intel docs, section Support Functions / Threading Support Functions. You can either programmatically control the library's threading tendancies, or there's some environment variables you can set (like MKL_NUM_THREADS) before your program runs. Setting the number of threads was (as far as I recall) enough to stop the library doing its own thing.

Philosophical Essay Inspired By Answering Your Question (best ignored)

More or less everything Intel is doing in CPU design and software (e.g. IPP/MKL) is aimed at making it unnecessary for the programmer to Worry About Threads. You want good math performance? Use MKL. You want that for loop to go fast? Turn on Auto Parallelisation in ICC. You want to make the best use of cache? That's what Hyperthreading is for.

It's not a bad approach, and personally speaking I think that they've done a pretty good job. AMD too. Their architectures are quite good at delivering good real world performance improvements to the "Average Programmer" for the minimal investment in learning, re-writing and code development.

Irritation

However, the thing that irritates me a little bit (though I don't want to appear ungrateful!) is that whilst this approach works for the majority of programmers out there (which is where the profitable market is), it just throws more obstacles in the way of those programmers who want to spin their own parallelism. I can't blame Intel for that of course, they've done exactly the right thing; they're a market led company, they need to make things that will sell.

By offering these easy features the situation of there being too many under skilled and under trained programmers becomes more entrenched. If all programmers can get good performance without having to learn what auto parallelism is actually doing, then we'll never move on. The pool of really good programmers who actually know that stuff will remain really small.

Problem

I see this as a problem (though only a small one, I'll explain later). Computing needs to become more efficient for both economic and environmental reasons. Intel's approach allows for increased performance, and better silicon manufacturing techniques produces lower power consumption, but I always feel like it's not quite as efficient as it could be.

Example

Take the Cell processor at the heart of the PS3. It's something that I like to witter on about endlessly! However, IBM developed that with a completely different philosophy to Intel. They gave you no cache (just some fast static RAM instead to use as you saw fit), the architecture was pretty much pure NUMA, you had to do all your own parallelisation, etc. etc. The result was that if you really knew what you were doing you could get about 250GFLOPS out of the thing (I think the none-PS3 variants went to 320GLOPS), for 80Watts, all the way back in 2005.

It's taken Intel chips about another 6 or 7 years or so for a single device to get to that level of performance. That's a lot of Moores law growth. If the Cell got manufactured on Intel's latest silicon fab and was given as many transistors as Intel put into their big Xeons, it would still blow everything else away.

No Market

However, apart from PS3, Cell was a none-starter market proposition. IBM decided that it would never be a big enough seller to be worth their while. There just wasn't enough programmers out there who could really use it, and to indulge the few of us who could makes no commercial sense, which wouldn't please the shareholders.

Small Problem, Bigger Problem

I said earlier that this was only a small problem. Well, most of the world's computing isn't about high maths performance, it's become Facebook, Twitter, etc. That sort is all about I/O performance, and for that you don't need high maths performance. So in that sense the dependence on Intel Doing Everything For You so that the average programmer to get good math performance matters very little. There's just not enough maths being done to warrant a change in design philosophy.

In fact, I strongly suspect that the world will ultimately decide that you don't need a large chip at all, an ARM should do just fine. If that does come to pass then the market for Intel's very large chips with very good general purpose maths compute performance will vanish. Effectively those of use who want good maths performance are being heavily subsidised by those who want to fill enourmous data centres with Intel based hardware and put Intel PCs on every desktop.

We're simply lucky that Intel apparently has a desire to make sure that every big CPU they build is good at maths regardless of whether most of their users actually use that maths performance. I'm sure that desire has its foundations in marketing prowess and wanting the bragging rights, but those are not hard, commercially tangible artifacts that bring shareholder value.

So if those data centre guys decide that, actually, they'd rather save electricity and fill their data centres with ARMs, where does that leave Intel? ARMs are fine devices for the purpose for which they're intended, but they're not at the top of my list of Supercomputer chips. So where does that leave us?

Trend

My take on the current market trend is that 'Workstations' (PCs as we call them now) are going to start costing lots and lots of money, just like they did in the 1980s / early 90s.

I think that better supercomputers will become unaffordable because no one can spare the $10billions it would take to do the next big chip. If people stop having PCs there won't be a mass market for large all-out GPUs, so we won't even be able to use those instead. They're an exclusive thing, but super computers do play a vital role in our world and we do need them to get better. So who is going to pay for that? Not me, that's for sure.

Oops, that went on for quite a while...

like image 102
bazza Avatar answered Nov 12 '22 09:11

bazza