Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the common causes for high CPU usage?

Background:

In my application written in C++, I have created 3 threads:

  • AnalysisThread (or Producer) : it reads an input file, parses it, and generates patterns, and enqueue them into std::queue1.
  • PatternIdRequestThread (or Consumer) : it deque patterns from the queue, and sends them, one by one, to database through a client (written in C++), which returns pattern uid which is then assigned to the corresponding pattern.
  • ResultPersistenceThread : it does few more things, talks to database, and it works fine as expected, as far as CPU usage is concerned.

First two threads take 60-80% of CPU usage, each takes 35% on average.

Question:

I don't understand why some threads take high CPU usage.

I analyse it as follows : if it is the OS who makes decisions like context-switch, interrupt, and scheduling as to which thread should be given access to system resources, such as CPU time, then how come some threads in a process happen to use more CPU than the others? It looks like some threads forcefully takes CPU from the OS at gunpoint, or the OS has a real soft spot for some threads and so it is biased towards them from the very beginning, giving them all the resources it has. Why can't it be impartial and give them all equally?

I know that it is naive. But I get confused more if I think along this line : the OS gives access to CPU to a thread, based on the amount of work to be done by the thread, but how does the OS compute or predict the amount of work before executing it completely?

I wonder what are the causes for high CPU usage? How can we identify them? Is it possible to identify them just by looking at the code? What are the tools?

I'm using Visual Studio 2010.

1. I've my doubt about std::queue as well. I know that standard containers aren't thread safe. But if exactly one thread enqueue items to queue, then is it safe if exactly one thread deque items from it? I imagine it be like a pipe, on one side you insert data, on the other, you remove data, then why would it be unsafe if its done simultenously? But that is not the real question in this topic, however, you can add a note in your answer, addressing this.

Updates:

After I realized that my consumer-thread was using busy-spin which I've fixed with Sleep for 3 seconds. This fix is temporary, and soon I will use Event instead. But even with Sleep, the CPU usage has dropped down to 30-40%, and occasionally it goes up to 50% which doesn't seem to be desirable from the usability point of view, as the system doesn't respond to other applications which the user is currently working with.

Is there any way that I can still improve on the high CPU usage? As said earlier, the producer thread (which now uses most of the CPU cycles) reads a file, parses packets (of some format) in it, and generates patterns out of them. If I use sleep, then the CPU usage would decrease but would it be a good idea? What are the common ways to solve it?

like image 367
Nawaz Avatar asked Feb 14 '12 10:02

Nawaz


People also ask

Why my CPU usage is high even when nothing is running?

Your CPU usage can spike to nearly 100% out of nowhere. This can be caused by Task Manager glitches, background processes, malware, and even your antivirus software. The best way to fix these issues is to go through the programs in Task Manager and investigate which are using too much CPU power.


2 Answers

Personally I'd be pretty annoyed if my threads had work to do, and there were idle cores on my machine because the OS wasn't giving them high CPU usage. So I don't really see that there's any a problem here [Edit: turns out your busy looping is a problem, but in principle there's nothing wrong with high CPU usage].

The OS/scheduler pretty much doesn't predict the amount of work a thread will do. A thread is (to over-simplify) in one of three states:

  1. blocked waiting for something (sleep, a mutex, I/O, etc)
  2. runnable, but not currently running because other things are
  3. running.

The scheduler will select as many things to run as it has cores (or hyperthreads, whatever), and run each one either until it blocks or until an arbitrary period of time called a "timeslice" expires. Then it will schedule something else if it can.

So, if a thread spends most of its time in computation rather than blocking, and if there's a core free, then it will occupy a lot of CPU time.

There's a lot of detail in how the scheduler chooses what to run, based on things like priority. But the basic idea is that a thread with a lot to do doesn't need to be predicted as compute-heavy, it will just always be available whenever something needs scheduling, and hence will tend to get scheduled.

For your example loop, your code doesn't actually do anything, so you'd need to check how it has been optimized before judging whether 5-7% CPU makes sense. Ideally, on a two-core machine a processing-heavy thread should occupy 50% CPU. On a 4 core machine, 25%. So unless you have at least 16 cores then your result is at first sight anomalous (and if you had 16 cores, then one thread occupying 35% would be even more anomalous!). In a standard desktop OS most cores are idle most of the time, so the higher the proportion of CPU that your actual programs occupy when they run, the better.

On my machine I frequently hit one core's worth of CPU use when I run code that is mostly parsing text.

if exactly one thread enqueue items to queue, then is it safe if exactly one thread deque items from it?

No, that is not safe for std::queue with a standard container. std::queue is a thin wrapper on top of a sequence container (vector, deque or list), it doesn't add any thread-safety. The thread that adds items and the thread that removes items modify some data in common, for example the size field of the underlying container. You need either some synchronization, or else a safe lock-free queue structure that relies on atomic access to the common data. std::queue has neither.

like image 200
Steve Jessop Avatar answered Oct 14 '22 15:10

Steve Jessop


Edit: Ok, since you are using busy spin to block on the queue, this is most likely the cause for high CPU usage. The OS is under the impression that your threads are doing useful work when they are actually not, so they get full CPU time. There was interesting discussion here: Which one is better for performance to check another threads boolean in java

I advise you to either switch to events or other blocking mechanisms or use some synchronized queue instead and see how it goes.

Also, that reasoning about the queue being thread-safe "because only two threads are using it" is very dangerous.

Assuming the queue is implemented as a linked list, imagine what can happen if it has only one or two elements remaining. Since you have no way of controlling the relative speeds of the producer and the consumer, this may well be the case and so you're in big trouble.

like image 40
Tudor Avatar answered Oct 14 '22 17:10

Tudor