Understanding Threads + Asynchronous

Tags: java, asynchronous, multithreading

So I have a program that I made that needs to send a lot (like 10,000+) of GET requests to a URL and I need it to be as fast as possible. When I first created the program I just put the connections into a for loop but it was really slow because it would have to wait for each connection to complete before continuing. I wanted to make it faster so I tried using threads and it made it somewhat faster but I am still not satisfied.

I'm guessing the correct way to go about this and making it really fast is using an asynchronous connection and connecting to all of the URLs. Is this the right approach?

Also, I have been trying to understand threads and how they work but I can't seem to get it. The computer I am on has an Intel Core i7-3610QM quad-core processor. According to Intel's website for the specifications for this processor, it has 8 threads. Does this mean I can create 8 threads in a Java application and they will all run concurrently? Any more than 8 and there will be no speed increase?

What exactly does the number represent next to "Threads" in the task manager under the "Performance" tab? Currently, my task manager is showing "Threads" as over 1,000. Why is it this number and how can it even go past 8 if that's all my processor supports? I also noticed that when I tried my program with 500 threads as a test, the number in the task manager increased by 500 but it had the same speed as if I set it to use 8 threads instead. So if the number is increasing according to the number of threads I am using in my Java application, then why is the speed the same?

Also, I have tried doing a small test with threads in Java but the output doesn't make sense to me. Here is my Test class:

import java.text.SimpleDateFormat;
import java.util.Date;

public class Test {

    private static int numThreads = 3;
    private static int numLoops = 100000;
    private static SimpleDateFormat dateFormat = new SimpleDateFormat("[hh:mm:ss] ");

    public static void main(String[] args) throws Exception {

        for (int i=1; i<=numThreads; i++) {
            final int threadNum = i;
            new Thread(new Runnable() {
                public void run() {
                    System.out.println(dateFormat.format(new Date()) + "Start of thread: " + threadNum);
                    for (int i=0; i<numLoops; i++)
                        for (int j=0; j<numLoops; j++);
                    System.out.println(dateFormat.format(new Date()) + "End of thread: " + threadNum);
                }
            }).start();
            Thread.sleep(2000);
        }

    }
}

This produces an output such as:

[09:48:51] Start of thread: 1
[09:48:53] Start of thread: 2
[09:48:55] Start of thread: 3
[09:48:55] End of thread: 3
[09:48:56] End of thread: 1
[09:48:58] End of thread: 2

Why does the third thread start and end right away while the first and second take 5 seconds each? If I add more than 3 threads, the same thing happens for all threads above 2.

Sorry if this was a long read, I had a lot of questions. Thanks in advance.


asked Oct 17 '12 02:10

Altherat

2 Answers

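Your processor has 4 physical cores, each with Hyper-Threading, so it exposes 8 hardware threads (logical cores) to the operating system. This does in fact mean that only 8 things can be running at any given moment. That doesn't mean that you are limited to only 8 threads in your program, however.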
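As a quick check, the JVM can report how many logical processors it sees. This is a minimal sketch; the class and method names are made up for illustration. On the asker's i7-3610QM it would normally report 8, although the value can be lower if the process is restricted to fewer CPUs.

```java
// Illustrative only: LogicalProcessors and count() are hypothetical names.
class LogicalProcessors {

    // Number of logical processors available to the JVM. This counts
    // hardware threads, not physical cores, and may be lower than the
    // hardware total if the process is restricted (for example by CPU
    // affinity or a container limit).
    static int count() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors visible to the JVM: " + count());
    }
}
```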

When a thread is synchronously opening a connection to a URL it will often sleep while it waits for the remote server to get back to it. While that thread is sleeping other threads can be doing work. If you have 500 threads and all 500 are sleeping then you aren't using any of the cores of your CPU.

On the flip side, if you have 500 threads and all 500 threads want to do something then they can't all run at once. To handle this scenario there is a special tool. Processors (or more likely the operating system or some combination of the two) have a scheduler which determines which threads get to be actively running on the processor at any given time. There are many different rules, and sometimes random activity, that control how these schedulers work. This may explain why in the above example thread 3 always seems to finish first. Perhaps the scheduler is preferring thread 3 because it was the most recent thread to be scheduled by the main thread; it can be impossible to predict the behavior sometimes. Another likely cause in this particular test is the JIT compiler: the nested loops have empty bodies, so once the JVM has compiled that code it may eliminate them entirely, and threads started after that point finish almost instantly.

Now to answer your question regarding performance. If opening a connection never involved a sleep, then it wouldn't matter whether you were handling things synchronously or asynchronously: you would not be able to get any performance gain above 8 threads. In reality, a lot of the time involved in opening a connection is spent sleeping. The difference between asynchronous and synchronous is how that time spent sleeping is handled. Theoretically you should be able to get nearly equal performance between the two.

With a multi-threaded model you simply create more threads than there are cores. When the threads hit a sleep they let the other threads do work. This can sometimes be easier to handle because you don't have to write any scheduling or interaction between the threads.
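A minimal sketch of that multi-threaded model, assuming the blocking request can be stood in for by `Thread.sleep`. The names `BlockingPoolSketch`, `fakeBlockingRequest` and `runAll` are hypothetical, and the pool size of 50 is an arbitrary illustration, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingPoolSketch {

    // Stand-in for a blocking GET request (an assumption for this sketch):
    // the thread simply waits, as it would while a remote server responds.
    static int fakeBlockingRequest(int id, long waitMillis) throws InterruptedException {
        Thread.sleep(waitMillis);
        return id;
    }

    // Runs `requests` simulated requests on a fixed pool of `poolSize`
    // threads and returns how many completed.
    static int runAll(int requests, int poolSize, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                Callable<Integer> task = () -> fakeBlockingRequest(id, waitMillis);
                futures.add(pool.submit(task));
            }
            int completed = 0;
            for (Future<Integer> future : futures) {
                future.get(); // blocks until this task has finished
                completed++;
            }
            return completed;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runAll(200, 50, 100);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " simulated requests in " + elapsedMs + " ms");
    }
}
```

Because each task spends its time waiting rather than computing, a pool much larger than the number of cores can still shorten the total run time here; with CPU-bound tasks it would not.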

With an asynchronous model you only create a single thread per core. If that thread needs to sleep then it doesn't sleep but actually has to have code to handle switching to the next connection. For example, assume there are three steps in opening a connection (A,B,C):

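// Connection and its states are hypothetical; java.util.Iterator must be imported.
while (!connectionsList.isEmpty()) {
  // An explicit Iterator lets us remove finished connections safely.
  // Calling connectionsList.remove() inside a for-each loop would
  // typically throw ConcurrentModificationException.
  Iterator<Connection> iterator = connectionsList.iterator();
  while (iterator.hasNext()) {
    Connection connection = iterator.next();

    if (connection.getState() == READY_FOR_A) {
      connection.stepA();
      //this method should return immediately and the connection
      //should go into the waiting state for some time before going
      //into the READY_FOR_B state
    } else if (connection.getState() == READY_FOR_B) {
      connection.stepB();
      //same immediate return behavior as above
    } else if (connection.getState() == READY_FOR_C) {
      connection.stepC();
      //same immediate return behavior as above
    } else if (connection.getState() == WAITING) {
      //Do nothing, skip over
    } else if (connection.getState() == FINISHED) {
      iterator.remove();
    }
  }
}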

Notice that at no point does the thread sleep so there is no point in having more threads than you have cores. Ultimately, whether to go with a synchronous approach or an asynchronous approach is a matter of personal preference. Only at absolute extremes will there be performance differences between the two and you will need to spend a long time profiling to get to the point where that is the bottleneck in your application.
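In practice you rarely write the polling loop above by hand. As one possible illustration, the sketch below swaps in the JDK's `java.net.http.HttpClient` (Java 11 or later), whose `sendAsync` does the state switching internally and returns a `CompletableFuture`. To keep the example self-contained it starts a throwaway local server with the JDK's `com.sun.net.httpserver.HttpServer`; the class name `AsyncGetSketch`, the methods `fetchAll` and `demo`, and the request count are assumptions made for this example, not part of the original question.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class AsyncGetSketch {

    // Issues `count` GET requests without dedicating a thread to each one
    // and returns how many responses had HTTP status 200.
    static long fetchAll(URI uri, int count) {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();

        List<CompletableFuture<Integer>> pending = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // sendAsync returns immediately; the response arrives later.
            pending.add(client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                    .thenApply(HttpResponse::statusCode));
        }
        // join() waits for each response in turn.
        return pending.stream()
                .map(CompletableFuture::join)
                .filter(status -> status == 200)
                .count();
    }

    // Starts a local test server on a free port, runs fetchAll against it,
    // then stops the server. The server exists only to make the sketch runnable.
    static long demo(int count) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                return fetchAll(URI.create("http://127.0.0.1:" + port + "/"), count);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(50) + " responses with status 200");
    }
}
```

Against a real remote server you would call `fetchAll` with the target URI directly and would probably want to cap how many requests are in flight at once.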

It sounds like you're creating a lot of threads and not getting any performance gain. There could be a number of reasons for this.

It's possible that establishing a connection isn't actually sleeping, in which case I wouldn't expect to see a performance gain past 8 threads. I don't think this is likely.
It's possible that all of the threads are using some common shared resource. In this case the other threads can't work because the sleeping thread has the shared resource. Is there any object that all of the threads share? Does this object have any synchronized methods?
It's possible that you have your own synchronization. This can create the issue mentioned above.
It's possible that each thread has to do some kind of setup/allocation work that is defeating the benefit you are gaining by using multiple threads.
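The shared-resource case in the list above can be illustrated with a small experiment. This is only a sketch under stated assumptions: `SharedLockSketch`, `slowCall` and `elapsedMillis` are hypothetical names, and `Thread.sleep` stands in for the blocking network call. When the wait happens while holding a common lock, the threads run one after another no matter how many you start.

```java
class SharedLockSketch {

    // A single lock shared by every worker (the "common shared resource").
    private static final Object LOCK = new Object();

    // Simulated blocking call. With holdLock set, the wait happens while
    // the shared lock is held, so only one thread can wait at a time.
    static void slowCall(boolean holdLock, long millis) throws InterruptedException {
        if (holdLock) {
            synchronized (LOCK) {
                Thread.sleep(millis);
            }
        } else {
            Thread.sleep(millis);
        }
    }

    // Starts `threads` workers that each make one slowCall and returns the
    // wall-clock time in milliseconds until all of them have finished.
    static long elapsedMillis(int threads, boolean holdLock, long millis) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    slowCall(holdLock, millis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("independent waits: " + elapsedMillis(4, false, 100) + " ms");
        System.out.println("waits under a shared lock: " + elapsedMillis(4, true, 100) + " ms");
    }
}
```

With four workers and a 100 ms wait, the locked variant should take roughly four times as long as the independent one, which is the kind of pattern a thread dump full of blocked threads would point to.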

If I were you I would use a tool like JVisualVM to profile your application when running with some smallish number of threads (20). JVisualVM has a nice colored thread graph which will show when threads are running, blocking, or sleeping. This will help you understand the thread/core relationship as you should see that the number of running threads is less than the number of cores you have. In addition if you see a lot of blocked threads then that can help lead you to your bottleneck (if you see a lot of blocked threads use JVisualVM to create a thread dump at that point in time and see what the threads are blocked on).


answered Oct 01 '22 18:10

Pace

Some concepts:

You can have many threads in the system, but only some of them (max 8 in your case) will be "scheduled" on the CPU at any point in time. So, for CPU-bound work, you cannot get more performance than 8 threads running in parallel. In fact, performance will probably go down as you increase the number of threads beyond that, because of the work involved in creating, destroying and managing threads.

Threads can be in different states: http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Thread.State.html Out of those states, only the RUNNABLE threads stand to get a slice of CPU time. The operating system decides how CPU time is assigned to threads. In a regular system with thousands of threads, it can be completely unpredictable when a certain thread will get CPU time and how long it will stay on the CPU.
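To make the states concrete, here is a small sketch that observes one thread before it starts, while it is sleeping, and after it has finished. The class name `ThreadStateSketch` and the method `observe` are made up for this example.

```java
import java.util.Arrays;

class ThreadStateSketch {

    // Returns the states seen for a single thread: before start(), while it
    // is inside Thread.sleep, and after it has terminated.
    static Thread.State[] observe() {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // long enough to be observed sleeping
            } catch (InterruptedException e) {
                // woken early by interrupt(); just finish
            }
        });

        Thread.State beforeStart = worker.getState();

        worker.start();
        // Poll until the worker has actually entered its sleep.
        Thread.State whileSleeping = worker.getState();
        while (whileSleeping != Thread.State.TIMED_WAITING) {
            Thread.yield();
            whileSleeping = worker.getState();
        }

        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        Thread.State afterEnd = worker.getState();

        return new Thread.State[] { beforeStart, whileSleeping, afterEnd };
    }

    public static void main(String[] args) {
        // Prints [NEW, TIMED_WAITING, TERMINATED]
        System.out.println(Arrays.toString(observe()));
    }
}
```

A sleeping thread is in TIMED_WAITING rather than RUNNABLE, so it is not competing for CPU time during that period.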

About the problem you are solving:

You seem to have figured out the correct solution: making parallel asynchronous network requests. However, practically speaking, starting 10000+ threads and that many network connections at the same time may be a strain on system resources, and it may just not work. This post has many suggestions for asynchronous I/O using Java. (Tip: Don't just look at the accepted answer)
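One common way to avoid launching everything at once is to cap the number of requests in flight. The sketch below is an illustration under stated assumptions, not a definitive implementation: the asynchronous request is simulated with a timer, and `BoundedInFlightSketch`, `fakeAsyncRequest` and `run` are hypothetical names. A `Semaphore` makes the submitting thread wait whenever the limit is reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedInFlightSketch {

    // Simulated asynchronous request (an assumption for this sketch): the
    // returned future completes on a timer thread after delayMillis.
    static CompletableFuture<Integer> fakeAsyncRequest(
            ScheduledExecutorService timer, int id, long delayMillis) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        timer.schedule(() -> result.complete(id), delayMillis, TimeUnit.MILLISECONDS);
        return result;
    }

    // Issues `total` simulated requests with at most `maxInFlight`
    // outstanding at once and returns the peak number seen in flight.
    static int run(int total, int maxInFlight, long delayMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<CompletableFuture<Integer>> all = new ArrayList<>();
            for (int i = 0; i < total; i++) {
                permits.acquire(); // waits here once the limit is reached
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                all.add(fakeAsyncRequest(timer, i, delayMillis)
                        .whenComplete((value, error) -> {
                            inFlight.decrementAndGet();
                            permits.release(); // frees a slot for the next request
                        }));
            }
            CompletableFuture.allOf(all.toArray(new CompletableFuture[0])).join();
            return peak.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak requests in flight: " + run(1000, 50, 5));
    }
}
```

The right limit depends on the remote server and your own machine, so it is worth treating it as a tunable value rather than a constant.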

answered Oct 04 '22 18:10

Sameer

