
Non-blocking I/O versus using threads (How bad is context switching?)

We use sockets a lot in a program that I work on and we handle connections from up to about 100 machines simultaneously at times. We have a combination of non-blocking I/O in use with a state table to manage it and traditional Java sockets which use threads.

We have quite a few problems with non-blocking sockets, and I personally much prefer using threads to handle sockets. So my question is:

How much saving is made by using non-blocking sockets on a single thread? How bad is the context switching involved in using threads and how many concurrent connections can you scale to using the threaded model in Java?

asked Nov 17 '09 by Benj

People also ask

What are the differences between a blocking I/O and a non-blocking IO?

Blocking - sequential code, easier to write, less control. Non-blocking - concurrent code, more difficult to write, more control.
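To make that difference concrete, here is a minimal sketch using Java NIO (the class name and loopback setup are illustrative, not from the question): in blocking mode, read() parks the calling thread until data arrives; in non-blocking mode, the same call returns immediately with 0 when nothing is available.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class BlockingVsNonBlocking {
    public static void main(String[] args) throws IOException {
        // Loopback server, used only so the client has something to connect to.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));

        SocketChannel client = SocketChannel.open(server.getLocalAddress());
        client.configureBlocking(false); // switch the channel to non-blocking mode

        ByteBuffer buf = ByteBuffer.allocate(64);
        // No data has been sent, so a non-blocking read() returns 0 at once;
        // a blocking read() here would hang the thread until bytes arrived.
        int n = client.read(buf);
        System.out.println("bytes read: " + n);

        client.close();
        server.close();
    }
}
```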

Is there context switching in threads?

Thread switching is a type of context switching from one thread to another within the same process. It is very efficient and much cheaper than a process switch because only per-thread state such as the program counter, registers and stack pointer has to be swapped.

Why is context switching in threads less costly than between processes?

When we switch between two threads, on the other hand, the TLB does not need to be invalidated, because all threads share the same address space and therefore the same translations. Thus the cost of switching between threads is much smaller than the cost of switching between processes.

Is thread blocking bad?

The Main Thread is where the browser does most of the work needed to display a page. If we keep the Main Thread blocked, it can't perform its crucial tasks. This leads to slow load times, unresponsive pages, and a bad user experience.


2 Answers

Choosing between blocking and non-blocking I/O depends on your server's activity profile. E.g. with long-lived connections and thousands of clients, blocking I/O (one thread per connection) may become too expensive because it exhausts system resources. On the other hand, a straightforward blocking design that doesn't crowd out the CPU cache can be faster than non-blocking I/O. There is a good article about that - Writing Java Multithreaded Servers - what's old is new.
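As a rough sketch of the non-blocking side of that trade-off (class name and loopback clients are illustrative; small messages and a single echoing write are assumed), one Selector thread can service many connections that would each need their own thread in the blocking model:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class SelectorEchoSketch {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        SocketAddress addr = server.getLocalAddress();

        // A single thread services every connection via the selector.
        Thread loop = new Thread(() -> {
            try {
                while (selector.isOpen()) {
                    selector.select(200);
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isAcceptable()) {
                            SocketChannel ch = server.accept();
                            ch.configureBlocking(false);
                            ch.register(selector, SelectionKey.OP_READ);
                        } else if (key.isReadable()) {
                            SocketChannel ch = (SocketChannel) key.channel();
                            ByteBuffer buf = ByteBuffer.allocate(128);
                            int n = ch.read(buf);
                            if (n > 0) {
                                buf.flip();
                                ch.write(buf); // echo back; one write assumed sufficient
                            } else if (n < 0) {
                                ch.close(); // peer disconnected
                            }
                        }
                    }
                }
            } catch (IOException | ClosedSelectorException e) {
                // selector closed: shut down the loop
            }
        });
        loop.start();

        // Several plain blocking clients all talk to the one server thread.
        for (int i = 0; i < 3; i++) {
            try (Socket s = new Socket()) {
                s.connect(addr);
                s.getOutputStream().write(("ping-" + i).getBytes());
                byte[] reply = new byte[16];
                int n = s.getInputStream().read(reply);
                System.out.println("echo: " + new String(reply, 0, n));
            }
        }
        selector.close();
        server.close();
    }
}
```

A production version would also track partial reads/writes per connection in a state table, which is exactly the bookkeeping the question found painful.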

About the context-switch cost - it's a rather cheap operation. Consider the simple test below:

package com;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Micro-benchmark: each thread performs the same CPU-bound work for a fixed
 * duration; comparing total iteration counts across different thread counts
 * approximates the overhead introduced by context switching.
 */
public class AAA {

    private static final long DURATION = TimeUnit.NANOSECONDS.convert(30, TimeUnit.SECONDS);
    private static final int THREADS_NUMBER = 2;
    private static final ThreadLocal<AtomicLong> COUNTER = new ThreadLocal<AtomicLong>() {
        @Override
        protected AtomicLong initialValue() {
            return new AtomicLong();
        }
    };
    private static final ThreadLocal<AtomicLong> DUMMY_DATA = new ThreadLocal<AtomicLong>() {
        @Override
        protected AtomicLong initialValue() {
            return new AtomicLong();
        }
    };
    private static final AtomicLong DUMMY_COUNTER = new AtomicLong();
    private static final AtomicLong END_TIME = new AtomicLong(System.nanoTime() + DURATION);

    private static final List<ThreadLocal<CharSequence>> DUMMY_SOURCE = new ArrayList<ThreadLocal<CharSequence>>();
    static {
        for (int i = 0; i < 40; ++i) {
            DUMMY_SOURCE.add(new ThreadLocal<CharSequence>());
        }
    }

    // A queue rather than a Set: a Set would silently drop duplicate per-thread counts.
    private static final java.util.Collection<Long> COUNTERS = new java.util.concurrent.ConcurrentLinkedQueue<Long>();

    public static void main(String[] args) throws Exception {
        final CountDownLatch startLatch = new CountDownLatch(THREADS_NUMBER);
        final CountDownLatch endLatch = new CountDownLatch(THREADS_NUMBER);

        for (int i = 0; i < THREADS_NUMBER; i++) {
            new Thread() {
                @Override
                public void run() {
                    initDummyData();
                    startLatch.countDown();
                    try {
                        startLatch.await();
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                    while (System.nanoTime() < END_TIME.get()) {
                        doJob();
                    }
                    COUNTERS.add(COUNTER.get().get());
                    DUMMY_COUNTER.addAndGet(DUMMY_DATA.get().get());
                    endLatch.countDown();
                }
            }.start();
        }
        startLatch.await();
        END_TIME.set(System.nanoTime() + DURATION);

        endLatch.await();
        printStatistics();
    }

    private static void initDummyData() {
        for (ThreadLocal<CharSequence> threadLocal : DUMMY_SOURCE) {
            threadLocal.set(getRandomString());
        }
    }

    private static CharSequence getRandomString() {
        StringBuilder result = new StringBuilder();
        Random random = new Random();
        for (int i = 0; i < 127; ++i) {
            result.append((char)random.nextInt(0xFF));
        }
        return result;
    }

    private static void doJob() {
        Random random = new Random();
        for (ThreadLocal<CharSequence> threadLocal : DUMMY_SOURCE) {
            for (int i = 0; i < threadLocal.get().length(); ++i) {
                DUMMY_DATA.get().addAndGet(threadLocal.get().charAt(i) << random.nextInt(31));
            }
        }
        COUNTER.get().incrementAndGet();
    }

    private static void printStatistics() {
        long total = 0L;
        for (Long counter : COUNTERS) {
            total += counter;
        }
        System.out.printf("Total iterations number: %d, dummy data: %d, distribution:%n", total, DUMMY_COUNTER.get());
        for (Long counter : COUNTERS) {
            System.out.printf("%f%%%n", counter * 100d / total);
        }
    }
}

I made four tests covering two-thread and ten-thread scenarios, and they show a performance loss of only about 2.5% (78626 iterations with two threads vs. 76754 with ten); system resources are used approximately equally across the threads.

Also, the 'java.util.concurrent' authors estimate the context-switch time to be about 2000-4000 CPU cycles, as the comment on Exchanger's SPINS constant suggests:

public class Exchanger<V> {
   ...
   private static final int NCPU = Runtime.getRuntime().availableProcessors();
   ....
   /**
    * The number of times to spin (doing nothing except polling a
    * memory location) before blocking or giving up while waiting to
    * be fulfilled.  Should be zero on uniprocessors.  On
    * multiprocessors, this value should be large enough so that two
    * threads exchanging items as fast as possible block only when
    * one of them is stalled (due to GC or preemption), but not much
    * longer, to avoid wasting CPU resources.  Seen differently, this
    * value is a little over half the number of cycles of an average
    * context switch time on most systems.  The value here is
    * approximately the average of those across a range of tested
    * systems.
    */
   private static final int SPINS = (NCPU == 1) ? 0 : 2000; 
answered Nov 12 '22 by denis.zhdanov

For your questions, the best method might be to build a test program, get some hard measurement data, and make the decision based on that data. I usually do this when facing such choices, and it helps to have hard numbers to back up your argument.
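For the context-switch part of the question, one cheap way to get such numbers is a thread ping-pong micro-benchmark. This is only a rough sketch (class name illustrative): each SynchronousQueue hand-off forces the threads to alternate, so the per-round time is an upper-bound proxy for, not an exact measure of, a context switch.

```java
import java.util.concurrent.SynchronousQueue;

public class PingPongBench {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();

        // Partner thread bounces every value straight back.
        Thread t = new Thread(() -> {
            try {
                while (true) pong.put(ping.take());
            } catch (InterruptedException e) {
                // interrupted by main: benchmark finished
            }
        });
        t.start();

        // Warm-up so the JIT compiles the hot path before timing.
        for (int i = 0; i < 10_000; i++) { ping.put(i); pong.take(); }

        int rounds = 100_000;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) { ping.put(i); pong.take(); }
        long elapsed = System.nanoTime() - start;

        // Each round is two hand-offs (main -> partner -> main).
        System.out.printf("~%d ns per round trip%n", elapsed / rounds);
        t.interrupt();
    }
}
```

Running this with varying thread counts, pinned or unpinned, on your target hardware gives the kind of hard data the answer recommends.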

Before starting though, how many threads are you talking about? And with what type of hardware are you running your software?

answered Nov 12 '22 by Dr. Watson