Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Logback MDC with ThreadPools or Spring Async

Tags:

java

logback

mdc

I'm trying to determine how thread-safe MDC is when using Cacheable ThreadPools or Spring's Async annotation.

I have a method that calls several CompletableFuture<> and executes them using thread pools

@Async
public CompletableFuture<List> someMethod(String request) {
    try {
        MDC.put("request", request)
        MDC.put("loggable1", "loggable1");
        MDC.put("loggable2", "loggable2");
        log.info("Log Event");
    } finally {
        MDC.clear();
    }
}

Relevant parts from Logback's MDCAdapter

final ThreadLocal<Map<String, String>> copyOnThreadLocal = new ThreadLocal<Map<String, String>>();

public void put(String key, String val) throws IllegalArgumentException {
    if (key == null) {
        throw new IllegalArgumentException("key cannot be null");
    }

    Map<String, String> oldMap = copyOnThreadLocal.get();
    Integer lastOp = getAndSetLastOperation(WRITE_OPERATION);

    if (wasLastOpReadOrNull(lastOp) || oldMap == null) {
        Map<String, String> newMap = duplicateAndInsertNewMap(oldMap);
        newMap.put(key, val);
    } else {
        oldMap.put(key, val);
    }
}

public void clear() {
    lastOperation.set(WRITE_OPERATION);
    copyOnThreadLocal.remove();
}


public void remove(String key) {
    if (key == null) {
        return;
    }
    Map<String, String> oldMap = copyOnThreadLocal.get();
    if (oldMap == null)
        return;

    Integer lastOp = getAndSetLastOperation(WRITE_OPERATION);

    if (wasLastOpReadOrNull(lastOp)) {
        Map<String, String> newMap = duplicateAndInsertNewMap(oldMap);
        newMap.remove(key);
    } else {
        oldMap.remove(key);
    }
}

Since ThreadPools reuse already spawned threads and MDC uses a ThreadLocal context map. Is it possible that we can either lose or corrupt values stored in MDC? If so what are potential scenarios that this can happen?

like image 420
shinjw Avatar asked Oct 07 '16 01:10

shinjw


1 Answers

I just wanted to say that we're seeing something that looks suspiciously like what you're describing. I don't have concrete evidence yet, but crawling through the code it looks like the time it takes to flush events out to appenders could result in the MDC being read after our threadpool clean-up routine runs, particularly since we're flushing logs to Kafka (network I/O normally being much slower than anything on-system). They use a copy-on-write inheritable thread local to store the MDC map itself, but I'm not sure how that will respond to MDC.clear() for instance. We have a lot of very short-lived tasks that execute in parallel, so with Kafka in the mix it seems likely to cause a race condition.

The flip side of this - storing a copy of the map with each entry - seems like it might not scale (might cause memory to blow up and GC to thrash). I'm currently looking for a middle ground.

like image 62
buildchimp Avatar answered Nov 02 '22 02:11

buildchimp