Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Traversing a Binary Tree with multiple threads

So I'm working on a speed contest in Java. I have (number of processors) threads doing work, and they all need to add to a binary tree. Originally I just used a synchronized add method, but I wanted to make it so threads could follow each other through the tree (each thread only has the lock on the object it's accessing). Unfortunately, even for a very large file (48,000 lines), my new binary tree is slower than the old one. I assume this is because I'm getting and releasing a lock every time I move in the tree. Is this the best way to do this or is there a better way?

Each node has a ReentrantLock named lock, and getLock() and releaseLock() just call lock.lock() and lock.unlock();

My code:

public void add(String sortedWord, String word) {

    synchronized(this){
        if (head == null) {
            head = new TreeNode(sortedWord, word);
            return;
        }
        head.getLock();
    }

    TreeNode current = head, previous = null;
    while (current != null) {

        // If this is an anagram of another word in the list..
        if (current.getSortedWord().equals(sortedWord)) {
            current.add(word);
            current.releaseLock();
            return;
        }
        // New word is less than current word
        else if (current.compareTo(sortedWord) > 0) {
            previous = current;
            current = current.getLeft();
            if(current != null){
                current.getLock();
                previous.releaseLock();
            }
        }
        // New word greater than current word
        else {
            previous = current;
            current = current.getRight();
            if(current != null){
                current.getLock();
                previous.releaseLock();
            }
        }
    }

    if (previous.compareTo(sortedWord) > 0) {
        previous.setLeft(sortedWord, word);
    }
    else {
        previous.setRight(sortedWord, word);
    }
    previous.releaseLock();
}

EDIT: Just to clarify, my code is structured like this: The main thread reads input from a file and adds the words to a queue, each worker thread pull words from the queue and does some work (including sorting them and adding them to the binary tree).

like image 240
Brendan Long Avatar asked Apr 13 '26 03:04

Brendan Long


2 Answers

Another thing. There definitely is no place for a binary tree in performance critical code. The cacheing behaviour will kill all performance. It should have a much larger fan out (one cache line) [edit] With a binary tree you access too much non-contiguous memory. Take a look at the material on Judy trees.

And you probably want to start with a radix of at least one character before starting the tree.

And do the compare on an int key instead of a string first.

And perhaps look at tries

And getting rid of all the threads and synchronization. Just try to make the problem memory access bound

[edit] I would do this a bit different. I would use a thread for each first character of the string, and give them their own BTree (or perhaps a Trie). I'd put a non-blocking work queue in each thread and fill them based on the first character of the string. You can get even more performance by presorting the add queue and doing a merge sort into the BTree. In the BTree, I'd use int keys representing the first 4 characters, only refering to the strings in the leaf pages.

In a speed contest, you hope to be memory access bound, and therefore have no use for threads. If not, you're still doing too much processing per string.

like image 157
Stephan Eggermont Avatar answered Apr 14 '26 16:04

Stephan Eggermont


I would actually start looking at the use of compare() and equals() and see if something can be improved there. You might wrap you String object in another class with an different, optimized for your usecase, compare() method. For instance, consider using hashCode() instead of equals(). The hashcode is cached so future calls will be that much faster. Consider interning the strings. I don't know if the vm will accept that many strings but it's worth checking out.

(this was going to be a comment to an answer but got too wordy).

When reading the nodes you need to get a read-lock for each node as you reach it. If you read-lock the whole tree then you gain nothing. Once you reach the node you want to modify, you release the read lock for that node and try to acquire the write lock. Code would be something like:

TreeNode current; // add a ReentrantReadWriteLock to each node.

// enter the current node:
current.getLock().readLock().lock();
if (isTheRightPlace(current) {
current.getLock().readLock().unlock();
current.getLock().writeLock().lock(); // NB: getLock returns a ConcurrentRWLock
// do stuff then release lock
current.getLock().writeLock().unlock();
} else {
current.getLock().readLock().unlock();
}

like image 44
Erik Avatar answered Apr 14 '26 17:04

Erik