Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sedgewick Algorithms 4, why BinarySearchST put FrequencyCounters test costs lower than SequentialSearchST?

I'm reading Algorithms 4th edition. I have some questions when reading chapter 3 Searching. From the cost summary the insert cost of BinarySearchST(2N in worst case) is a little worse than SequentialSearchST(N in worst case). But the FrequencyCounter test with VisualAccumulator(which draws plots) shows

Returning to the cost of the put() operations for FrequencyCounter for words of length 8 or more, we see a reduction in the average cost from 2,246 compares (plus array accesses) per operation for SequentialSearchST to 484 for BinarySearchST.

Shouldn't the put() operations of BinarySearchST need more compares(plus array accesses) than SequentialSearchST?

Another question, for BinarySearchST, the book says

Proposition B (continued). Inserting a new key into an ordered array of size N uses ~ 2N array accesses in the worst case, so inserting N keys into an initially empty table uses ~ N2 array accesses in the worst case

When I look at the code of BinarySearchST, I think inserting a new key into an ordered array of size N uses ~ 4N array accesses.

    public void put(Key key, Value val)  {
    if (key == null) throw new IllegalArgumentException("first argument to put() is null"); 

    if (val == null) {
        delete(key);
        return;
    }

    int i = rank(key);

    // key is already in table
    if (i < n && keys[i].compareTo(key) == 0) {
        vals[i] = val;
        return;
    }

    // insert new key-value pair
    if (n == keys.length) resize(2*keys.length);

    for (int j = n; j > i; j--)  {
        keys[j] = keys[j-1];
        vals[j] = vals[j-1];
    }
    keys[i] = key;
    vals[i] = val;
    n++;

    assert check();
} 

Because for every i in the loop, there are 4 array accesses, 2 for keys reading and updating, 2 for values reading and updating. So why does prop B say it uses ~2N array accesses?

like image 389
Chen Li Avatar asked Jul 14 '17 14:07

Chen Li


1 Answers

Shouldn't the put() operations of BinarySearchST need more compares(plus array accesses) than SequentialSearchST?

The key thing to understand is where does complexity comes from for each of these two symbol table implementations. SequentialSearchST reaches its worst case when the input key is not present, because in that case it needs to perform N searches (and has N misses). Based on the type of the input text, this could happen quite often. However, even if the key is already there, on average there are N/2 compares to find it sequentially.

As per BinarySearchST, searching for the key costs logN in the worst case, so here the complexity comes from resizing the array and/or from moving the existing elements to the right to make room for a new key. Notice that when the key is missing, you should make N/2 moves on average, and when key is there, only logN compares on average. In this case the total running time highly depends on the distribution of the keys - if new keys keep coming, running time will be higher!

The test they performed included text "Tale of two cities" by Charles Dickens, taking only words with 8 letters or more. There are 14350 such words, from which 5737 distinct. After 14350 put() operations and 5737 keys in the table, you would expect about 5737 / 2 = 2868 compares to perform another put() in SequentialSearchST. However, it's better than that, you "only" need 2246 compares. BinarySearchST's runtime significantly depends on the presence of the key; the experiment showed that for this text there were far more O(logN) searches of existing keys than O(N) moves required to insert new keys, which combined gives smaller cost than SequentialSearchST. Do not mix average and worst case runtime, this analysis relies on the average case complexity for the specific example.

When I look at the code of BinarySearchST, I think inserting a new key into an ordered array of size N uses ~ 4N array accesses.

Authors should have clarified the exact definition of access. If referencing the array element means access, then there are even more, 8N array accesses because in the worst case you should first resize the whole array (take a look at the implementation of resize()). Of course, whole implementation could be rewritten to optimize number of accesses in this case by putting new key at the right place during the resize operation.

like image 57
Miljen Mikic Avatar answered Nov 06 '22 02:11

Miljen Mikic