Dereferencing a char pointer gets slower as the string it points to lengthens. Why?

I've come across a strange problem: I have the following code:

int matches = 0;
for (int str_id = 0; str_id < STR_COUNT; str_id++) {
    if (test(strings1[str_id], strings2[str_id]) == 0)
        matches++;
}

It compares pairs of null-terminated strings using the test() function. strings1 and strings2 are vectors containing STR_COUNT null-terminated strings of the same length.

Depending on whether test() dereferences its arguments, this snippet executes in constant or in linear time with respect to the length of the strings in strings1 and strings2. That is, if I use:

int test(char* a, char* b) {
    return (a != b);
}

then the running time doesn't depend on the length of the strings stored in strings1 and strings2. On the other hand, if I use

int test(char* a, char* b) {
    return (*a != *b);
}

then the running time increases linearly with the length of the strings stored in strings1 and strings2.

Why could this happen?

EDIT: Full example of the problem here: http://pastebin.com/QTPAkP1g
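
In case the pastebin goes away, here is a minimal sketch of the kind of setup that reproduces this. The names STR_LEN and make_strings and the back-to-back allocation are guesses for illustration, not necessarily what the full example does:

#include <cstring>
#include <vector>

const int STR_COUNT = 1000000;
const int STR_LEN = 16;  // vary this to observe the effect

// Allocate STR_COUNT identical strings back to back in one buffer,
// so consecutive pointers refer to adjacent memory.
// (The buffer is intentionally leaked in this sketch.)
std::vector<char*> make_strings() {
    char* buffer = new char[STR_COUNT * (STR_LEN + 1)];
    std::vector<char*> strings(STR_COUNT);
    for (int i = 0; i < STR_COUNT; i++) {
        strings[i] = buffer + i * (STR_LEN + 1);
        std::memset(strings[i], 'a', STR_LEN);
        strings[i][STR_LEN] = '\0';
    }
    return strings;
}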

Asked by Clément on Sep 29 '12

2 Answers

You are seeing the effects of data locality.

In the case where you are merely comparing pointers, the operation accesses only the memory in the two vectors. Vectors store their elements contiguously, so each memory access is to a location very close to the one accessed during the previous iteration. This is very good locality, and the cache smiles upon you.

In the case where you dereference the pointers, you add further memory accesses to the mix, so the cache has more work to do, and the effect depends a lot on the implementation.

Extrapolating from your data, it appears that the strings are packed together in memory, so the distance from the start of one string to the start of the next string depends on the length of the string. Short strings are packed more tightly together than longer strings.

In particular, you may be able to pack a few very short strings into a single cache line. When that happens, a single cache line can service the memory accesses of multiple iterations. As the strings get longer, fewer of them will fit into a single cache line, so the cache efficiency decreases. Eventually, the strings are long enough that each string occupies a separate cache line, and the cache provides no benefit.
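
To put rough numbers on this, here is a small sketch of the arithmetic, assuming a typical 64-byte cache line; actual line sizes and allocator padding vary:

#include <cstdio>

int main() {
    const int CACHE_LINE = 64;  // assumed line size in bytes
    const int lens[] = {4, 8, 16, 32, 64, 128};
    for (int len : lens) {
        int stride = len + 1;                // string plus null terminator
        int per_line = CACHE_LINE / stride;  // string starts sharing one line
        if (per_line < 1) per_line = 1;      // long strings: one line each
        printf("len %3d: ~%d string start(s) per cache line\n", len, per_line);
    }
    return 0;
}

Once per_line drops to one, every iteration touches a fresh cache line and the benefit disappears, which matches the linear growth observed.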

Answered by Raymond Chen

Because in the first case it can be proved that as long as strings1 != strings2, the condition will never be true. An optimizing compiler can just deduce that the whole loop will never have any observable side effect, so it can optimize it to oblivion.

Consider that strings[str_id] lives at the address strings + str_id * sizeof(*strings), counting in bytes; let's assume that sizeof is 1 for simplicity, which we can do without loss of generality. Your condition then becomes:

if (test(strings1 + str_id, strings2 + str_id) == 0)

If the compiler is able to inline test, with the first version of test the code would become

if ((strings1 + str_id != strings2 + str_id) == 0)

or, in successively simplified but equivalent forms,

if (strings1 + str_id == strings2 + str_id)

if (strings1 == strings2)

So, since strings1 != strings2 (this is almost certainly the case), and since the compiler can assume that strings1 and strings2 will not be modified by external causes, it can simply skip the whole loop and do nothing instead. Doing nothing is constant time.

With the second version of test, there is no way to know whether the condition is true other than by actually executing the loop and dereferencing the pointers on each iteration, so the running time becomes linear.
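
In other words, under this hypothesis the compiler effectively rewrites the loop into something like the following. This is a hand-written sketch of the deduced transformation, not actual compiler output; the real output can be inspected with -O2 -S or on Compiler Explorer:

// Hand-written sketch; strings1 and strings2 here are the raw
// pointers from the simplification above.
int count_matches(char* strings1, char* strings2, int str_count) {
    // test(strings1 + i, strings2 + i) == 0 simplifies to
    // strings1 + i == strings2 + i, i.e. strings1 == strings2,
    // which does not depend on i, so the loop collapses entirely:
    return (strings1 == strings2) ? str_count : 0;  // constant time
}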

Answered by Jon