Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are there any drawbacks to using multiple arrow operators (->) chained together?

Suppose we have some structured data that is encapsulated in another structure so that a circular linked list can be formed.

typedef struct Data 
{
    int x;
    int y;
} Data;

typedef struct DataNode 
{
    struct DataNode *next;
    struct Data *data;
} DataNode;

Assuming that the circular linked list is constructed correctly and *head points to a member of the list, are there any drawbacks (performance or otherwise) to using the -> operator in a chain, especially in a loop?

DataNode * findPrevMatching(int x, int y)
{
    // Chained arrow operators in a loop
    while (!(head->next->data->x == x && head->next->data->y == y))  
        head = head->next;

    return head;
}

Would there be any difference if I created local variables so that there are no chained arrows?

DataNode * findPrevMatching(int x, int y)
{   
    DataNode *next = head->next;
    Data *data = next->data;

    while (!(data->x == x && data->y == y))
    {
        // Assign head->next to head
        head = head->next;

        // Assign each local variable, using the new head
        next = head->next;
        data = next->data;
    }

    return head;
}
like image 578
hololeap Avatar asked Sep 19 '16 02:09

hololeap


3 Answers

In one of my comments, I noted that I'd not done any measuring, and only measuring is likely to show anything useful. I've since created one header and 4 variant implementations of the code, as shown:

node.h

typedef struct Data 
{
    int x;
    int y;
} Data;

typedef struct DataNode 
{
    struct DataNode *next;
    struct Data *data;
} DataNode;

extern DataNode *head;

extern DataNode *findPrevMatching(int x, int y);

node1.c

#include "node.h"

DataNode * findPrevMatching(int x, int y)
{
    // Chained arrow operators in a loop
    while (!(head->next->data->x == x && head->next->data->y == y))  
        head = head->next;

    return head;
}

node2.c

#include "node.h"


DataNode * findPrevMatching(int x, int y)
{   
    DataNode *next = head->next;
    Data *data = next->data;
    int thisX = data->x;
    int thisY = data->y;

    while (!(thisX == x && thisY == y))
    {
        // Assign head->next to head
        head = head->next;

        // Assign each local variable, using the new head
        next = head->next;
        data = next->data;
        thisX = data->x;
        thisY = data->y;
    }

    return head;
}

node3.c

#include "node.h"


DataNode * findPrevMatching(int x, int y)
{   
    DataNode *next = head->next;
    Data *data = next->data;

    while (!(data->x == x && data->y == y))
    {
        head = head->next;
        next = head->next;
        data = next->data;
    }

    return head;
}

node4.c

#include "node.h"


DataNode * findPrevMatching(int x, int y)
{   
    DataNode *next = head->next;

    while (!(next->data->x == x && next->data->y == y))
    {
        head = head->next;
        next = head->next;
    }

    return head;
}

Code sizes under different optimization regimes

The code was repeatedly compiled using this command line, but with different values in place of -O0:

gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node1.c
gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node2.c
gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node3.c
gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node4.c

GCC 6.2.0 on Mac OS X 10.11.6

The sizes are those reported by the size command on Mac OS X 10.11.6 with GCC 6.2.0.

OFLAGS=-O0
__TEXT  __DATA  __OBJC  others  dec     hex
176     0       0       1007    1183    49f     node1.o
239     0       0       1226    1465    5b9     node2.o
208     0       0       1146    1354    54a     node3.o
192     0       0       1061    1253    4e5     node4.o

OFLAGS=-O1
__TEXT  __DATA  __OBJC  others  dec     hex
95      0       0       872     967     3c7     node1.o
125     0       0       1335    1460    5b4     node2.o
118     0       0       1182    1300    514     node3.o
114     0       0       993     1107    453     node4.o

OFLAGS=-O2
__TEXT  __DATA  __OBJC  others  dec     hex
126     0       0       848     974     3ce     node1.o
111     0       0       1410    1521    5f1     node2.o
121     0       0       1135    1256    4e8     node3.o
121     0       0       1005    1126    466     node4.o

OFLAGS=-O3
__TEXT  __DATA  __OBJC  others  dec     hex
126     0       0       848     974     3ce     node1.o
111     0       0       1410    1521    5f1     node2.o
128     0       0       1111    1239    4d7     node3.o
126     0       0       937     1063    427     node4.o

OFLAGS=-Os
__TEXT  __DATA  __OBJC  others  dec     hex
101     0       0       848     949     3b5     node1.o
135     0       0       1293    1428    594     node2.o
112     0       0       1133    1245    4dd     node3.o
107     0       0       1003    1110    456     node4.o

Clearly, no optimization (-O0) ends up with much larger code than with any optimization. On this code, the smallest object size is from the code in node1.c at -O1 optimization. At -O2 and -O3, the code in node2.c is smallest. With -Os and -O1, the first code is the smallest.

Clang from XCode 8.0

The clang from XCode version 8.0 reports:

Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin15.6.0

And the sizes reported are:

OFLAGS=-O0
__TEXT  __DATA  __OBJC  others  dec     hex
189     0       0       925     1114    45a     node1.o
245     0       0       1105    1350    546     node2.o
214     0       0       1023    1237    4d5     node3.o
198     0       0       970     1168    490     node4.o

OFLAGS=-O1
__TEXT  __DATA  __OBJC  others  dec     hex
119     0       0       977     1096    448     node1.o
119     0       0       1071    1190    4a6     node2.o
119     0       0       1033    1152    480     node3.o
119     0       0       999     1118    45e     node4.o

OFLAGS=-O2
__TEXT  __DATA  __OBJC  others  dec     hex
104     0       0       973     1077    435     node1.o
104     0       0       1069    1173    495     node2.o
104     0       0       1031    1135    46f     node3.o
104     0       0       997     1101    44d     node4.o

OFLAGS=-O3
__TEXT  __DATA  __OBJC  others  dec     hex
104     0       0       973     1077    435     node1.o
104     0       0       1069    1173    495     node2.o
104     0       0       1031    1135    46f     node3.o
104     0       0       997     1101    44d     node4.o

OFLAGS=-Os
__TEXT  __DATA  __OBJC  others  dec     hex
104     0       0       973     1077    435     node1.o
104     0       0       1069    1173    495     node2.o
104     0       0       1031    1135    46f     node3.o
104     0       0       997     1101    44d     node4.o

Conclusion

There is absolutely no substitute for experimentation! You can look at the assembler and decide which code you think is best.

like image 98
Jonathan Leffler Avatar answered Oct 13 '22 00:10

Jonathan Leffler


Using several field dereferencing operators (e.g. ptr->fld->otherfld->anotherfld) is certainly possible, provided all the involved pointers are valid (otherwise it is the dreaded undefined behavior, and you practically are likely to get a segmentation fault if you are lucky, but more bad things could happen, see also this).

Performance wise, it might have a few minor issues. First, the compiler might not always be able to optimize properly by keeping some intermediate values in registers (e.g. in ptr->fld->otherfield->anotherfield->bluefield = ptr->fld->otherfield->redfield + 2; the ptr->fld->otherfield is likely to be kept in a register and fetched once, but compilers make no guarantee in principle). Second, you might have some CPU cache issues and cache misses. Read the answers section of http://norvig.com/21-days.html (which is also a useful thing to read in totality).

In practice, recent compilers are quite good at optimizing that, if you ask them to (e.g. compile with gcc -O2 if using GCC...), so you practically don't need to introduce your additional local variables (except for readability reasons, which is alone an excellent reason to introduce them). Also, learn more about the restrict keyword.

But don't micro-optimize early and manually your code without benchmarking first.

If you use GCC, you could compile your code with gcc -O2 -S -fverbose-asm and look into the produced assembler code.

Are there any drawbacks to using multiple arrow operators (->) chained together?

In practice, not much if you ask your compiler to optimize (of course, assuming that all involved pointers are valid). However, readability of the source code should always be a concern.

like image 36
Basile Starynkevitch Avatar answered Oct 12 '22 22:10

Basile Starynkevitch


If you can create an array of nodes, make your previous and next pointers indices within the array, and allocate all your nodes from that pool, that might have performance advantages on a given architecture. It is more likely that a contiguous array will be in the cache, you can tell the OS when you will and won’t be using any node within that block with a function such as posix_madvise() or PrefetchVirtualMemory(), you might be able to use indices smaller than your pointers and get smaller nodes, and your CPU might support indirect addressing that makes looking up an array element as efficient as looking up a pointer.

The worst thing that can happen to correct code that dereferences several pointers to pointers in a row is a series of cache misses (or actually, page faults).

To really answer this question, though, you would want to profile, find out where the program is spending all its time, focus there, and profile again to find out how much time you saved.

like image 34
Davislor Avatar answered Oct 13 '22 00:10

Davislor