Suppose we have some structured data that is encapsulated in another structure so that a circular linked list can be formed.
typedef struct Data
{
int x;
int y;
} Data;
typedef struct DataNode
{
struct DataNode *next;
struct Data *data;
} DataNode;
Assuming that the circular linked list is constructed correctly and *head
points to a member of the list, are there any drawbacks (performance or otherwise) to using the ->
operator in a chain, especially in a loop?
DataNode * findPrevMatching(int x, int y)
{
// Chained arrow operators in a loop
while (!(head->next->data->x == x && head->next->data->y == y))
head = head->next;
return head;
}
Would there be any difference if I created local variables so that there are no chained arrows?
DataNode * findPrevMatching(int x, int y)
{
DataNode *next = head->next;
Data *data = next->data;
while (!(data->x == x && data->y == y))
{
// Assign head->next to head
head = head->next;
// Assign each local variable, using the new head
next = head->next;
data = next->data;
}
return head;
}
In one of my comments, I noted that I'd not done any measuring, and only measuring is likely to show anything useful. I've since created one header and 4 variant implementations of the code, as shown:
typedef struct Data
{
int x;
int y;
} Data;
typedef struct DataNode
{
struct DataNode *next;
struct Data *data;
} DataNode;
extern DataNode *head;
extern DataNode *findPrevMatching(int x, int y);
#include "node.h"
DataNode * findPrevMatching(int x, int y)
{
// Chained arrow operators in a loop
while (!(head->next->data->x == x && head->next->data->y == y))
head = head->next;
return head;
}
#include "node.h"
DataNode * findPrevMatching(int x, int y)
{
DataNode *next = head->next;
Data *data = next->data;
int thisX = data->x;
int thisY = data->y;
while (!(thisX == x && thisY == y))
{
// Assign head->next to head
head = head->next;
// Assign each local variable, using the new head
next = head->next;
data = next->data;
thisX = data->x;
thisY = data->y;
}
return head;
}
#include "node.h"
DataNode * findPrevMatching(int x, int y)
{
DataNode *next = head->next;
Data *data = next->data;
while (!(data->x == x && data->y == y))
{
head = head->next;
next = head->next;
data = next->data;
}
return head;
}
#include "node.h"
DataNode * findPrevMatching(int x, int y)
{
DataNode *next = head->next;
while (!(next->data->x == x && next->data->y == y))
{
head = head->next;
next = head->next;
}
return head;
}
The code was repeatedly compiled using this command line, but with different values in place of -O0
:
gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node1.c
gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node2.c
gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node3.c
gcc -O0 -g -std=c11 -Wall -Wextra -Werror -c node4.c
The sizes are those reported by the size
command on Mac OS X 10.11.6 with GCC 6.2.0.
OFLAGS=-O0
__TEXT __DATA __OBJC others dec hex
176 0 0 1007 1183 49f node1.o
239 0 0 1226 1465 5b9 node2.o
208 0 0 1146 1354 54a node3.o
192 0 0 1061 1253 4e5 node4.o
OFLAGS=-O1
__TEXT __DATA __OBJC others dec hex
95 0 0 872 967 3c7 node1.o
125 0 0 1335 1460 5b4 node2.o
118 0 0 1182 1300 514 node3.o
114 0 0 993 1107 453 node4.o
OFLAGS=-O2
__TEXT __DATA __OBJC others dec hex
126 0 0 848 974 3ce node1.o
111 0 0 1410 1521 5f1 node2.o
121 0 0 1135 1256 4e8 node3.o
121 0 0 1005 1126 466 node4.o
OFLAGS=-O3
__TEXT __DATA __OBJC others dec hex
126 0 0 848 974 3ce node1.o
111 0 0 1410 1521 5f1 node2.o
128 0 0 1111 1239 4d7 node3.o
126 0 0 937 1063 427 node4.o
OFLAGS=-Os
__TEXT __DATA __OBJC others dec hex
101 0 0 848 949 3b5 node1.o
135 0 0 1293 1428 594 node2.o
112 0 0 1133 1245 4dd node3.o
107 0 0 1003 1110 456 node4.o
Clearly, no optimization (-O0
) ends up with much larger code than with any optimization. On this code, the smallest object size is from the code in node1.c
at -O1
optimization. At -O2
and -O3
, the code in node2.c
is smallest. With -Os
and -O1
, the first code is the smallest.
The clang
from XCode version 8.0 reports:
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin15.6.0
And the sizes reported are:
OFLAGS=-O0
__TEXT __DATA __OBJC others dec hex
189 0 0 925 1114 45a node1.o
245 0 0 1105 1350 546 node2.o
214 0 0 1023 1237 4d5 node3.o
198 0 0 970 1168 490 node4.o
OFLAGS=-O1
__TEXT __DATA __OBJC others dec hex
119 0 0 977 1096 448 node1.o
119 0 0 1071 1190 4a6 node2.o
119 0 0 1033 1152 480 node3.o
119 0 0 999 1118 45e node4.o
OFLAGS=-O2
__TEXT __DATA __OBJC others dec hex
104 0 0 973 1077 435 node1.o
104 0 0 1069 1173 495 node2.o
104 0 0 1031 1135 46f node3.o
104 0 0 997 1101 44d node4.o
OFLAGS=-O3
__TEXT __DATA __OBJC others dec hex
104 0 0 973 1077 435 node1.o
104 0 0 1069 1173 495 node2.o
104 0 0 1031 1135 46f node3.o
104 0 0 997 1101 44d node4.o
OFLAGS=-Os
__TEXT __DATA __OBJC others dec hex
104 0 0 973 1077 435 node1.o
104 0 0 1069 1173 495 node2.o
104 0 0 1031 1135 46f node3.o
104 0 0 997 1101 44d node4.o
There is absolutely no substitute for experimentation! You can look at the assembler and decide which code you think is best.
Using several field dereferencing operators (e.g. ptr->fld->otherfld->anotherfld
) is certainly possible, provided all the involved pointers are valid (otherwise it is the dreaded undefined behavior, and you practically are likely to get a segmentation fault if you are lucky, but more bad things could happen, see also this).
Performance wise, it might have a few minor issues. First, the compiler might not always be able to optimize properly by keeping some intermediate values in registers (e.g. in ptr->fld->otherfield->anotherfield->bluefield = ptr->fld->otherfield->redfield + 2;
the ptr->fld->otherfield
is likely to be kept in a register and fetched once, but compilers make no guarantee in principle). Second, you might have some CPU cache issues and cache misses. Read the answers section of http://norvig.com/21-days.html (which is also a useful thing to read in totality).
In practice, recent compilers are quite good at optimizing that, if you ask them to (e.g. compile with gcc -O2
if using GCC...), so you practically don't need to introduce your additional local variables (except for readability reasons, which is alone an excellent reason to introduce them). Also, learn more about the restrict
keyword.
But don't micro-optimize early and manually your code without benchmarking first.
If you use GCC, you could compile your code with gcc -O2 -S -fverbose-asm
and look into the produced assembler code.
Are there any drawbacks to using multiple arrow operators (
->
) chained together?
In practice, not much if you ask your compiler to optimize (of course, assuming that all involved pointers are valid). However, readability of the source code should always be a concern.
If you can create an array of nodes, make your previous and next pointers indices within the array, and allocate all your nodes from that pool, that might have performance advantages on a given architecture. It is more likely that a contiguous array will be in the cache, you can tell the OS when you will and won’t be using any node within that block with a function such as posix_madvise()
or PrefetchVirtualMemory()
, you might be able to use indices smaller than your pointers and get smaller nodes, and your CPU might support indirect addressing that makes looking up an array element as efficient as looking up a pointer.
The worst thing that can happen to correct code that dereferences several pointers to pointers in a row is a series of cache misses (or actually, page faults).
To really answer this question, though, you would want to profile, find out where the program is spending all its time, focus there, and profile again to find out how much time you saved.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With