
Heap solution outperforming stack?

Tags: c++, c

I am coding a C simulation in which, given a sequence of rules to verify, we break the sequence into 'slices' and verify each slice. (The basic idea is that order matters, and the actual meaning of a rule is affected by some of the rules above it; we can make a 'slice' from each rule plus only those rules above it which overlap it. We then verify the slices, which are usually much smaller than the whole sequence.)

My problem is as follows.

I have a struct (policy) which contains a pointer to an array of structs (rules) and an int (length). My original implementation used malloc and realloc liberally:

struct policy {
  struct rule *rules;
  int length;
};
...
struct policy makePolicy(int length)
{
  struct policy newPolicy;
  newPolicy.rules = malloc(length * sizeof(struct rule));
  newPolicy.length = length;
  return newPolicy;
}
...
struct policy makeSlice(struct policy inPol, int rulePos)
{
  if(rulePos > inPol.length - 1){
    printf("Slice base outside policy\n");
    exit(1);
  }
  struct policy slice = makePolicy(inPol.length);
  //create slice, loop counter gets stored in sliceLength
  slice.rules = realloc(slice.rules, sliceLength * sizeof(struct rule));
  slice.length = sliceLength;
  return slice;
}

As this uses malloc'ed memory, I'm assuming it makes heavy use of the heap. Now I'm trying to port to an experimental parallel machine which has no malloc.

So, sadly, I went and allocated everything with fixed-size arrays.
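Presumably the fixed-size version looks something like this (a sketch; struct rule's fields aren't shown in the question, and MAXBUFLEN is 10000, as mentioned below):

#define MAXBUFLEN 10000

struct policy {
  struct rule rules[MAXBUFLEN];  /* was: struct rule *rules */
  int length;                    /* number of rules actually in use */
};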

Now here's the shocker.

The new code runs slower. Much slower.

(The original code used to wait for minutes on end when the slice length was, say, 200, and maybe an hour at over 300 ... now it does that when the slice length is 70 or 80, and has been taking hours for, say, 120. Still not 200.)

The only difference is that the slices are now given the same amount of memory as a full policy (MAXBUFLEN is 10000), but the program doesn't seem to be running out of memory at all. 'top' shows that the total memory consumed is quite modest, in the tens-of-megabytes range, as before. (And of course, since I'm storing the length, I'm not looping over the whole array, just the part with real rules.)

Could anyone please help explain why it suddenly got so much slower?

asked May 29 '13 by user2431187

1 Answer

It seems that when you fixed the struct at a much larger size (say 10000 rules), your cache locality became much worse than in the original version. You can use a profiler (oprofile, or cachegrind from Valgrind) to see whether cache misses are the problem.
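For example, with cachegrind (standard Valgrind usage; ./sim stands in for your binary):

valgrind --tool=cachegrind ./sim
cg_annotate cachegrind.out.<pid>

The first command writes cache statistics to cachegrind.out.<pid>; the second summarizes them per function.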

In the original program, one cache line could hold up to 8 struct policy headers (on a 32-bit machine with 64-byte cache lines: a 4-byte pointer plus a 4-byte int is 8 bytes, and 64 / 8 = 8). But in the modified version a cache line cannot even hold one, since the struct is now far larger than the cache line.

Moving the length field up can improve performance in this case, since the length and the first few struct rule entries can then fit into a single cache line:

struct policy{
  int length;                 /* hot field first, so length and ... */
  struct rule rules[10000];   /* ... the first rules share a cache line */
};
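A quick way to check the layout (a self-contained sketch; the two-int struct rule is a placeholder, since its real fields aren't shown in the question):

#include <stdio.h>
#include <stddef.h>

struct rule { int a, b; };      /* placeholder fields */

struct policy {
  int length;
  struct rule rules[10000];
};

int main(void)
{
  /* length sits at offset 0, so it shares a cache line with rules[0] */
  printf("sizeof(struct policy) = %zu\n", sizeof(struct policy));
  printf("offsetof rules        = %zu\n", offsetof(struct policy, rules));
  return 0;
}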

To solve this problem, you may need to write your own custom allocator to preserve cache locality. If you are writing a parallel version of this program, also remember to keep memory used by different threads on separate cache lines to avoid cache-line contention (false sharing).
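A minimal sketch of such an allocator, assuming a single static pool and no free (a bump/arena allocator; the names and pool size here are illustrative, not part of the answer):

#include <stddef.h>

#define POOL_SIZE (1 << 20)   /* 1 MiB static pool; size is arbitrary */

static unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;

/* Hands out 64-byte-aligned chunks so each allocation starts on its own
   cache line; returns NULL when the pool is exhausted. (Assumes pool
   itself is cache-line aligned; with C11 this can be forced with
   _Alignas(64).) */
static void *pool_alloc(size_t size)
{
  size_t start = (pool_used + 63) & ~(size_t)63;
  if(start + size > POOL_SIZE)
    return NULL;
  pool_used = start + size;
  return &pool[start];
}

makeSlice could then take exactly sliceLength * sizeof(struct rule) bytes from the pool, keeping slices as tightly packed as the malloc version did.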

answered Oct 13 '22 by Naruil