In an ACM example, I had to build a big table for dynamic programming. I had to store two integers in each cell, so I decided to go for a std::pair<int, int>
. However, allocating a huge array of them took 1.5 seconds:
std::pair<int, int> table[1001][1001];
Afterwards, I have changed this code to
struct Cell { int first; int second; } Cell table[1001][1001];
and the allocation took 0 seconds.
What explains this huge difference in time?
Essentially std::pair is still a struct.
std::pair is a struct, the standard says the compiler determines the layout though the order must be maintained, so in the instance of std::pair<char,char> your compiler may decide to place 3-byte padding after each char for optimal alignment, so no you can't assume contiguous memory layout - end of story.
With C++ pair, use the comparison (==) operator: The comparison operator measures the first and second values of two sets, say pair1 and pair2, to see whether pair1. first is equal to pair2. first or not, and whether pair1. second is equal to pair2.
std::pair is a class template that provides a way to store two heterogeneous objects as a single unit. A pair is a specific case of a std::tuple with two elements.
std::pair<int, int>::pair()
constructor initializes the fields with default values (zero in case of int
) and your struct Cell
doesn't (since you only have an auto-generated default constructor that does nothing).
Initializing requires writing to each field which requires a whole lot of memory accesses that are relatively time consuming. With struct Cell
nothing is done instead and doing nothing is a bit faster.
The answers so far don't explain the full magnitude of the problem.
As sharptooth has pointed out, the pair solution initializes the values to zero. As Lemurik pointed out, the pair solution isn't just initializing a contiguous block of memory, instead it is calling the pair constructor for every element in the table. However, even that doesn't account for it taking 1.5 seconds. Something else is happening.
Here's my logic:
Assuming you were on an ancient machine, say running at 1.33ghz, then 1.5 seconds is 2e9 clock cycles. You've got 2e6 pairs to construct, so somehow each pair constructor is taking 1000 cycles. It doesn't take 1000 cycles to call a constructor that just sets two integers to zero. I can't see how cache misses would make it take that long. I would believe it if the number was less than 100 cycles.
I thought it would be interesting to see where else all these CPU cycles are going. I used the crappiest oldest C++ compiler I could find to see if I could attain the level of wastage required. That compiler was VC++ v6. In debug mode, it does something I don't understand. It has a big loop that calls the pair constructor for each item in the table - fair enough. That constructor sets the two values to zero - fair enough. But just before doing that, it sets all the bytes in a 68 byte region to 0xcc. That region is just before the start of the the big table. It then overwrites the last element of that region with 0x28F61200. Every call of the pair constructor repeats this. Presumably this is some kind of book keeping by the compiler so it knows which regions are initialized when checking for pointer errors at run time. I'd love to know exactly what this is for.
Anyway, that would explain where the extra time is going. Obviously another compiler may not be this bad. And certainly an optimized release build wouldn't be.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With