This question applies to any type of static data. I'm only using int
to keep the example simple.
I am reading in a large XML data file containing ints and storing them in a vector<int>
. For the particular data I'm using, it's very common for the same value to be repeated consecutively many times.
<Node value="4" count="4000">
The count
attribute means that the value is to be repeated x number of times:
for(int i = 0; i < 4000; i++)
vec.push_back(4);
It seems like a waste of memory to store the same value repeatedly when I already know that it is going to appear 4000 times in a row. However, I need to be able to index into the vector at any point.
For larger data objects, I know that I can just store a pointers but that would still involve storing 4000 identical pointers in the example above.
Is there any type of strategy to deal with an issue like this?
Use two vectors. The first vector contains the indices, the second one the actual values.
Fill in the indices vector such that the value for all indices between indices[i-1] and indices [i] is in values[i].
Then use binary search on the indices array to locate the position in the values array. Binary search is very efficient (O(log n)), and you will only use a fraction of the memory compared to the original approach.
If you assume the following data:
4000 ints with value "4"
followed by 200 ints with value "3"
followed by 5000 ints with value "10"
You would create an index vector and value vector and fill it like this:
indices = {4000, 4200, 9200}; // indices[i+1] = indices [i] + new_count or 0
values = {4,3,10};
As suggested in the other answers, you should probably wrap this in an operator[].
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With