Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Redundant static data

This question applies to any type of static data. I'm only using int to keep the example simple.

I am reading in a large XML data file containing ints and storing them in a vector<int>. For the particular data I'm using, it's very common for the same value to be repeated consecutively many times.

<Node value="4" count="4000">

The count attribute means that the value is to be repeated x number of times:

for(int i = 0; i < 4000; i++)
    vec.push_back(4);

It seems like a waste of memory to store the same value repeatedly when I already know that it is going to appear 4000 times in a row. However, I need to be able to index into the vector at any point.

For larger data objects, I know that I can just store a pointers but that would still involve storing 4000 identical pointers in the example above.

Is there any type of strategy to deal with an issue like this?

like image 516
user987280 Avatar asked Nov 23 '12 12:11

user987280


1 Answers

Use two vectors. The first vector contains the indices, the second one the actual values.

Fill in the indices vector such that the value for all indices between indices[i-1] and indices [i] is in values[i].

Then use binary search on the indices array to locate the position in the values array. Binary search is very efficient (O(log n)), and you will only use a fraction of the memory compared to the original approach.

If you assume the following data:

4000 ints with value "4"
followed by 200 ints with value "3"
followed by 5000 ints with value "10"

You would create an index vector and value vector and fill it like this:

indices = {4000, 4200, 9200}; // indices[i+1] = indices [i] + new_count or 0
values = {4,3,10};

As suggested in the other answers, you should probably wrap this in an operator[].

like image 158
Wilbert Avatar answered Oct 04 '22 00:10

Wilbert