Here is the situation:
I have list which store strings which are actually numbers and can become pretty big (hundreds of millions of items).
I store the numbers as string because there is a option to display some additional information which is text.
Because this takes a lot of memory to store I decided that I will store only a maximum of 5 million items. (this will only take about 250-300mb).
The list is filled by the output of a calculation. If a number is found it will be added to the list, this number is always bigger than the existing items.
When the list reached 5 mil I want to remove the first item and add the new item to the list.
like:
// Why is this so freaking slow???
if (_result.Count == 5000000)
_result.RemoveAt(0);
_result.Add(result);
As you can read in the comment, this is very, very, very slow. It just cut down my performance by 15 times. Where it took 2 minutes it now takes about 30.
I tried a few things with linq like .Skip(1).ToList
but that will recreate the list and is therefore even more slow.
The list must stay in the right order, so overwriting by index is not an option (unless you could explain a nice work around).
My question:
Is there any decent way to do this?
I really need the performance here since it may need to check up about 10000000000 numbers. This may take a day ofcourse, but a month is a bit too much :(.
Need additional information, feel free to ask, I'll be happy to supply.
Solution:
This performs O(1)
// Set the _result
Queue<object> _result = new Queue<object>(5000000);
/// Inside the method
// If the count has reach it's max, dequeue the first item
if (_result.Count == 5000000)
_result.Dequeue();
_result.Enqueue(result);
Do you ever reorder items? If you don't, a circular queue would work quite well.
System.Collections.Generic.Queue is one, I just double checked.
To expand on the benefits of a Queue, this is the RemoveAt
implementation (roughly):
for (int i = 1; i < count; i++)
items[i-1] = items[i];
count--;
Because list[0]
is always the first item, you have to move everything to remove the first item.
In contrast, a queue tracks the first item separately. This changes the above code to this:
head++
I will suggest you to better implement a circular queue. Then you push every int to the end of the queue and when you run out of space (determined by the fixed size) then each operation will require to pop the first and push on the bottom. O(1)
.
Advantage vs. Array is that you'll not preallocate space until it's needed. But, last, consider REALLY to store ints as, well, ints. No matter what operations you'll perform you should always store numbers as numbers.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With