Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

The (nearly) best way to manage a list with shifting items

Here is the situation:
I have list which store strings which are actually numbers and can become pretty big (hundreds of millions of items).
I store the numbers as string because there is a option to display some additional information which is text.

Because this takes a lot of memory to store I decided that I will store only a maximum of 5 million items. (this will only take about 250-300mb).

The list is filled by the output of a calculation. If a number is found it will be added to the list, this number is always bigger than the existing items.

When the list reached 5 mil I want to remove the first item and add the new item to the list.

like:

    // Why is this so freaking slow???
    if (_result.Count == 5000000)
        _result.RemoveAt(0);
    _result.Add(result);

As you can read in the comment, this is very, very, very slow. It just cut down my performance by 15 times. Where it took 2 minutes it now takes about 30.

I tried a few things with linq like .Skip(1).ToList but that will recreate the list and is therefore even more slow.

The list must stay in the right order, so overwriting by index is not an option (unless you could explain a nice work around).

My question:
Is there any decent way to do this?

I really need the performance here since it may need to check up about 10000000000 numbers. This may take a day ofcourse, but a month is a bit too much :(.

Need additional information, feel free to ask, I'll be happy to supply.

Solution:
This performs O(1)

    // Set the _result
    Queue<object> _result = new Queue<object>(5000000);

    /// Inside the method
    // If the count has reach it's max, dequeue the first item
    if (_result.Count == 5000000)
        _result.Dequeue();
    _result.Enqueue(result);
like image 423
Mixxiphoid Avatar asked Sep 20 '12 17:09

Mixxiphoid


2 Answers

Do you ever reorder items? If you don't, a circular queue would work quite well.

System.Collections.Generic.Queue is one, I just double checked.

To expand on the benefits of a Queue, this is the RemoveAt implementation (roughly):

for (int i = 1; i < count; i++)
    items[i-1] = items[i];
count--;

Because list[0] is always the first item, you have to move everything to remove the first item.

In contrast, a queue tracks the first item separately. This changes the above code to this:

head++
like image 115
Guvante Avatar answered Nov 13 '22 05:11

Guvante


I will suggest you to better implement a circular queue. Then you push every int to the end of the queue and when you run out of space (determined by the fixed size) then each operation will require to pop the first and push on the bottom. O(1).

Advantage vs. Array is that you'll not preallocate space until it's needed. But, last, consider REALLY to store ints as, well, ints. No matter what operations you'll perform you should always store numbers as numbers.

like image 23
Erre Efe Avatar answered Nov 13 '22 04:11

Erre Efe