
Dealing with very large Lists on x86

Tags: c#, .net, winforms

I need to work with large lists of floats, but I am hitting memory limits on x86 systems. I do not know the final length, so I need to use an expandable type. On x64 systems, I can use <gcAllowVeryLargeObjects>.
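For reference, that switch is just a runtime element in app.config (x64 only; it lifts the 2GB per-array limit, not the process limit):

<configuration>
  <runtime>
    <!-- x64 only: allow arrays with a total size above 2GB -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>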

My current data type:

List<RawData> param1 = new List<RawData>();
List<RawData> param2 = new List<RawData>();
List<RawData> param3 = new List<RawData>();

public class RawData
{
    public string name;
    public List<float> data;
}

The length of the paramN lists is low (currently 50 or lower), but data can hold 10m+ entries. When the length is 50, I hit memory limits (OutOfMemoryException) at just above 1m data points, and when the length is 25, at just above 2m data points. (If my calculations are right, that is exactly 200MB, plus the size of name, plus overhead.) What can I use to raise this limit?

Edit: I tried using List<List<float>> with a max inner list size of 1 << 17 (131072), which increased the limit somewhat, but still not as far as I want.

Edit2: I tried reducing the chunk size in the List<List<float>> to 8192, and I got OOM at ~2.3m elements, with Task Manager reading ~1.4GB for the process. It looks like I need to reduce memory usage between the data source and the storage, or trigger GC more often. I was able to gather 10m data points in an x64 process on a PC with 4GB RAM; IIRC the process never went over 3GB.

Edit3: I condensed my code down to just the parts that handle the data. http://pastebin.com/maYckk84

Edit4: I had a look in dotMemory and found that my data structure does take up ~1GB with the settings I was testing (50ch * 3 params * 2m events = 300,000,000 float elements). I guess I will need to limit it on x86, or figure out how to write the data to disk in this format as I receive it.
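If I go the disk route, something like this BinaryWriter sketch is what I have in mind (the class name, file layout, and Append helper are illustrative only, not my real code):

using System.IO;

public static class RawDataDisk
{
    // Sketch: append samples for one parameter/channel to a flat binary
    // file (4 bytes per float, in arrival order) instead of keeping
    // everything in memory.
    public static void Append(string path, float[] samples)
    {
        using (var stream = new FileStream(path, FileMode.Append))
        using (var writer = new BinaryWriter(stream))
        {
            foreach (var sample in samples)
                writer.Write(sample);
        }
    }
}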

Asked Jul 18 '15 by Malik Drako

1 Answer

First of all, on x86 systems the memory limit is 2GB, not 200MB. I presume your problem is trickier than that: you have aggressive LOH (large object heap) fragmentation.
The CLR uses different heaps for small and large objects: an object is large if its size exceeds 85,000 bytes. The LOH is a capricious thing: it is reluctant to return unused memory to the OS, and it is very poor at defragmentation.
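(One mitigation worth knowing: since .NET 4.5.1 you can ask the GC to compact the LOH once, on the next blocking full collection; it trades a long pause for reclaimed address space.)

using System;
using System.Runtime;

// Request a one-time LOH compaction (available since .NET 4.5.1).
GCSettings.LargeObjectHeapCompactionMode =
    GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();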
.NET's List<T> is an implementation of the dynamic array data structure: it stores elements in a fixed-size array, and when that array fills up, a new array of double the size is allocated and the elements are copied over. With your amount of data, that continuous regrowth of the backing array is a "starvation" scenario for the LOH.
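You can observe that growth directly with a throwaway sketch (the 21,250-float figure is just 85,000 bytes / 4 bytes per float):

using System;
using System.Collections.Generic;

var list = new List<float>();
int lastCapacity = -1;
for (int i = 0; i < 30000000; i++)
{
    list.Add(0f);
    if (list.Capacity != lastCapacity)
    {
        // Every growth step allocates a new backing array (double the size)
        // and copies the old one over; past 21,250 floats (85,000 bytes)
        // each new backing array lands on the LOH.
        Console.WriteLine("Count={0:N0}  Capacity={1:N0}  ~{2:N0} bytes",
            list.Count, list.Capacity, (long)list.Capacity * sizeof(float));
        lastCapacity = list.Capacity;
    }
}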
So, you have to use a tailor-made data structure to suit your needs, e.g. a list of chunks, with each chunk small enough not to get onto the LOH. Here is a small prototype:

public class ChunkedList
{
    // 8000 floats = 32,000 bytes per chunk, well under the 85,000-byte
    // LOH threshold.
    private const int ChunkSize = 8000;

    private readonly List<float[]> _chunks = new List<float[]>();
    private int _count = 0;

    public void Add(float item)
    {
        int chunk = _count / ChunkSize;
        int ind = _count % ChunkSize;
        if (ind == 0)
        {
            // Previous chunk is full; allocate the next one.
            _chunks.Add(new float[ChunkSize]);
        }
        _chunks[chunk][ind] = item;
        _count++;
    }

    public float this[int index]
    {
        get
        {
            if (index < 0 || index >= _count) throw new IndexOutOfRangeException();
            int chunk = index / ChunkSize;
            int ind = index % ChunkSize;
            return _chunks[chunk][ind];
        }
        set
        {
            if (index < 0 || index >= _count) throw new IndexOutOfRangeException();
            int chunk = index / ChunkSize;
            int ind = index % ChunkSize;
            _chunks[chunk][ind] = value;
        }
    }
    //other code you require
}

With ChunkSize = 8000, every chunk takes only 32,000 bytes, so it will not get onto the LOH. The _chunks list itself stores only 4-byte references (on x86), so its backing array crosses the 85,000-byte threshold only after doubling past roughly 16,000 chunks, which is more than 128 million elements in the collection (about 500 MB of data).
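The "other code you require" depends on your access pattern; as a sketch, a Count property and a sequential enumerator might look like this (members to drop into ChunkedList above; AsEnumerable is just an illustrative name):

// Additional members for ChunkedList (sketch only).
public int Count
{
    get { return _count; }
}

public IEnumerable<float> AsEnumerable()
{
    // Sequential read in insertion order, chunk by chunk.
    for (int i = 0; i < _count; i++)
    {
        yield return _chunks[i / ChunkSize][i % ChunkSize];
    }
}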

UPD: I've performed some stress tests on the sample above. The OS is x64, the solution platform is x86, and ChunkSize is 20,000 (each chunk is then 80,000 bytes, still below the LOH threshold).

First:

var list = new ChunkedList();
for (int i = 0; ; i++)
{
    list.Add(0.1f);
}

An OutOfMemoryException is raised at ~324,000,000 elements.

Second:

public class RawData
{
    public string Name;
    public ChunkedList Data = new ChunkedList();
}

var list = new List<RawData>();
for (int i = 0;; i++)
{
    var raw = new RawData { Name = "Test" + i };
    for (int j = 0; j < 20 * 1000 * 1000; j++)
    {
        raw.Data.Add(0.1f);
    }
    list.Add(raw);
}

An OutOfMemoryException is raised at i = 17, j ≈ 12,000,000: 17 RawData instances were successfully created, with 20 million data points each, about 352 million data points in total.

Answered Sep 23 '22 by Aloraman