Working with "very very" large arrays

Tags:

c#

.net

.net-4.5

I need to work with very large arrays of small types (int or float arrays). I'm targeting x64 only, on machines with a lot of RAM; physical memory is never the issue in my scenarios. While looking at the doc for gcAllowVeryLargeObjects I noticed this point:

• The maximum index in any single dimension is 2,147,483,591 (0x7FFFFFC7) for byte arrays and arrays of single-byte structures, and 2,146,435,071 (0x7FEFFFFF) for other types.

Now my issue is that I actually "need" to work with larger arrays than that. What would be the appropriate workaround here? Creating arrays of arrays, or some other abstraction?

Knowing that I mostly need to access those arrays sequentially (never random reads, but often different segments being read sequentially by different threads, potentially 100+ threads at once), what would my best bet be?

I may need to hold arrays of up to 65,536,000,000 elements or more.

Ronan Thibaudau asked Sep 21 '25 06:09

1 Answer

If you really must break the array length limit then you'll have to split the array into chunks of suitable size. You can wrap those chunks together in a container that has the appropriate semantics, like the BigArrayOfLong object that James McCaffrey blogged about a while back. There are numerous others like it.

The basic idea is that you use a jagged array to allocate the space you're going to use. Note that a multi-dimensional array won't give you any advantage here, since it is still a single object, while a jagged array is a smaller array of arrays, each of which is its own object in (probably not contiguous) memory.
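To make the chunk/offset mapping concrete, here's a minimal standalone sketch. The ChunkSize of 1M elements is purely illustrative (the class below derives its chunk size from the item size instead):

```csharp
using System;

class JaggedIndexDemo
{
    // Illustrative chunk size: 1M elements per inner array.
    const int ChunkSize = 1048576;

    static void Main()
    {
        // A logical index well beyond Int32.MaxValue still maps to
        // small, in-range (chunk, offset) pairs.
        long logicalIndex = 3000000000L;
        int chunk  = (int)(logicalIndex / ChunkSize);
        int offset = (int)(logicalIndex % ChunkSize);
        Console.WriteLine("chunk={0}, offset={1}", chunk, offset);
        // prints: chunk=2861, offset=24064
    }
}
```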

Here's a very simple (and not particularly optimal) implementation:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

public class HugeArray<T> : IEnumerable<T>
    where T : struct
{
    // Elements per chunk, scaled by the item size so each inner array
    // stays well under the single-object limit.
    // Note: the generic Marshal.SizeOf<T>() requires .NET 4.5.1 or later.
    public static readonly int arysize = (Int32.MaxValue >> 4) / Marshal.SizeOf<T>();

    public readonly long Capacity;
    private readonly T[][] content;

    public T this[long index]
    {
        get
        {
            if (index < 0 || index >= Capacity)
                throw new IndexOutOfRangeException();
            int chunk = (int)(index / arysize);
            int offset = (int)(index % arysize);
            return content[chunk][offset];
        }
        set
        {
            if (index < 0 || index >= Capacity)
                throw new IndexOutOfRangeException();
            int chunk = (int)(index / arysize);
            int offset = (int)(index % arysize);
            content[chunk][offset] = value;
        }
    }

    public HugeArray(long capacity)
    {
        Capacity = capacity;
        // Number of full chunks, plus one partial chunk for any remainder.
        int nChunks = (int)(capacity / arysize);
        int nRemainder = (int)(capacity % arysize);

        if (nRemainder == 0)
            content = new T[nChunks][];
        else
            content = new T[nChunks + 1][];

        for (int i = 0; i < nChunks; i++)
            content[i] = new T[arysize];
        if (nRemainder > 0)
            content[content.Length - 1] = new T[nRemainder];
    }

    public IEnumerator<T> GetEnumerator()
    {
        return content.SelectMany(c => c).GetEnumerator();
    }

    IEnumerator System.Collections.IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

This one is statically allocated, but it's not too hard to make one that grows to fit demand. Just make sure that the block size you specify isn't completely out of range. I've gone with a calculation based on the item size just in case.
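A quick usage sketch, assuming the HugeArray<T> class above is compiled into the same project. The capacity here is kept small so it runs anywhere; since the indexer takes a long, capacities beyond Int32.MaxValue work exactly the same way:

```csharp
using System;
using System.Linq;

class HugeArrayDemo
{
    static void Main()
    {
        // Small capacity for illustration only; a real workload would
        // pass something like 65536000000L here.
        var big = new HugeArray<int>(1000);

        // Sequential writes through the long-based indexer.
        for (long i = 0; i < big.Capacity; i++)
            big[i] = (int)(i * 2);

        Console.WriteLine(big[999]);  // prints: 1998
        Console.WriteLine(big.Sum()); // LINQ works via IEnumerable<T>: prints 999000
    }
}
```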

Corey answered Sep 22 '25 19:09