Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Collection with very fast iterating and good addition and remove speeds

I'm after a collection that I can iterate through very fast. I'll also be adding items and removing (specific) items fairly regularly and so ideally would like those operations to be fast too.

I'm developing on the xbox and so am restricted to the compact framework (more or less). It's very important that I keep garbage and object allocations to a minimum, so anything where i can pre-allocate space for my objects would be great.

I'll be storing uints (but can be ints if needed) in the collection. A generic solution would be good though, as i'm sure i'll have a need in the future.

A .net collection would be ideal, failing that something light-weight and open source would be great.

Is there a collection class that would suit my needs? If not, how would I go about creating one?


To elaborate a bit, they are object id's that a class should process each frame. They'll generally be added in ascending order, with gaps. There's no upper limit. Any could be removed though, which would leave gaps.
Iteration order isn't completely important, but it would be very useful (particularly for debugging) if it was in a consistent order.

like image 248
George Duckett Avatar asked Jan 06 '12 16:01

George Duckett


People also ask

Which collection is faster in C#?

ListDictionary is faster than Hashtable for small collections (10 items or fewer). The Dictionary<TKey,TValue> generic class provides faster lookup than the SortedDictionary<TKey,TValue> generic class.

Should I use collection or list C#?

The reason "why" you "should" use a Collection<T> instead of a List<T> is because if you expose a List<T> , then anyone who gets access to your object can modify the items in the list. Whereas Collection<T> is supposed to indicate that you are making your own "Add", "Remove", etc methods.

How many types of collections are there in C#?

Collections in C# are classified into two types - Generic collections and non-generic collections.


2 Answers

I've got two suggestions to try:

Firstly, what about using Reflector or ILSpy to look inside Generic List<T>? I assume that implementation isn't in the CF or you could use it. Generic List<T> is array backed and uses a doubling algorithm starting at length 4 array, every call to .Add over the Capacity causes it to double and perform an Array.Copy into the new array. It doesn't resize unless you explicitly set Capacity to zero so beware from a memory point of view. Adds are one thing but each Remove will cause the array to be copied and left-shifted after the index that was removed.

The second suggestion is this - what about about creating a custom class which wraps an integer array to handle this which uses a similar double algorithm (or quadrupling) to Generic List<T> (to handle resizing) but starts at say size 256. You could add object integer id's to this out of order as fast as you like and it won't double too often. You could also simulate a remove by designating (int)-1 or uint.MaxValue as a "null index" meaning no Array.Copy on remove. Then, apply some sort of fast sorting per frame to sort the object index array before drawing. If you sort ascending all your -1s will appear at the start (or uint.MaxValues at end) and can be ignored. You can periodically "collect" the index array by resizing and removing the preceeding -1's on a separate thread (beware - use locking ;)

What do you think? Just thinking you will offset some computation once per frame for fast sorting (not expensive on xbox vs. memory allocation/collection on each Remove and some Adds (expensive).

UPDATE - BlockArray - List<T[]> where T is fixed size array

A further comment on this. I was recently experimenting with the most efficient data-structure for dynamically sized lists and discovered by experimentation that array blocks - a class which is backed by a List of T[], where each T[] was an array of fixed size block, e.g. 512, 4096, 8192 as several advantages over a plain List<T>.

I found that an implementation of Add() (where size is unknown) in a List<T[]> vastly outperformed Add() for List<T>, especially when the total size became larger. This is due to List<T>'s doubling algorithm which requests a new array 2x the size of the old, and memcpy's the old array over whenever size is exceeded.

Iterating speed is similar. Pre-allocation (pre-defined capacity) was much slower than List<T> for small block sizes (512), but only slightly slower for large block sizes (8192). Removes are problematic, as require copying/left shifting many blocks.

What's interesting is in a List<T[]> (block list), you can cache/pool the blocks. If small enough, blocks of T[] fit into the small object heap (favouring compaction, quicker allocation), may fit into the L2 cache and a number of blocks could be pre-allocated and pooled

like image 75
Dr. Andrew Burnett-Thompson Avatar answered Sep 30 '22 08:09

Dr. Andrew Burnett-Thompson


How about using Mono's HashSet<T>?

https://github.com/mono/mono/blob/master/mcs/class/System.Core/System.Collections.Generic/HashSet.cs

It's licensed under MIT X11, which is a permissive license.


Iteration order might be a problem depending on how T implements GetHashCode, but that shouldn't be a problem when using uint. The default implementation of GetHashCode on the other hand returns different values on different instances, which might lead to different iteration order.

Iteration order might also depend on the order items are added. i.e. two collections containing the same items might iterate in a different order if the items were added in a different order.

There is also a SortedSet<T> collection, but I don't know what performance characteristics it has.

like image 44
CodesInChaos Avatar answered Sep 30 '22 07:09

CodesInChaos