I need to be able to search over a collection of approx 2 million items in C#. Search should be possible over multiple fields. Simple string-matching is good enough.
Using an external dependency like a database is not an option, but using an in-memory database would be OK.
The main goal is to do this memory-efficient.
The type in the collection is quite simple and has no long strings:
public class Item
{
public string Name { get; set; } // Around 50 chars
public string Category { get; set; } // Around 20 chars
public bool IsActive { get; set; }
public DateTimeOffset CreatedAt { get; set; }
public IReadOnlyList<string> Tags { get; set; } // 2-3 items
}
Clarification of focus and requirements:
Using a simple List<T>
over above type, either as a class
or a struct
, still requires about 2 GB of memory.
Is there a better way?
The most significant memory hog in your class is the use of a read-only list. Get rid of it and you will reduce memory footprint by some 60% (tested with three tags):
public class Item
{
public string Name { get; set; }
public string Category { get; set; }
public bool IsActive { get; set; }
public DateTimeOffset CreatedAt { get; set; }
public string Tags { get; set; } // Semi-colon separated
}
Also, consider using DateTime
instead of DateTimeOffset
. That will further reduce memory footprint with around 10%.
You can do theses dots, then you will see if there is trouble:
you can enable gcAllowVeryLargeObjects to enables arrays that are greater than 2 gigabytes.
Let the class implementation. When you choose between class and struct, the performance is not the main factor. I think there is no reason to use struct here. See Choosing Between Class and Struct.
Depending your search filter, you must override GetHashCode and Equal.
Do you need to mutate properties, or just search object in the collection?
If you just want research, and if your properties repeat themselves a lot, you can have one property used by many objects.
In this way, the value is stored only one time, and the object only store the reference.
You can do this only if you dont want to mutate the property.
As exemple, if two objects have the same category:
public class Category
{
public string Value { get; }
public Category(string category)
{
Value = category;
}
}
public class Item
{
public string Name { get; set; }
public Category Category { get; set; }
public bool IsActive { get; set; }
public DateTimeOffset CreatedAt { get; set; }
public IReadOnlyList<string> Tags { get; set; }
}
class Program
{
public void Init()
{
Category category = new Category("categoryX");
var obj1 = new Item
{
Category = category
};
var obj2 = new Item
{
Category = category
};
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With