Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C#: Memory-efficient search through 2 million objects without external dependencies

I need to be able to search over a collection of approx 2 million items in C#. Search should be possible over multiple fields. Simple string-matching is good enough.

Using an external dependency like a database is not an option, but using an in-memory database would be OK.

The main goal is to do this memory-efficient.

The type in the collection is quite simple and has no long strings:

public class Item
{
    public string Name { get; set; } // Around 50 chars
    public string Category { get; set; } // Around 20 chars
    public bool IsActive { get; set; }
    public DateTimeOffset CreatedAt { get; set; }
    public IReadOnlyList<string> Tags { get; set; } // 2-3 items
}

Focus and requirements

Clarification of focus and requirements:

  • No external dependencies (like a database)
  • Memory-efficient (below 2 GB for 2 million items)
  • Searchable items in collection (must be performant)

Today's non-optimal solution

Using a simple List<T> over above type, either as a class or a struct, still requires about 2 GB of memory.

Is there a better way?

like image 533
Seb Nilsson Avatar asked Jun 11 '20 13:06

Seb Nilsson


2 Answers

The most significant memory hog in your class is the use of a read-only list. Get rid of it and you will reduce memory footprint by some 60% (tested with three tags):

public class Item
{
    public string Name { get; set; }
    public string Category { get; set; }
    public bool IsActive { get; set; }
    public DateTimeOffset CreatedAt { get; set; }
    public string Tags { get; set; } // Semi-colon separated
}

Also, consider using DateTime instead of DateTimeOffset. That will further reduce memory footprint with around 10%.

like image 137
l33t Avatar answered Oct 08 '22 21:10

l33t


You can do theses dots, then you will see if there is trouble:

  • you can enable gcAllowVeryLargeObjects to enables arrays that are greater than 2 gigabytes.

  • Let the class implementation. When you choose between class and struct, the performance is not the main factor. I think there is no reason to use struct here. See Choosing Between Class and Struct.

  • Depending your search filter, you must override GetHashCode and Equal.


Do you need to mutate properties, or just search object in the collection?

If you just want research, and if your properties repeat themselves a lot, you can have one property used by many objects.

In this way, the value is stored only one time, and the object only store the reference.

You can do this only if you dont want to mutate the property.

As exemple, if two objects have the same category:

public class Category
{
    public string Value { get; }

    public Category(string category)
    {
        Value = category;
    }
}

public class Item
{
    public string Name { get; set; }
    public Category Category { get; set; }
    public bool IsActive { get; set; }
    public DateTimeOffset CreatedAt { get; set; }
    public IReadOnlyList<string> Tags { get; set; }
}


class Program
{
    public void Init()
    {
        Category category = new Category("categoryX");

        var obj1 = new Item
        {
            Category = category
        };

        var obj2 = new Item
        {
            Category = category
        };
    }
}
like image 43
Thibaut Avatar answered Oct 08 '22 21:10

Thibaut