
Entity Framework Core traverse big blob data without memory overflow, best practice

I'm writing code that traverses a large amount of picture data, preparing a big delta block that contains all of it, compressed, for sending.

Here's a sample of what this data could look like:

[MessagePackObject]
public class Blob : VersionEntity
{
    [Key(2)]
    public Guid Id { get; set; }
    [Key(3)]
    public DateTime CreatedAt { get; set; }
    [Key(4)]
    public string Mediatype { get; set; }
    [Key(5)]
    public string Filename { get; set; }
    [Key(6)]
    public string Comment { get; set; }
    [Key(7)]
    public byte[] Data { get; set; }
    [Key(8)]
    public bool IsTemporarySmall { get; set; }
}

public class BlobDbContext : DbContext
{
    public DbSet<Blob> Blob { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Blob>().HasKey(o => o.Id);
    }
}

When working with this I process everything into a file stream, and I want to keep as little as possible in memory at any given time.

Is it enough to do it like this?

foreach (var b in context.Blob.Where(o => /* some filters */).AsNoTracking())
    MessagePackSerializer.Serialize(stream, b);

Will this still fill up the memory with all the blob records, or will they be processed one by one as I iterate over the enumerator? It's not using ToList, only the enumerator, so Entity Framework should be able to process it on the fly, but I'm not sure if that's what it does.

Can any Entity Framework experts here give some guidance on how this is handled properly?

Asked by Atle S on Nov 13 '19 08:11


1 Answer

In general, when you create a LINQ filter on an entity it is like writing a SQL statement in code form. It returns an IQueryable that has not yet executed against the database. When you iterate over the IQueryable with a foreach, or call ToList(), the SQL is executed, and all the results are returned and stored in memory.

https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/ef/language-reference/query-execution
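A minimal sketch of that deferred-execution distinction, using the Blob entity from the question (the `Mediatype` filter value is just an illustrative example):

```csharp
// Building the query only translates it to SQL; nothing hits the database here.
IQueryable<Blob> query = context.Blob
    .Where(b => b.Mediatype == "image/png")
    .AsNoTracking();

// Execution happens on enumeration:
List<Blob> all = query.ToList();   // runs the SQL and buffers every row in memory at once

foreach (var blob in query)        // also runs the SQL, materializing one entity per iteration
{
    // work with a single blob here
}
```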

While EF is maybe not the best choice for raw performance, there is a relatively simple way to handle this without worrying too much about memory usage:

Consider the following:

var filteredIds = context.Blob
                      .Where(b => b.SomeProperty == "SomeValue")
                      .Select(b => b.Id)
                      .ToList();

Now you have filtered the Blobs according to your requirements, and executed this against the database, but only returned the Id values in memory.

Then

foreach (var id in filteredIds)
{
    var blob = context.Blob.AsNoTracking().Single(b => b.Id == id);
    // Do your work here against a single in-memory blob
}

The large blob should be available for garbage collection once you are finished with it, and you shouldn't run out of memory.

Obviously you can sanity-check the number of records in the id list, or add metadata to the first query to help you decide how to process the results, if you want to refine the idea.
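Putting it together with the serialization from the question, a sketch of that refinement might look like this (the filter, the count threshold, and the error handling are all hypothetical choices, not requirements):

```csharp
// First round trip: only the Ids come back into memory.
var filteredIds = context.Blob
    .Where(b => b.SomeProperty == "SomeValue")   // your real filter goes here
    .Select(b => b.Id)
    .ToList();

// Sanity-check before committing to a long-running loop (threshold is arbitrary).
if (filteredIds.Count > 100_000)
    throw new InvalidOperationException(
        $"Refusing to process {filteredIds.Count} blobs in one pass.");

foreach (var id in filteredIds)
{
    // One blob in memory at a time; AsNoTracking keeps it out of the change tracker.
    var blob = context.Blob.AsNoTracking().Single(b => b.Id == id);
    MessagePackSerializer.Serialize(stream, blob);
    // blob goes out of scope here and becomes eligible for garbage collection
}
```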

Answered by ste-fu on Oct 10 '22 09:10