I'm trying to request a large amount of data and then parse it into a report. The problem is that the data I'm requesting contains 27 million records, each with 6 joins, and loading it via Entity Framework uses all of the server's RAM. I've implemented a pagination system to buffer the processing into smaller chunks, as you would with an I/O operation.
I request 10,000 records, write them to a file stream (on disk), and then try to clear those 10,000 records from memory, as they're no longer needed.
I'm having trouble getting the database context garbage collected. I've tried disposing the context, nulling the reference, and then creating a new context for the next batch of 10,000 records. This does not seem to work. (This approach was recommended by one of the EF Core devs: https://github.com/aspnet/EntityFramework/issues/5473)
The only other alternative I see is to use a raw SQL query to achieve what I want. I'm trying to build the system to handle any request size, with the time it takes to produce the report as the only variable factor. Is there something I can do with the EF context to get rid of loaded entities?
private void ProcessReport(ZipArchive zip, int page, int pageSize)
{
    using (var context = new DBContext(_contextOptions))
    {
        var batch = GetDataFromIndex(page, pageSize, context).ToArray();
        if (!batch.Any())
        {
            return;
        }

        var file = zip.CreateEntry("file_" + page + ".csv");
        using (var entryStream = file.Open())
        using (var streamWriter = new StreamWriter(entryStream))
        {
            foreach (var reading in batch)
            {
                try
                {
                    streamWriter.WriteLine("write data from record here.");
                }
                catch (Exception e)
                {
                    // handle error
                }
            }
        }

        batch = null;
    }

    ProcessReport(zip, page + 1, pageSize);
}

private IEnumerable<Reading> GetDataFromIndex(int page, int pageSize, DBContext context)
{
    var batches = (from rb in context.Reading.AsNoTracking()
                   // some joins
                   select rb)
                  .Skip((page - 1) * pageSize)
                  .Take(pageSize);

    return batches.Include(x => x.Something);
}
Apart from your memory management issue, you are going to have a bad time using paging for this. The paging queries themselves get expensive on the server as the offset grows. You don't need to page at all: just iterate the query results (i.e., don't call ToList() or ToArray()).
Also, when you do page, you must add ordering to the queries, or SQL may return overlapping rows or leave gaps. See the SQL Server documentation, e.g.: https://docs.microsoft.com/en-us/sql/t-sql/queries/select-order-by-clause-transact-sql EF Core doesn't enforce this, as some providers might guarantee that paging queries always read rows in the same order.
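If you do need batching (for example, one CSV entry per batch as in your code), keyset ("seek") paging avoids the growing offset cost and gives a stable order for free. A minimal sketch, assuming your `Reading` entity has an `int Id` primary key (`lastId` and `pageSize` are illustrative names, not anything from your code):

```csharp
// Keyset paging: each query seeks past the last key seen, so the cost of a
// page does not grow with its position, unlike Skip/Take (OFFSET) paging.
int lastId = 0;
while (true)
{
    var batch = context.Reading
        .AsNoTracking()
        .Where(r => r.Id > lastId)
        .OrderBy(r => r.Id)       // ordering by the key makes pages stable
        .Take(pageSize)
        .ToList();

    if (batch.Count == 0)
        break;

    // ... write this batch to its CSV entry here ...

    lastId = batch[batch.Count - 1].Id;
}
```

This requires an indexed, unique ordering column (the primary key is the usual choice), which your table presumably already has.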
Here's an example of EF Core (1.1 on .NET Core) plowing through a huge resultset without increasing memory usage:
using Microsoft.EntityFrameworkCore;
using System.Linq;
using System;
using System.ComponentModel.DataAnnotations.Schema;

namespace efCoreTest
{
    [Table("SomeEntity")]
    class SomeEntity
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Description { get; set; }
        public DateTime CreatedOn { get; set; }
        public int A { get; set; }
        public int B { get; set; }
        public int C { get; set; }
        public int D { get; set; }
        public virtual Address Address { get; set; }
        public int AddressId { get; set; }
    }

    [Table("Address")]
    class Address
    {
        [DatabaseGenerated(DatabaseGeneratedOption.None)]
        public int Id { get; set; }
        public string Line1 { get; set; }
        public string Line2 { get; set; }
        public string Line3 { get; set; }
    }

    class Db : DbContext
    {
        public DbSet<SomeEntity> SomeEntities { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseSqlServer("Server=.;Database=efCoreTest;Integrated Security=true");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            using (var db = new Db())
            {
                db.Database.EnsureDeleted();
                db.Database.EnsureCreated();
                db.Database.ExecuteSqlCommand("alter database EfCoreTest set recovery simple;");

                var LoadAddressesSql = @"
with N as
(
    select top (10) cast(row_number() over (order by (select null)) as int) i
    from sys.objects o, sys.columns c, sys.columns c2
)
insert into Address(Id, Line1, Line2, Line3)
select i Id, 'AddressLine1' Line1, 'AddressLine2' Line2, 'AddressLine3' Line3
from N;
";

                var LoadEntitySql = @"
with N as
(
    select top (1000000) cast(row_number() over (order by (select null)) as int) i
    from sys.objects o, sys.columns c, sys.columns c2
)
insert into SomeEntity (Name, Description, CreatedOn, A, B, C, D, AddressId)
select concat('EntityName', i) Name,
       concat('Entity Description which is really rather long for Entity whose ID happens to be ', i) Description,
       getdate() CreatedOn,
       i A, i B, i C, i D, 1 + i % 10 AddressId
from N;
";

                Console.WriteLine("Generating Data ...");
                db.Database.ExecuteSqlCommand(LoadAddressesSql);
                Console.WriteLine("Loaded Addresses");

                for (int i = 0; i < 10; i++)
                {
                    var rows = db.Database.ExecuteSqlCommand(LoadEntitySql);
                    Console.WriteLine($"Loaded Entity Batch {rows} rows");
                }
                Console.WriteLine("Finished Generating Data");

                // Stream the results: AsNoTracking so the context holds no
                // references to the entities, and no ToList()/ToArray() so
                // rows are materialized one at a time as the loop advances.
                var results = db.SomeEntities.AsNoTracking().Include(e => e.Address).AsEnumerable();

                int batchSize = 10 * 1000;
                int ix = 0;
                foreach (var r in results)
                {
                    ix++;
                    if (ix % batchSize == 0)
                    {
                        Console.WriteLine($"Read Entity {ix} with name {r.Name}. Current Memory: {GC.GetTotalMemory(false) / 1024}kb GC's Gen0:{GC.CollectionCount(0)} Gen1:{GC.CollectionCount(1)} Gen2:{GC.CollectionCount(2)}");
                    }
                }

                Console.WriteLine($"Done. Current Memory: {GC.GetTotalMemory(false) / 1024}kb");
                Console.ReadKey();
            }
        }
    }
}
Outputs
Generating Data ...
Loaded Addresses
Loaded Entity Batch 1000000 rows
Loaded Entity Batch 1000000 rows
. . .
Loaded Entity Batch 1000000 rows
Finished Generating Data
Read Entity 10000 with name EntityName10000. Current Memory: 2854kb GC's Gen0:7 Gen1:1 Gen2:0
Read Entity 20000 with name EntityName20000. Current Memory: 4158kb GC's Gen0:14 Gen1:1 Gen2:0
Read Entity 30000 with name EntityName30000. Current Memory: 2446kb GC's Gen0:22 Gen1:1 Gen2:0
. . .
Read Entity 9990000 with name EntityName990000. Current Memory: 2595kb GC's Gen0:7429 Gen1:9 Gen2:1
Read Entity 10000000 with name EntityName1000000. Current Memory: 3908kb GC's Gen0:7436 Gen1:9 Gen2:1
Done. Current Memory: 3916kb
Note: another common cause of excessive memory consumption in EF Core is "mixed client/server evaluation" of queries, where parts of the query run in memory over data pulled from the server. See the docs for more info and for how to disable automatic client-side query evaluation.
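If you want the query to fail loudly instead of silently pulling data client-side, you can escalate the client-evaluation warning to an exception. A sketch using the EF Core 2.x API (`RelationalEventId.QueryClientEvaluationWarning` lives in `Microsoft.EntityFrameworkCore.Diagnostics`; in EF Core 3.0+ client evaluation throws by default, so this is only needed on older versions):

```csharp
// In OnConfiguring (or wherever you build DbContextOptions):
// throw instead of logging a warning when EF falls back to client evaluation.
optionsBuilder
    .UseSqlServer(connectionString)
    .ConfigureWarnings(w => w.Throw(RelationalEventId.QueryClientEvaluationWarning));
```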
Update: this turned out to be due to MARS (Multiple Active Result Sets) being disabled on the connection.
https://github.com/aspnet/EntityFrameworkCore/issues/9367
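For SQL Server, enabling MARS is a connection-string change; a sketch based on the example connection string above:

```csharp
// MultipleActiveResultSets=True lets EF run a second command (e.g. a lazy
// load or nested query) while a streaming reader is still open.
optionsBuilder.UseSqlServer(
    "Server=.;Database=efCoreTest;Integrated Security=true;MultipleActiveResultSets=True");
```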