I'm processing 1 million records in my application, which I retrieve from a MySQL database. I'm using LINQ to get the records and .Skip() and .Take() to process 250 records at a time. For each retrieved record I need to create 0 to 4 Items, which I then add to the database, so on average around 2 million Items are created in total.
IQueryable<Object> objectCollection = dataContext.Repository<Object>();
int amountToSkip = 0;
var random = new Random();
IList<Object> objects = objectCollection.Skip(amountToSkip).Take(250).ToList();

while (objects.Count != 0)
{
    using (dataContext = new LinqToSqlContext(new DataContext()))
    {
        foreach (Object objectRecord in objects)
        {
            // Create 0 - 4 random Items (Next's upper bound is exclusive)
            int itemCount = random.Next(0, 5);
            for (int i = 0; i < itemCount; i++)
            {
                Item item = new Item();
                item.Id = Guid.NewGuid();
                item.Object = objectRecord.Id;
                item.Created = DateTime.Now;
                item.Changed = DateTime.Now;
                dataContext.InsertOnSubmit(item);
            }
        }
        dataContext.SubmitChanges();
    }
    amountToSkip += 250;
    objects = objectCollection.Skip(amountToSkip).Take(250).ToList();
}
The problem arises when creating the Items: while the application runs, memory usage increases steadily, even when I'm not touching the dataContext at all. It's as if the items never get disposed. Does anyone see what I'm doing wrong?
Thanks in advance!
OK, I've just discussed this situation with a colleague of mine, and we've come to the following solution, which works!
const int processAmount = 250;
int amountToSkip = 0;
var random = new Random();
var finished = false;

while (!finished)
{
    using (var dataContext = new LinqToSqlContext(new DataContext()))
    {
        var objects = dataContext.Repository<Object>().Skip(amountToSkip).Take(processAmount).ToList();
        if (objects.Count == 0)
        {
            finished = true;
        }
        else
        {
            foreach (Object objectRecord in objects)
            {
                // Create 0 - 4 random Items (Next's upper bound is exclusive)
                int itemCount = random.Next(0, 5);
                for (int i = 0; i < itemCount; i++)
                {
                    Item item = new Item();
                    item.Id = Guid.NewGuid();
                    item.Object = objectRecord.Id;
                    item.Created = DateTime.Now;
                    item.Changed = DateTime.Now;
                    dataContext.InsertOnSubmit(item);
                }
            }
            dataContext.SubmitChanges();
        }
        // Advance amountToSkip by processAmount so we don't process the same records again
        amountToSkip += processAmount;
    }
}
With this implementation we dispose the DataContext, along with everything it cached for the Skip() and Take() batch, every time, and thus don't leak memory!
Ahhh, the good old InsertOnSubmit memory leak. I've encountered it and bashed my head against the wall many times when trying to load data from large CSV files using LINQ to SQL. The problem is that even after calling SubmitChanges, the DataContext continues to track all objects that have been added using InsertOnSubmit. The solution is to call SubmitChanges after a certain number of objects, then create a new DataContext for the next batch. When the old DataContext is garbage collected, so will be all the inserted objects it tracks (and that you no longer require).
"But wait!" you say, "Creating and disposing of many DataContexts will add huge overhead!" Well, not if you create a single database connection and pass it to each DataContext constructor. That way, a single connection to the database is maintained throughout, and the DataContext is otherwise a lightweight object that represents a small unit of work and should be discarded once that unit is complete (in your example, after submitting a certain number of records).
My best guess here would be that the IQueryable is causing the memory leak. Maybe there is no appropriate MySQL implementation of the Skip/Take methods and the paging is being done in memory? Stranger things have happened, but your loop looks fine: all references should go out of scope and get garbage collected.
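If you want to check that theory, here is a quick sketch; it assumes direct access to the underlying System.Data.Linq.DataContext (the LinqToSqlContext wrapper may not expose these members) and uses a placeholder connectionString:

using System;
using System.Data.Linq;
using System.Linq;

var context = new DataContext(connectionString);

// Ask the context for the SQL it would generate for the paged query.
// If Skip/Take translate to SQL paging constructs, the provider pages in the
// database; if the generated command fetches everything, it pages in memory.
var query = context.GetTable<Item>().Skip(500).Take(250);
Console.WriteLine(context.GetCommand(query).CommandText);

// Alternatively, log every command the context actually executes:
context.Log = Console.Out;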