I'm trying to pull a large-ish dataset (1.4 million records) from a SQL Server database and dump it to a file in a WinForms application. I've attempted to do it with paging so that I'm not holding too much in memory at once, but the process continues to grow its memory footprint as it runs. About 25% of the way through, it was taking up 600,000K. Am I doing the paging wrong? Can I get some suggestions on how to keep the memory usage from growing so much?
var query = (from organizations in ctxObj.Organizations
             where organizations.org_type_cd == 1
             orderby organizations.org_ID
             select organizations);
int recordCount = query.Count();
int skipTo = 0;
int take = 1000;
if (recordCount > 0)
{
    while (skipTo < recordCount)
    {
        if (skipTo + take > recordCount)
            take = recordCount - skipTo;
        foreach (Organization o in query.Skip(skipTo).Take(take))
        {
            writeRecord(o);
        }
        skipTo += take;
    }
}
Simply convert the int64 values to int8 and the float64 values to float8; this will reduce memory usage.
Another way to handle large datasets is to chunk them, that is, to cut a large dataset into smaller chunks and process those chunks individually. After all the chunks have been processed, you can combine the results and calculate the final findings.
When data is too large to fit into memory, you can use Pandas' chunksize option to split the data into chunks instead of dealing with one big block.
The object context keeps objects in memory until it's disposed. I would recommend disposing the context after each batch to prevent the memory footprint from continuing to grow, for example:
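A minimal sketch of the same loop with a fresh context per batch (the container name MyEntities is assumed here; writeRecord and the Organization properties are taken from the question):

int skipTo = 0;
const int take = 1000;
while (true)
{
    List<Organization> page;
    using (var ctxObj = new MyEntities())    // context is created and disposed per batch
    {
        page = (from o in ctxObj.Organizations
                where o.org_type_cd == 1
                orderby o.org_ID
                select o)
               .Skip(skipTo)
               .Take(take)
               .ToList();                     // materialize before the context is disposed
    }
    if (page.Count == 0)
        break;                                // no more records
    foreach (Organization o in page)
    {
        writeRecord(o);
    }
    skipTo += page.Count;
}

Because the loop stops when a page comes back empty, the up-front Count() query is no longer needed.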
You can also use AsNoTracking() (http://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx), since you are not saving changes back to the database.
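A minimal sketch, assuming a DbContext-based model where the AsNoTracking() extension method is available (with an ObjectContext you would set ctxObj.Organizations.MergeOption = MergeOption.NoTracking instead):

var query = (from organizations in ctxObj.Organizations
             where organizations.org_type_cd == 1
             orderby organizations.org_ID
             select organizations).AsNoTracking();   // entities are not tracked by the context

The rest of the paging loop stays the same; the entities are simply no longer held by the change tracker, so they can be collected once writeRecord is done with them.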