Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why would reusing a DataContext have a negative performance impact?

After a fair amount of research and some errors, I modified my code so that it creates a new DataContext each time the database is queried or data is inserted. And the database is queried frequently - for each of 250k transactions that are processed, the database is queried to obtain a customer id, department id, and category before the transaction is inserted.

So now I'm trying to optimize the code as it was only processing around 15 transactions a second. I removed some extraneous queries and added some indexes and got it up to 30/sec. I then figured that even though everyone says a DataContext is lightweight, it's got to cost something to create a new one 4 times per transaction, so I tried reusing the DataContext. I found, much to my surprise, that reusing the context caused performance to degrade to 10 transactions a second!

Why would this be the case? Is it because the DataContext caches the entities in memory and first searches through its in-memory list before querying the database? So that if, for example, I'm looking for the customer id (primary key) for the customer with name 'MCS' and the customer name column has a clustered index on it so that the database query is fast, the in-memory lookup will be slower?

And is it true that creating/disposing so many db connections could slow things down, or is this just another premature optimization? And if it is true, is there a way to reuse a DataContext but have it perform an actual database query for each linq-to-sql query?

like image 755
MCS Avatar asked Jul 13 '10 02:07

MCS


3 Answers

Here's why re-using a DataContext is not a best practice, from the MSDN DataContext documentation:

The DataContext is the source of all entities mapped over a database connection. It tracks changes that you made to all retrieved entities and maintains an "identity cache" that guarantees that entities retrieved more than one time are represented by using the same object instance.

In general, a DataContext instance is designed to last for one "unit of work" however your application defines that term. A DataContext is lightweight and is not expensive to create. A typical LINQ to SQL application creates DataContext instances at method scope or as a member of short-lived classes that represent a logical set of related database operations.

If you're re-using a DataContext for a large number of queries, your performance will degrade for a couple of possible reasons:

  1. If DataContext's in-memory identity cache becomes so large that it has to start writing to the pagefile then your performance will be bound to the HD's read-head speed and effectively there won't be a reason to use a cache at all.

  2. The more identity objects there are in memory, the longer each save operation takes.

Essentially what you're doing is violating the UoW principle for the DataContext class.

Opening database connections does have some overhead associated with it, but keeping a connection open for a long period of time (which often also means locking a table) is less preferable than opening and closing them quickly.

Another link which may or may not help you from MSDN:

How to: Reuse a Connection Between an ADO.NET Command and a DataContext (LINQ to SQL)

like image 92
Aaronontheweb Avatar answered Oct 21 '22 16:10

Aaronontheweb


Even with a clustered index, in-memory lookup will always be faster than a database query--except in edge cases, like a 386 vs. a Cray--even if you factor out network-related delays.

I would guess the degradation has to do with the DataContext's handling of entities that it tracks: reusing a context will continually increase the number of tracked entities, and the call to SaveChanges may end up requiring more time.

Again, that's a guess--but it's where I'd start looking.

like image 26
Ben M Avatar answered Oct 21 '22 14:10

Ben M


Not exactly on point here, but have you considered some sort of application-level cache to look up the customer id, department id, and category? It's not clear from your post how many of these entities exist in your system, or what is involved in querying to obtain them.

However, as an example, if you have one million categories in your system, and you need to look up their Id by category name, keeping an name/Id dictionary in memory for lookup at all times will save you a trip to the database for transaction you process. This could massively improve performance (this assumes a few things, like new caregories aren't being added regularly). As a general rule, round trips to the database are expensive relative to in-memory operations.

like image 1
Phil Sandler Avatar answered Oct 21 '22 15:10

Phil Sandler