
How to "warm-up" Entity Framework? When does it get "cold"?

  • What would be the best approach to have high availability on my Entity Framework at any time?

You can go for a mix of pregenerated views and static compiled queries.

A static CompiledQuery is quick and easy to write and helps performance. With EF5, however, you no longer need to compile every query yourself, because EF auto-compiles queries. The only problem is that auto-compiled queries can be lost when the cache is swept. So you still want to hold references to your own compiled queries for those that run only rarely but are expensive. If you put those queries into static classes, they are compiled when they are first required. That may be too late for some queries, so you may want to force their compilation during application startup.
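The "hold a reference and force compilation at startup" idea can be sketched independently of EF. The following is a minimal Python illustration, not EF code: `CompiledQueries`, `register`, `get` and `warm_up` are hypothetical names, and the factory stands in for whatever expensive compilation step the real application has.

```python
from typing import Any, Callable, Dict, List


class CompiledQueries:
    """Keeps strong references to compiled queries so they are never swept."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Callable[..., Any]]] = {}
        self._compiled: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, factory: Callable[[], Callable[..., Any]]) -> None:
        # Registration is cheap; the factory is not called yet.
        self._factories[name] = factory

    def get(self, name: str) -> Callable[..., Any]:
        # Lazy path: compile on first use, then reuse the stored result.
        if name not in self._compiled:
            self._compiled[name] = self._factories[name]()
        return self._compiled[name]

    def warm_up(self) -> List[str]:
        # Eager path: call this once during application startup.
        for name in self._factories:
            self.get(name)
        return sorted(self._compiled)


def compile_orders_by_customer() -> Callable[..., Any]:
    # Hypothetical stand-in for a slow compilation step.
    return lambda rows, customer: [r for r in rows if r["customer"] == customer]
```

Calling `warm_up()` at startup moves the compilation cost away from the first real request; without it, `get()` still compiles lazily on first use.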

Pregenerating views is the other possibility, as you mention, especially for queries that take very long to compile and that don't change. That way you move the performance overhead from runtime to compile time, and it introduces no lag at runtime. The trade-off is that the views have to be regenerated whenever the model or its mapping changes, so they are not as easy to deal with. Code is more flexible.

Do not use a lot of TPT inheritance (that's a general performance issue in EF). Build your inheritance hierarchies neither too deep nor too wide. If a class has only 2-3 properties of its own, that may not be enough to justify a separate type; they could instead be handled as optional (nullable) properties on an existing type.

Don't hold on to a single context for a long time. Each context instance has its own first level cache which slows down the performance as it grows larger. Context creation is cheap, but the state management inside the cached entities of the context may become expensive. The other caches (query plan and metadata) are shared between contexts and will die together with the AppDomain.

All in all, allocate contexts frequently and use them only for a short time, make sure your application can start quickly, compile queries that are rarely used, and provide pregenerated views for queries that are performance-critical and often used.

  • In what cases does the Entity Framework get "cold" again? (Recompilation, Recycling, IIS Restart etc.)

Basically, every time you lose your AppDomain. By default IIS recycles the application pool every 29 hours, so you can never guarantee that your instances stay around. The AppDomain is also shut down after some time without activity. You should aim to come up again quickly. Maybe you can do some of the initialization asynchronously (but beware of multi-threading issues). You can use scheduled tasks that call dummy pages in your application during quiet periods to keep the AppDomain from being shut down for inactivity, but it will still be recycled eventually.
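A scheduled keep-alive task of that kind could look roughly like the sketch below. It is written in Python and assumes nothing about the application beyond a list of URLs to call; `keep_warm` and `fetch_status` are hypothetical names, and how often the script runs is left to the scheduler.

```python
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def fetch_status(url: str, timeout: float = 10.0) -> int:
    # Plain GET; raises an OSError subclass on connection or HTTP errors.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def keep_warm(
    urls: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, Optional[int]]:
    """Request every URL once and report the status code, or None on failure."""
    results: Dict[str, Optional[int]] = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError:
            # A failed ping must not stop the remaining pages from being called.
            results[url] = None
    return results
```

The `fetch` parameter exists only so the function can be exercised without network access; a scheduled task would call `keep_warm([...])` with the application's own warm-up pages.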

I also assume when you change your config file or change the assemblies there's going to be a restart.


If you are looking for maximum performance across all calls you should consider your architecture carefully. For instance, it might make sense to pre-cache often used look-ups in server RAM when the application loads up instead of using database calls on every request. This technique will ensure minimum application response times for commonly used data. However, you must be sure to have a well behaved expiration policy or always clear your cache whenever changes are made which affect the cached data to avoid issues with concurrency.
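As an illustration of pre-caching with an expiration policy plus explicit invalidation, here is a minimal Python sketch. `LookupCache`, its `loader` callback and the `clock` parameter are assumptions made for the example; a real application would more likely use the caching facilities of its own platform.

```python
import time
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple


class LookupCache:
    """In-memory look-up cache with a time-to-live and manual invalidation."""

    def __init__(
        self,
        loader: Callable[[Hashable], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader      # performs the real (slow) database call
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def preload(self, keys: Iterable[Hashable]) -> None:
        # Call at application start for the commonly used look-ups.
        for key in keys:
            self.get(key)

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]        # still fresh: no database call
        value = self._loader(key)  # missing or expired: reload
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call whenever a change affects the cached data.
        self._entries.pop(key, None)
```

The time-to-live bounds how stale an entry can get if an invalidation is missed, while `invalidate` covers the changes the application knows about.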

In general, you should strive to design distributed architectures so that they require IO-based data requests only when the locally cached information becomes stale or needs to be transactional. Any "over the wire" data request will normally take 10-1000 times longer than a local, in-memory cache retrieval. This one fact alone often makes discussions about "cold vs. warm data" inconsequential in comparison to the "local vs. remote" data issue.


General tips.

  • Perform rigorous logging including what is accessed and request time.
  • Perform dummy requests when initializing your application to warm boot very slow requests that you pick up from the previous step.
  • Don't bother optimizing unless it's a real problem. Communicate with the consumers of the application and ask. Get comfortable with a continuous feedback loop, if only to figure out what needs optimization.
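The first two tips can be combined: use the request log to pick the slow requests worth warming. Below is a minimal Python sketch, assuming the log has already been parsed into `(path, duration_ms)` pairs; the function name, threshold and limit are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def slowest_requests(
    log: Iterable[Tuple[str, float]],
    threshold_ms: float,
    limit: int = 10,
) -> List[str]:
    """Return paths whose average duration is at least threshold_ms, slowest first."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for path, duration_ms in log:
        durations[path].append(duration_ms)

    # Average per path so a single outlier does not dominate the decision.
    averages = {path: sum(values) / len(values) for path, values in durations.items()}
    slow = [path for path, avg in averages.items() if avg >= threshold_ms]
    slow.sort(key=lambda path: averages[path], reverse=True)
    return slow[:limit]
```

The returned paths are the candidates for dummy requests during application initialization.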

Now to explain why dummy requests are not the wrong approach.

  • Less Complexity - You are warming up the application in a manner that will work regardless of changes in the framework, and you don't need to figure out possibly funky APIs/framework internals to do it the right way.
  • Greater Coverage - You are warming up all layers of caching at once related to the slow request.

To explain when a cache gets "Cold".

This happens at any layer in your framework that applies a cache. There is a good description at the top of the performance page.

  • Whenever a cache has to be validated after a potential change that makes it stale. The trigger could be a timeout or something more intelligent (i.e. a change in the cached item).
  • When a cache item is evicted. The algorithm is described in the section "Cache eviction algorithm" of the performance article you linked, but in short:
    • An LFRU (least frequently, recently used) cache based on hit count and age, with a limit of 800 items.
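To make "eviction on hit count and age" concrete, here is a deliberately simplified Python sketch. It is not EF's actual algorithm: the class name `PlanCache`, the tie-breaking rule and the choice to evict immediately on insert are all assumptions made for illustration. Only the idea of combining hit count with age, and the default limit of 800, come from the description above.

```python
from typing import Any, Dict, Hashable, List, Optional


class PlanCache:
    """Bounded cache that evicts the least-hit entry, oldest first on ties."""

    def __init__(self, capacity: int = 800) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._tick = 0  # logical clock used to measure age
        # key -> [value, hit_count, last_used_tick]
        self._entries: Dict[Hashable, List[Any]] = {}

    def __len__(self) -> int:
        return len(self._entries)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None            # cache miss: the caller must recompile
        self._tick += 1
        entry[1] += 1              # one more hit
        entry[2] = self._tick      # used just now
        return entry[0]

    def put(self, key: Hashable, value: Any) -> None:
        self._tick += 1
        self._entries[key] = [value, 0, self._tick]
        while len(self._entries) > self.capacity:
            self._evict_one(protect=key)

    def _evict_one(self, protect: Hashable) -> None:
        # Lowest hit count loses; among equals, the least recently used one.
        victim = min(
            (k for k in self._entries if k != protect),
            key=lambda k: (self._entries[k][1], self._entries[k][2]),
        )
        del self._entries[victim]
```

With such a policy, a rarely executed query is the first to lose its cached plan, which is why rare but expensive queries are the ones that get "cold" again.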

The other things you mentioned, specifically recompilation and restarting IIS, clear either part or all of the in-memory caches.


As you have stated, use "pre-generated views". That's really all you need to do.

Extracted from your link: "When views are generated, they are also validated. From a performance standpoint, the vast majority of the cost of view generation is actually the validation of the views"

This means the performance knock will take place when you build your model assembly. Your context object will then skip the "cold query" and stay responsive for the duration of the context object life cycle as well as subsequent new object contexts.

Executing irrelevant queries will serve no other purpose than to consume system resources.

The shortcut ...

  1. Skip all that extra work of pre-generated views
  2. Create your object context
  3. Fire off that sweet irrelevant query
  4. Then just keep a reference to your object context for the duration of your process (not recommended).

I have no experience in this framework. But in other contexts, e.g. Solr, completely dummy reads will not be of much use unless you can cache the whole DB (or index).

A better approach would be to log the queries, extract the most common ones from the logs and use them to warm up. Just be sure not to log the warm-up queries, or remove them from the logs before proceeding.
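A minimal Python sketch of that log-driven selection, assuming one query per log line and assuming the warm-up queries are tagged with a marker so they can be filtered out. The marker text and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical tag that the warm-up routine prepends to its own queries.
WARMUP_MARKER = "/* warmup */"


def top_queries(
    log_lines: Iterable[str],
    n: int = 20,
    marker: str = WARMUP_MARKER,
) -> List[str]:
    """Return the n most frequent logged queries, ignoring warm-up traffic."""
    counts = Counter(
        line.strip()
        for line in log_lines
        if line.strip() and marker not in line
    )
    return [query for query, _ in counts.most_common(n)]
```

Filtering on the marker keeps the warm-up queries from reinforcing themselves in the statistics on every restart.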