
Reading thousands of objects with EF Core FAST

I am reading 40,000 small objects/rows from SQLite with EF Core, and it's taking 18 seconds, which is too long for my UWP app. While this happens, CPU usage on a single core reaches 100%, but disk-read utilization is around 1%.

var dataPoints =  _db.DataPoints.AsNoTracking().ToArray();

Without AsNoTracking() the time taken is even longer.

DataPoint is a small POCO with a few primitive properties. The total amount of data I am loading is 4.5 MB.

    public class DataPointDto
    {
        [Key]
        public ulong Id { get; set; }

        [Required]
        public DateTimeOffset TimeStamp { get; set; }

        [Required]
        public bool trueTime { get; set; }

        [Required]
        public double Value { get; set; }
    }

Question: Is there a better way of loading this many objects, or am I stuck with this level of performance?

Fun fact: x86 takes 11 seconds, x64 takes 18. 'Optimise code' shaves off a second. Using Async pushes execution time to 30 seconds.

asked Mar 07 '16 by Vladimir Akopyan




2 Answers

Most answers follow the common wisdom of loading less data, but in some circumstances, such as here, you Absolutely Positively Must load a lot of entities. So how do we do that?

Cause of poor performance

Is it unavoidable for this operation to take this long? Well, it's not. We are loading only a few megabytes of data from disk; the cause of poor performance is that the data is split across 40,000 tiny entities. The database can handle that, but Entity Framework seems to struggle with setting up all those entities, change tracking, and so on. If we do not intend to modify the data, there is a lot we can do.

I tried three things

Primitives

Load just one property and you get a list of primitives.

List<double> dataPoints =  _db.DataPoints.Select(dp => dp.Value).ToList();

This bypasses all of the entity creation normally performed by Entity Framework. This query took 0.4 seconds, compared to 18 seconds for the original query: a 45x (!) improvement.

Anonymous types

Of course, most of the time we need more than just an array of primitives. We can create new objects right inside the LINQ query; Entity Framework won't create the entities it normally would, and the operation runs much faster. We can use anonymous objects for convenience:

var query = db.DataPoints.Select(dp => new { ID = dp.sensorID, Timestamp = dp.TimeStamp, Value = dp.Value });

This operation takes 1.2 seconds, compared to 18 seconds for retrieving the same data as full entities.
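Anonymous types can't easily leave the method that creates them, so if the result has to be returned to a caller, the same projection trick works with a small named DTO. A minimal sketch of that pattern (the DataPointSummary class and the in-memory rows are illustrative, not from the original code; against EF Core the same Select shape avoids full entity materialization):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical projection target: only the columns we actually need.
public class DataPointSummary
{
    public DateTimeOffset Timestamp { get; set; }
    public double Value { get; set; }
}

public static class Demo
{
    // Same Select-projection pattern as the anonymous-type query,
    // but the result type is named, so it can be returned to callers.
    public static List<DataPointSummary> Project(
        IEnumerable<(DateTimeOffset ts, double val)> rows) =>
        rows.Select(r => new DataPointSummary { Timestamp = r.ts, Value = r.val })
            .ToList();
}
```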

Tuples

I found that in my case, using tuples instead of anonymous types improved performance a little; the following query executed roughly 30% faster:

var query = db.DataPoints.Select(dp => Tuple.Create(dp.sensorID, dp.TimeStamp, dp.Value));

Other ways

  1. You cannot use structs inside LINQ-to-Entities queries, so that's not an option.
  2. In many cases you can combine many records together to reduce the overhead associated with retrieving many individual records; by retrieving fewer, larger records you could improve performance. For instance, in my use case I've got measurements being taken every 5 minutes, 24/7. At the moment I am storing them individually, and that's silly: nobody will ever query less than a day's worth of them. I plan to update this post when I make the change and find out how performance changed.
  3. Some recommend using an object-oriented DB or a micro-ORM. I have never used either, so I can't comment.
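To make item 2 concrete, here is a minimal sketch of the kind of batching described there: per-5-minute readings collapsed into one record per calendar day. The types and names (Reading, DailyBatch) are illustrative, not from the original schema; in a real database the daily record would be a single row storing the day's values together:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative types, not from the original schema.
public record Reading(DateTimeOffset Timestamp, double Value);
public record DailyBatch(DateTime Day, double[] Values);

public static class Batcher
{
    // Collapse many small readings into one record per calendar day,
    // so a day's query touches 1 row instead of ~288 (24h / 5min).
    public static List<DailyBatch> ByDay(IEnumerable<Reading> readings) =>
        readings.GroupBy(r => r.Timestamp.UtcDateTime.Date)
                .Select(g => new DailyBatch(
                    g.Key,
                    g.OrderBy(r => r.Timestamp).Select(r => r.Value).ToArray()))
                .ToList();
}
```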
answered Sep 23 '22 by Vladimir Akopyan


You can use a different technique to load all your items: create your own logic to load parts of the data while the user is scrolling the ListView (I guess you are using one).

Fortunately, UWP offers an easy way to do this: incremental loading. Please see the documentation and example:

https://msdn.microsoft.com/library/windows/apps/Hh701916
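The core of that technique is just page-at-a-time fetching. Below is a framework-free sketch of only the paging logic; in a real UWP app you would put this behind the Windows.UI.Xaml.Data.ISupportIncrementalLoading interface, whose LoadMoreItemsAsync(uint count) the ListView calls as the user scrolls. The Pager class and its names are illustrative, not from the linked docs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative pager: hands out fixed-size pages from a data source
// until it runs dry, mirroring what LoadMoreItemsAsync would do.
public class Pager<T>
{
    private readonly IReadOnlyList<T> _source;
    private int _loaded;

    public Pager(IReadOnlyList<T> source) => _source = source;

    // Same role as ISupportIncrementalLoading.HasMoreItems.
    public bool HasMoreItems => _loaded < _source.Count;

    // In UWP this body would live inside LoadMoreItemsAsync and
    // append the page to an ObservableCollection bound to the ListView.
    public IReadOnlyList<T> LoadMore(int count)
    {
        var page = _source.Skip(_loaded).Take(count).ToList();
        _loaded += page.Count;
        return page;
    }
}
```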

answered Sep 21 '22 by RicardoPons