 

Entity model .net querying 1 million records from MySQL performance issues

I am using an ADO.NET Entity Model to query a MySQL database. I was very happy with its implementation and usage, so I decided to see what would happen if I queried 1 million records. It has serious performance issues, and I don't understand why.

The system hangs for some time, and then I get either:

  • A deadlock exception
  • A MySQL exception

My code is as follows:

    try
    {
        // works very fast
        var data = from employees in dataContext.employee_table
                        .Include("employee_type")
                        .Include("employee_status")
                   orderby employees.EMPLOYEE_ID descending
                   select employees;

        // This hangs the system and causes a deadlock exception
        IList<employee_table> result = data.ToList<employee_table>();

        return result;
    }
    catch (Exception ex)
    {
        throw new MyException("Error in fetching all employees", ex);
    }

My question is: why is ToList() taking such a long time?

Also, how can I avoid this exception, and what is the ideal way to query a million records?

Gurucharan Balakuntla Maheshku asked Mar 13 '11 15:03




1 Answer

The ideal way to query a million records would be to use an IQueryable<T> to make sure you aren't actually executing a query against the database until you need the data. I highly doubt that you need a million records at once.

The reason it is deadlocking is that you are asking the MySQL server to pull those million records from the database, sort them by EMPLOYEE_ID, and then return all of that to your program. I imagine the deadlocks come from your program waiting for that to finish while reading everything into memory, and the MySQL errors are probably timeout related.
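If the failures are timeout related, one mitigation (a hedged sketch: ObjectContext in EF 1/4 exposes a nullable CommandTimeout in seconds, assuming your generated context derives from it) is to give the long-running command more time:

```csharp
// Sketch: allow up to 5 minutes before the provider aborts the command.
// CommandTimeout is null by default, meaning the provider's own default applies.
dataContext.CommandTimeout = 300;
```

This only buys time, though; it doesn't fix the underlying cost of materializing a million rows.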

The reason the var data section runs quickly is that you haven't actually done anything yet; you've only constructed the query. When you call ToList(), the SQL is executed and the results are read. This is known as deferred execution (often described loosely as lazy loading).

I would suggest trying this:

        var data = from employees in dataContext.employee_table
                        .Include("employee_type")
                        .Include("employee_status")
                   orderby employees.EMPLOYEE_ID descending                            
                   select employees;

Then, when you actually need something from the list, just call:

data.Where(/* your filter expression */).ToList()

So if you needed the employee with ID 10:

var employee = data.Where(e => e.ID == 10).ToList();

Or if you need all the employees whose last name starts with S (I don't know if your table has a last-name column; this is just an example):

var employees = data.Where(e => e.LastName.StartsWith("S")).ToList();

Or if you want to page through all of the employees in chunks of 100:

var employees = data.Skip(page * 100).Take(100).ToList();
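One caveat worth noting: Skip/Take paging only behaves predictably over a stable ordering. The data query above already has an orderby, but if you page a bare table, add one first (a sketch; EMPLOYEE_ID is assumed from the question, and page is a hypothetical zero-based page index):

```csharp
// Sketch: deterministic paging requires an explicit ordering,
// otherwise rows can repeat or be skipped between pages.
var pageOfEmployees = dataContext.employee_table
                                 .OrderBy(e => e.EMPLOYEE_ID)
                                 .Skip(page * 100)
                                 .Take(100)
                                 .ToList();
```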

If you want to defer your database calls even further, you can skip calling ToList() and just iterate the query when you need the data. Say you want to add up the salaries of everyone whose last name starts with A:

 var salaries = data.Where(e => e.LastName.StartsWith("A"));

 decimal salaryTotal = 0;
 foreach (var employee in salaries)
 {
     salaryTotal += employee.Salary;
 }

This issues a single query that would look something like

Select * From employee_table Where LastName Like 'A%'

so the database does the filtering, only the matching rows come back, and they are fetched only when you actually iterate.
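If all you ultimately need is the aggregate, you can push that to the server too. As a sketch (assuming hypothetical LastName and Salary columns, as in the examples above), Sum() is translated into SQL, so only a single scalar crosses the wire:

```csharp
// Roughly: SELECT SUM(Salary) FROM employee_table WHERE LastName LIKE 'A%'
decimal salaryTotal = dataContext.employee_table
                                 .Where(e => e.LastName.StartsWith("A"))
                                 .Sum(e => e.Salary);
```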

If for some crazy reason you actually want to query all million records, then, ignoring the fact that this will eat up a massive amount of system resources, I would suggest doing it in chunks; you will probably need to experiment with the chunk size to get the best performance.

The general idea is to issue smaller queries to avoid timeout issues from the database.

int chunkSize = 100; // for example purposes
var employees = new HashSet<employee_table>();

// Assuming it's exactly 1 million records
int recordsToGet = 1000000;

for (int record = 0; record < recordsToGet; record += chunkSize)
{
    // Skip/Take needs a stable ordering to page reliably
    var chunk = dataContext.employee_table
                           .OrderBy(e => e.EMPLOYEE_ID)
                           .Skip(record)
                           .Take(chunkSize)
                           .ToList();
    foreach (var e in chunk)
    {
        employees.Add(e);
    }
}

I chose a HashSet<T> since it is designed for large sets of data, but I don't know what performance would look like with 1,000,000 objects.
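If you don't need all million entities alive at once, an alternative sketch is to stream them: iterating the query directly keeps a data reader open and yields one entity at a time, and turning off change tracking (MergeOption.NoTracking on an ObjectQuery<T> in EF 1/4; this is an assumption about your context type) avoids the per-entity bookkeeping cost:

```csharp
// Sketch: process rows as they arrive instead of materializing a list.
var query = dataContext.employee_table;
query.MergeOption = MergeOption.NoTracking; // assumes an ObjectQuery<T>
foreach (var employee in query)
{
    Process(employee); // hypothetical per-row handler
}
```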

msarchet answered Nov 15 '22 09:11