Optimizing performance of flexible Linq to Entity criteria when retrieving rows from huge database

I have a database that contains billions of rows. I created a function that receives a number of parameters from the user and filters the database by those parameters. This works well for me with a small database (30,000 rows), but when I try to use this function on the big database I get a TimeoutException from SQL Server.

Here is my code:

public static IQueryable<LogViewer.EF.InternetEF.Log> ExecuteInternetGetLogsQuery(FilterCriteria p_Criteria, ref GridView p_Datagrid)
{
    IQueryable<LogViewer.EF.InternetEF.Log> internetQuery = null;

    using (InternetDBConnectionString context = new InternetDBConnectionString())
    {
        internetQuery = context.Logs;
        if ((p_Criteria.DateTo != null && p_Criteria.DateFrom != null))
        {
            internetQuery = internetQuery.Where(c => c.Timestamp >= p_Criteria.DateFrom && c.Timestamp < p_Criteria.DateTo);
        }
        else if (p_Criteria.DateFrom != null && p_Criteria.DateFrom > DateTime.MinValue)
        {
            internetQuery = internetQuery.Where(c => c.Timestamp >= p_Criteria.DateFrom);
        }
        else if (p_Criteria.DateTo != null && p_Criteria.DateTo > DateTime.MinValue)
        {
            internetQuery = internetQuery.Where(c => c.Timestamp < p_Criteria.DateTo);
        }
        if (!string.IsNullOrEmpty(p_Criteria.FreeText))
        {
            internetQuery = internetQuery.Where(c => c.FormattedMessage.Contains(p_Criteria.FreeText));
        }

        if (p_Criteria.Titles.Count > 0)
        {
            internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Titles.Contains(c.Title)).AsQueryable();
        }
        if (p_Criteria.MachineNames.Count > 0)
        {
            internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.MachineNames.Contains(c.MachineName)).AsQueryable();
        }
        if (p_Criteria.Severities.Count > 0)
        {
            internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Severities.Contains(c.Severity)).AsQueryable();
        }
        internetQuery = internetQuery.OrderByDescending(c => c.LogID);
        if (internetQuery.Count() > p_Criteria.TopValue)
        {
            internetQuery = internetQuery.Take(p_Criteria.TopValue);
        }
        p_Datagrid.DataSource = internetQuery;
        p_Datagrid.DataBind();
        return internetQuery;

    }  
}

My version of SQL Server is 2005. I get the exception on the p_Datagrid.DataBind(); line.

Any suggestions? Thanks

asked Apr 17 '12 by AlexBerd


2 Answers

From what I can see, you have these options:

  • Increase the timeout (a bad idea; it just pushes the problem into the future).
  • Instead of a LINQ query, get the data via a stored procedure.
  • Make the grid page, so you only retrieve the data for the target page.
  • Look at the query plan and see whether you can add indexes on the columns you filter on in your where statements and order by.
  • Why do you need billions of rows in a datagrid? What are the requirements? Maybe you can show just the top 1,000 or top 10,000, because from a user's perspective I cannot see any benefit in a grid with a billion rows.

That was just off the top of my head.
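The paging option above could be sketched roughly like this (a minimal sketch; the pageIndex / pageSize parameters and the helper name are assumptions, not from the original code):

```csharp
// Hypothetical paging helper: retrieve only the rows for one grid page.
// pageIndex is zero-based. Skip/Take stay on the IQueryable, so EF
// translates them to SQL and only pageSize rows cross the wire.
public static IQueryable<LogViewer.EF.InternetEF.Log> GetPage(
    IQueryable<LogViewer.EF.InternetEF.Log> query, int pageIndex, int pageSize)
{
    return query
        .OrderByDescending(c => c.LogID) // paging requires a stable ordering
        .Skip(pageIndex * pageSize)
        .Take(pageSize);
}
```

On SQL Server 2005 EF translates Skip/Take using ROW_NUMBER() rather than OFFSET/FETCH, but the effect is the same: only one page of rows is returned.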

EDIT

And if I would have this function I would start looking at this section of the code:

if (p_Criteria.Titles.Count > 0)
{
     internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Titles.Contains(c.Title)).AsQueryable();
}
if (p_Criteria.MachineNames.Count > 0)
{
      internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.MachineNames.Contains(c.MachineName)).AsQueryable();
}
if (p_Criteria.Severities.Count > 0)
{
      internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Severities.Contains(c.Severity)).AsQueryable();
}

This actually materializes the result as an IEnumerable, so the subsequent Where filters run in memory on rows that have already been pulled from the database. You can also get additional database calls if related tables are accessed lazily during that in-memory iteration. All the benefits of having an IQueryable disappear when you do this; keep the filters on the IQueryable so they are translated into the SQL query.

Arion, answered Oct 29 '22


In general, 'swiss army knife' specification or criteria patterns like this are hard to optimise (i.e. to index at the SQL level) because of the large number of permutations of filter combinations that a client / user can specify. So if you can somehow force the user to supply at least one mandatory criterion that reduces the rowcount significantly (e.g. by making the date range mandatory and no longer than one month), I would start there, because then at least we have something to work with when we look at indexing.

Due to the potentially large number of rows, I would assert or validate that the value of p_Criteria.TopValue used to limit the rows is always present, and is a sensible number, e.g. Take(1000). You can always warn the user to narrow his / her search range if this threshold is reached.
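Such a guard could look like the following (a sketch only; the MaxTopValue cap of 1000 is an assumed value, not from the original code):

```csharp
// Hypothetical guard: clamp the requested row limit to a sane maximum
// so an absent or absurd TopValue can never pull the whole table.
const int MaxTopValue = 1000;
int top = (p_Criteria.TopValue > 0 && p_Criteria.TopValue <= MaxTopValue)
    ? p_Criteria.TopValue
    : MaxTopValue;
internetQuery = internetQuery.Take(top); // translated to SELECT TOP in SQL
// If exactly 'top' rows come back, warn the user to narrow the search range.
```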

The major problem is likely to be that filtering on Titles, MachineNames and Severities each calls AsEnumerable(), which materializes the query thus far, and thus you evaluate these 3 filters in memory, not in SQL, potentially with large numbers of records. All recent versions of EF are able to convert predicates of the form Where(c => IEnumerable<X>.Contains(c.Column)) into the SQL WHERE c.Column IN (X1, X2, X3).

i.e. You should remove the AsEnumerable() on these 3 filters (and you then don't need to convert back to AsQueryable()), i.e.

    if (p_Criteria.Titles.Any())
    {
        internetQuery = internetQuery
            .Where(c => p_Criteria.Titles.Contains(c.Title));
    }
    if (p_Criteria.MachineNames.Any())
    {
        internetQuery = internetQuery
            .Where(c => p_Criteria.MachineNames.Contains(c.MachineName));
    }
    if (p_Criteria.Severities.Any())
    {
        internetQuery = internetQuery
            .Where(c => p_Criteria.Severities.Contains(c.Severity));
    }

Another issue is the Take check: calling .Count() runs an extra (and potentially expensive) COUNT query against the database. You should instead just call Take() directly; there is no need to check whether the rowcount exceeds the limit. If there are FEWER than p_Criteria.TopValue rows, it will simply return as many rows as are present, i.e. remove the if check and leave just this:

internetQuery = internetQuery.Take(p_Criteria.TopValue);

Another thing I would look at for performance reasons is whether you can change the free-text check to use StartsWith instead of Contains. An index on a SQL char column can only be used when the pattern is anchored at the start of the string (LIKE 'filter%'); a leading wildcard (LIKE '%filter%') forces a scan. If the leading wildcard is not needed (this is obviously a behaviour change from the OP's code), the query will be able to use an index on the FormattedMessage column:

if (!string.IsNullOrEmpty(p_Criteria.FreeText))
{
    internetQuery = internetQuery
        .Where(c => c.FormattedMessage.StartsWith(p_Criteria.FreeText));
}

Minor quibble, and it won't affect database performance, but you can reduce the number of branches in your date filtering to just the following:

if (p_Criteria.DateFrom != null && p_Criteria.DateFrom > DateTime.MinValue)
{
    internetQuery = internetQuery.Where(c => c.Timestamp >= p_Criteria.DateFrom);
}
if (p_Criteria.DateTo != null && p_Criteria.DateTo > DateTime.MinValue)
{
    internetQuery = internetQuery.Where(c => c.Timestamp < p_Criteria.DateTo);
}

From a naming-standards point of view, I would also change the name of your ObjectContext/DbContext from *ConnectionString to *Context.

StuartLC, answered Oct 29 '22