I have the DB that contains billions of rows.
I created function that recieve from user number of parameters and cut the DB by those parameters.
This works well for me with small DB(30000 rows), but when I try to use this function on big DB I got TIMEOUTEXCEPTION from SQLSERVER
.
Here is my code:
public static IQueryable<LogViewer.EF.InternetEF.Log> ExecuteInternetGetLogsQuery(FilterCriteria p_Criteria, ref GridView p_Datagrid)
{
IQueryable<LogViewer.EF.InternetEF.Log> internetQuery = null;
using (InternetDBConnectionString context = new InternetDBConnectionString())
{
internetQuery = context.Logs;
if ((p_Criteria.DateTo != null && p_Criteria.DateFrom != null))
{
internetQuery = internetQuery.Where(c => c.Timestamp >= p_Criteria.DateFrom && c.Timestamp < p_Criteria.DateTo);
}
else if (p_Criteria.DateFrom != null && p_Criteria.DateFrom > DateTime.MinValue)
{
internetQuery = internetQuery.Where(c => c.Timestamp >= p_Criteria.DateFrom);
}
else if (p_Criteria.DateTo != null && p_Criteria.DateTo > DateTime.MinValue)
{
internetQuery = internetQuery.Where(c => c.Timestamp < p_Criteria.DateTo);
}
if (!string.IsNullOrEmpty(p_Criteria.FreeText))
{
internetQuery = internetQuery.Where(c => c.FormattedMessage.Contains(p_Criteria.FreeText));
}
if (p_Criteria.Titles.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Titles.Contains(c.Title)).AsQueryable();
}
if (p_Criteria.MachineNames.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.MachineNames.Contains(c.MachineName)).AsQueryable();
}
if (p_Criteria.Severities.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Severities.Contains(c.Severity)).AsQueryable();
}
internetQuery= internetQuery.OrderByDescending(c=>c.LogID);
if (internetQuery.Count() > p_Criteria.TopValue)
{
internetQuery = internetQuery.Take(p_Criteria.TopValue);
}
p_Datagrid.DataSource = internetQuery;
p_Datagrid.DataBind();
return internetQuery;
}
}
My version of SQL is 2005.
I got an exception on p_Datagrid.DataBind();
row.
Any suggetions? Thanks
LINQ syntax is typically less efficient than a foreach loop. It's good to be aware of any performance tradeoff that might occur when you use LINQ to improve the readability of your code. And if you'd like to measure the performance difference, you can use a tool like BenchmarkDotNet to do so.
What I can see you have these options:
where
statements on and order by
.top 1000
or top 10000
. Because from a user respective I can not see any pros of seeing a grid with a billion rows.That was just top of my head.
EDIT
And if I would have this function I would start looking at this section of the code:
if (p_Criteria.Titles.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Titles.Contains(c.Title)).AsQueryable();
}
if (p_Criteria.MachineNames.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.MachineNames.Contains(c.MachineName)).AsQueryable();
}
if (p_Criteria.Severities.Count > 0)
{
internetQuery = internetQuery.AsEnumerable().Where(c => p_Criteria.Severities.Contains(c.Severity)).AsQueryable();
}
This actually make a IEnumerable of the result and then you do the in memory where
statements with database calls. You might also have a problem doing this because when you call the related tables it call the database. Maybe you can fetch the rows and then do the contains
with a id of the IQueryable
. All the pros of having a IQueryable
diapers when doing this.
In general, 'swiss army knife' specification or criteria patterns like this are hard to optimise (i.e. Index at SQL Level), because of the large number of permutations of filter combinations that a client / user can specify. So if you can somehow force the user to specify at least one mandatory criterion, which reduces the rowcount significantly, e.g. by making the Date Range mandatory and no more than one month, I would start there, because then at least we've got something to start when we look at indexing.
Due to the potentially large number of rows, I would assert or validate that the value of p_Criteria.TopValue
used to limit the rows is always present, and is a sensible number, e.g. Take(1000)
. You can always warn the user to narrow his / her search range if this threshold is reached.
The major problem is likely to be that filtering on Titles
, MachineNames
and Severities
each calls AsEnumerable()
, which materializes the query thus far, and thus you evaluate these 3 filters in memory, not in SQL, potentially with large numbers of records. All recent versions of EF are able to convert predicates of the form Where(c => IEnumerable<X>.Contains(c.Column))
into the SQL WHERE c.Column IN (X1, X2, X3)
.
i.e. You should remove the AsEnumerable()
on these 3 filters (and you then don't need to convert back to AsQueryable()
), i.e.
if (p_Criteria.Titles.Any())
{
internetQuery = internetQuery
.Where(c => p_Criteria.Titles.Contains(c.Title));
}
if (p_Criteria.MachineNamesAny())
{
internetQuery = internetQuery
.Where(c => p_Criteria.MachineNames.Contains(c.MachineName));
}
if (p_Criteria.Severities.Any())
{
internetQuery = internetQuery
.Where(c => p_Criteria.Severities.Contains(c.Severity));
}
Another issue in the Take
check, by running .Count()
in the check, that you are materializing the query (if you haven't already done so). You should instead just run Take()
directly - no need to check if we've exceeded the rowcount. If there are LESS than p_Criteria.TopValue
rows then it will return as may rows as are present, i.e. remove the if check and just leave this:
internetQuery = internetQuery.Take(p_Criteria.TopValue);
Another thing I would look at for performance reasons is whether you can change the FreeText string checks to use StartsWith
instead of Contains
. Indexing on SQL database char columns is only effective at the start of the string.
If the %filter%
wildcard is not needed, then this is obviously different to OP's code, but will use be able to use indexing on the FreeText
column:
if (!string.IsNullOrEmpty(p_Criteria.FreeText))
{
internetQuery = internetQuery
.Where(c => c.FormattedMessage.StartsWith(p_Criteria.FreeText));
}
Minor quibble, and won't effect the database performance, but you can reduce the number of branches on your date filtering to just the following:
if (p_Criteria.DateFrom != null && p_Criteria.DateFrom > DateTime.MinValue)
{
internetQuery = internetQuery.Where(c => c.Timestamp >= p_Criteria.DateFrom);
}
if (p_Criteria.DateTo != null && p_Criteria.DateTo > DateTime.MinValue)
{
internetQuery = internetQuery.Where(c => c.Timestamp < p_Criteria.DateTo);
}
From a naming standards point of view, I would also change the name of your Object/DbContext from *ConnectionString
to *Context
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With