Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance issues to iterate results with C# SQLite DataReader and attached database

I am using System.Data.SQLite and SQLiteDataReader in my C# project. I am facing performance issues when getting the results of a query with attached databases.

Here is an example of a query to search text into two databases :

ATTACH "db2.db" as db2;

SELECT MainRecord.RecordID,
((LENGTH(MainRecord.Value) - LENGTH(REPLACE(UPPER(MainRecord.Value), UPPER("FirstValueToSearch"), ""))) / 18) AS "FirstResultNumber",
((LENGTH(DB2Record.Value) - LENGTH(REPLACE(UPPER(DB2Record.Value), UPPER("SecondValueToSearch"), ""))) / 19) AS "SecondResultNumber"
FROM main.Record MainRecord
JOIN db2.Record DB2Record ON DB2Record.RecordID BETWEEN (MainRecord.PositionMin) AND (MainRecord.PositionMax)
WHERE FirstResultNumber > 0 AND SecondResultNumber > 0;

DETACH db2;

When executing this query with SQLiteStudio or SQLiteAdmin, this works fine, I am getting the results in a few seconds (the Record table can contain hundreds of thousands of records, the query returns 36000 records).

When executing this query in my C# project, the execution takes a few seconds too, but it takes hours to run through all the results.

Here is my code :

// Attach databases

SQLiteDataReader data = null;

using (SQLiteCommand command = this.m_connection.CreateCommand())
{
    command.CommandText = "SELECT...";
    data = command.ExecuteReader();
}

if (data.HasRows)
{
    while (data.Read())
    {
        // Do nothing, just iterate all results
    }
}

data.Close();

// Detach databases

Calling the Read method of the SQLiteDataReader once can take more than 10 seconds ! I guess this is because the SQLiteDataReader is lazy loaded (and so it doesn't return the whole rowset before reading the results), am I right ?

EDIT 1 :

I don't know if this has something to do with lazy loading, like I said initially, but all I want is being able to get ALL the results as soon as the query is ended. Isn't it possible ? In my opinion, this is really strange that it takes hours to get results of a query executed in few seconds...

EDIT 2 :

I just added a COUNT(*) in my select query in order to see if I could get the total number of results at the first data.Read(), just to be sure that it was only the iteration of the results that was taking so long. And I was wrong : this new request executes in few seconds in SQLiteAdmin / SQLiteStudio, but takes hours to execute in my C# project. Any idea why the same query is so much longer to execute in my C# project?

EDIT 3 :

Thanks to EXPLAIN QUERY PLAN, I noticed that there was a slight difference in the execution plan for the same query between SQLiteAdmin / SQLiteStudio and my C# project. In the second case, it is using an AUTOMATIC PARTIAL COVERING INDEX on DB2Record instead of using the primary key index. Is there a way to ignore / disable the use of automatic partial covering indexes? I know it is used to speed up the queries, but in my case, it's rather the opposite that happens...

Thank you.

like image 350
Morgan M. Avatar asked May 20 '15 10:05

Morgan M.


People also ask

How can loop performance be improved?

The best ways to improve loop performance are to decrease the amount of work done per iteration and decrease the number of loop iterations.

Which is better foreach or for loop C#?

The forloop is faster than the foreach loop if the array must only be accessed once per iteration.

Which is faster foreach or for?

The foreach loop is considered to be much better in performance to that of the generic for loop.

Why does the order of the loops affect performance when iterating over a 2d array?

This is due to cache misses. C multidimensional arrays are stored with the last dimension as the fastest. So the first version will miss the cache on every iteration, whereas the second version won't. So the second version should be substantially faster.


1 Answers

Besides finding matching records, it seems that you're also counting the number of times the strings matched. The result of this count is also used in the WHERE clause.

You want the number of matches, but the number of matches does not matter in the WHERE clause - you could try change the WHERE clause to:

WHERE MainRecord.Value LIKE '%FirstValueToSearch%' AND DB2Record.Value LIKE '%SecondValueToSearch%'

It might not result in any difference though - especially if there's no index on the Value columns - but worth a shot. Indexes on text columns require alot of space, so I wouldn't blindly recommend that.

If you haven't done so yet, place an index on the DB2's RecordID column.

You can use EXPLAIN QUERY PLAN SELECT ... to make SQLite spit out what it does to try to make your query perform, the output of that might help diagnose the problem.

like image 67
C.Evenhuis Avatar answered Nov 02 '22 23:11

C.Evenhuis