Suppose I have this kind of query
String sql = "SELECT s.team_id, s.team_name, s.gp, s.w, s.t, s.l, s.go, s.ga, s.score, s.p FROM "
+ "(SELECT team_id, team_name, SUM (gp) gp, SUM (w) w, SUM (t) t, SUM (l) l, SUM (GO) go, SUM (GA) ga, SUM (GO)- SUM (GA) score, SUM (2*w+t) p FROM "
+ "(SELECT t._id team_id, t.name team_name, COUNT(CASE WHEN score_home IS NOT NULL THEN 1 END) gp, COUNT (CASE WHEN score_home > score_away THEN 1 END) w,"
+ " COUNT (CASE WHEN score_home = score_away THEN 1 END) t, COUNT (CASE WHEN score_home < score_away THEN 1 END) l,"
+ " SUM (score_home) go, SUM (score_away) ga"
+ " FROM team_table t LEFT OUTER JOIN match_table m ON m.team_home = t._id"
+ " WHERE t.tournament_id = ? GROUP BY t._id, t.name"
+ " UNION ALL"
+ " SELECT t._id team_id, t.name team_name, COUNT(CASE WHEN score_away IS NOT NULL THEN 1 END) gp, COUNT (CASE WHEN score_home < score_away THEN 1 END) w,"
+ " COUNT (CASE WHEN score_home = score_away THEN 1 END) t, COUNT (CASE WHEN score_home > score_away THEN 1 END) l,"
+ " SUM (score_away) go, SUM (score_home) ga"
+ " FROM team_table t LEFT OUTER JOIN match_table m ON m.team_away = t._id"
+ " WHERE t.tournament_id = ? GROUP BY t._id, t.name)"
+ " GROUP BY team_id, team_name) s"
+ " ORDER BY s.p DESC, s.score DESC, s.go ASC";
which is then used like this
Cursor cursor = database.rawQuery(sql, args);
cursor.moveToFirst();
while (!cursor.isAfterLast()) {
TeamStats stat = new TeamStats();
stat.setTeamId(cursor.getLong(0));
stat.setTeamName(cursor.getString(1));
stat.setGamesPlayed(cursor.getInt(2));
stat.setWins(cursor.getInt(3));
stat.setTies(cursor.getInt(4));
stat.setLoses(cursor.getInt(5));
stat.setGoalsOwn(cursor.getInt(6));
stat.setGoalsAgaist(cursor.getInt(7));
stat.setScore(cursor.getInt(8));
stat.setPoints(cursor.getInt(9));
stats.add(stat);
cursor.moveToNext();
}
cursor.close();
So it selects values from many tables, does some operations etc. As you can see the query is horribly complex (very difficult to debug) and the performance does not seem to be as good as I would expect. My questions are:
If I were you, I would copy your sqlite database to host, then try to execute it manually in some SQLite GUI while replacing bound variables (?
) with actual variable values you have. For GUI on Windows, I really like SQLite Expert Personal, and on Linux sqliteman
is pretty good.
While debugging your SQL (in command line or GUI), be sure to analyze your SQL statements by running them under EXPLAIN
and/or EXPLAIN QUERY PLAN
. Watch out for table scans. You should try to eliminate expensive scans by adding indexes. But don't index everything - it may make things worse.
Often, you can have big performance gains by using compound (multi-column) indexes. Note, that on any given table SQLite cannot make use of more than just one index (while running given SQL statement) - so, choose your indexes wisely. (See also the basic explanation in Query Planning.)
And to address your concerns about data processing in Java vs. SQLite - I think that fully optimized (with proper indexes, etc.) SQLite query against relational data will (almost) always going to be faster than manual processing of this data in Java. This must be especially true in your case - all your data is basically relational.
One small note though: Your Android APK using Java may have access to more memory than SQLite does by default - you may want to increase SQLite cache size for your database using setMaxSqlCacheSize()
(equivalent of PRAGMA cache_size
). Android default is 10 (max 100), try increasing it and see if makes any difference for your query. Note that desktop SQLite default for this setting is much higher - 2000.
First, I don't know much about SQLite but I suppose that it will behave more or less like MS SQL-Server.
Most often, a performance problem for a simple query like this will usually relate to the case of a missing index resulting in a full table scan instead of a partial table scan or a table seek. If you don't have a index on team_table.tournament_id then SQLite will have to scan the whole table to perform the "t.tournament_id = ?" operation. The same thing will happen with match_table.team_home and match_table.team_away: a missing index will result of a full table scan for the join operations on m.team_home and m.team_away to be done.
For the rest, you can simplify your query in two ways. The first will be to drop the outer subquery and use expressions or columns ordering in your Order by; ie., you can replace the "ORDER BY s.p DESC, s.score DESC, s.go ASC" with "ORDER BY SUM (2*w+t) DESC, SUM (GO)- SUM (GA) DESC, SUM (GO) ASC" and get rid of the subquery s.
The second way would be to replace the UNION with a single query by performing the left join operation on both m.team_home and m.team_away at the same time:
... FROM team_table t LEFT OUTER JOIN match_table m ON (m.team_home = t._id or m.team_away = t._id) ...
After that, it's very easy to change your Case statements to properly calculate the various scores on weither t._id is equal to m.team_home or m.team_away. This way, not only you can drop the UNION but you can also drop the second subquery.
Finally, you must take a look at the use of the Left Join; as I'm not sure if it's really required over the use of a regular Inner Join.
After that, you should end up with a simple join Query with a Group By and an Order By and with no subquery or union and possibly with no left join whatsoever. However, at this point, the expressions in the Order By might have become a little complicated so you will have to take the decision of either keeping them this way, putting back a subquery or using columns ordering (my last favorite choice).
Without the Union, the query should perform at least twice as fast but ultimately, to have a good performance, the ultimate requirement will be to have all the proper indexes; otherwise, the performance will never be good if the sql server need to perform multiple full table scannings.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With