
SQL Query minimizing/caching in a C++ application

I'm writing a project in C++/Qt that can connect to any type of SQL database supported by the QtSQL module (http://doc.qt.nokia.com/latest/qtsql.html). This includes local servers and external ones.

However, when the database in question is external, the speed of the queries starts to become a problem (slow UI, ...). The reason: every object that is stored in the database is lazy-loaded and as such issues a query every time an attribute is needed. On average about 20 of these objects are displayed on screen, each of them showing about 5 attributes. This means that for every screen I show, about 100 queries get executed. The queries execute quite fast on the database server itself, but the overhead of running each query over the network is considerable (measured in seconds for an entire screen).

I've been thinking about a few ways to solve the issue. The most important approaches seem to me to be:

  1. Make fewer queries
  2. Make queries faster

Tackling (1)

  • I could find some way to delay the actual fetching of the attributes (start a transaction), and then when the programmer writes endTransaction() the database layer tries to fetch everything in one go (with SQL UNION or a loop...). This would probably require quite a bit of modification to the way the lazy objects work, but if people comment that it is a decent solution I think it could be worked out elegantly. If this solution speeds up everything enough, an elaborate caching scheme might not even be necessary, saving a lot of headaches.
  • I could try pre-loading attribute data by fetching it all in one query for all the objects that are requested, effectively making them non-lazy. Of course in that case I will have to worry about stale data. How would I detect stale data without sending at least one query to the external db? (Note: sending a query to check for stale data on every attribute access would at best give no performance increase at all, and at worst a 2x performance decrease when the data is actually found to be stale.)
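A minimal sketch of how the two ideas above could meet: collect the attribute requests first, then build one SELECT that covers every requested object and column. It uses only the standard library so it stands alone; the class name `AttributeBatch`, the table name `objects` and the primary key column `id` are all hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: collects lazy attribute requests and turns them into
// one SELECT, so a screen costs one round-trip instead of ~100.
class AttributeBatch {
public:
    // Record that `column` of object `id` will be needed.
    void request(int id, const std::string& column) {
        ids_.insert(id);
        columns_.insert(column);
    }

    bool empty() const { return ids_.empty(); }

    // Ids in ascending order; bind them to the placeholders in this order.
    std::vector<int> ids() const {
        return std::vector<int>(ids_.begin(), ids_.end());
    }

    // Builds e.g. "SELECT id, comment, status FROM objects WHERE id IN (?, ?)".
    // `table` and the column names are identifiers and cannot be bound as
    // parameters, so they must come from trusted code, never from user input.
    // Assumes the primary key column is called `id`.
    std::string toSql(const std::string& table) const {
        if (ids_.empty()) return std::string();
        std::string sql = "SELECT id";
        for (const std::string& c : columns_) sql += ", " + c;
        sql += " FROM " + table + " WHERE id IN (";
        for (std::size_t i = 0; i < ids_.size(); ++i) sql += (i ? ", ?" : "?");
        sql += ")";
        return sql;
    }

private:
    std::set<int> ids_;
    std::set<std::string> columns_;
};
```

In a Qt program the resulting string could presumably be passed to `QSqlQuery::prepare()` and the ids bound one by one with `addBindValue()`; each returned row then fills the attributes of one object, so the lazy getters no longer need to touch the network.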

Tackling (2)

Queries could for example be made faster by keeping a local synchronized copy of the database running. However, I can't count on the client machines being able to run exactly the same database type as the one on the server, so the local copy would be, for example, an SQLite database. This would also mean that I couldn't use a db-vendor-specific solution. What are my options here? What has worked well for people in these kinds of situations?

Worries

My primary worries are:

  • Stale data: plenty of queries are imaginable that change the db in such a way that an action which still seems possible to a user with stale data is in fact no longer allowed.
  • Maintainability: How loosely can I couple in this new layer? It would obviously be preferable if it didn't have to know everything about my internal lazy object system and about every object and possible query.
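One possible way to keep the stale-data check down to a single query per screen, sketched under a loud assumption: every row carries a `version` column (a last-modified timestamp would do as well) that changes on each update. Neither that column nor the function name `staleIds` exists in the original code. The client would fetch `id, version` for all displayed objects in one query and compare the result with what it has cached.

```cpp
#include <map>
#include <vector>

// Compares locally cached version stamps with the ones the server returned
// from a single query such as:
//   SELECT id, version FROM objects WHERE id IN (...)
// The `version` column is an assumption, not something the question describes.
// Returns the ids whose cached attributes should be reloaded.
std::vector<int> staleIds(const std::map<int, long>& cachedVersions,
                          const std::map<int, long>& serverVersions) {
    std::vector<int> stale;
    for (const auto& entry : serverVersions) {
        auto it = cachedVersions.find(entry.first);
        // Unknown locally, or the server holds a different version: reload.
        if (it == cachedVersions.end() || it->second != entry.second)
            stale.push_back(entry.first);
    }
    return stale;
}
```

This costs one extra round-trip per screen rather than one per attribute, and only the objects reported as stale need a second, batched reload. It narrows the stale window but does not close it: the data can still change between the check and the user's action.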

Final question

What would be a good way to minimize the cost of making a query? Good meaning some combination of: maintainable, easy to implement, not too application-specific. If it comes down to "pick any 2", then so be it. I'd like to hear people talk about their experiences and what they did to solve it.

As you can see, I've thought of some problems and ways of handling them, but I'm at a loss for what would constitute a sensible approach. Since it will probably involve quite a lot of work and intensive changes to many layers in the program (hopefully as few as possible), I thought about asking all the experts here before making a final decision on the matter. It is also possible I'm just overlooking a very simple solution, in which case a pointer to it would be much appreciated!

Assume that all relevant server-side tuning has already been done (for example: MySQL cache, best possible indexes, ...).

Note: I've checked questions from users with similar problems, but they didn't entirely answer my question (for example "Suggestion on a replication scheme for my use-case?" and "Best practice for a local database cache?").

If any additional information is necessary to provide an answer, please let me know and I will duly update my question. Apologies for any spelling/grammar errors, English is not my native language.

Note about "lazy"

A small example of what my code looks like (simplified of course):

QList<MyObject> myObjects = database->getObjects(20, 40); // fetch and construct objects 20 to 40 from the db

// ...some time later

// screen filling time!
foreach (const MyObject& o, myObjects) {
    o.getInt("status", 0);  // == db request (o is a reference, so use '.')
    o.getString("comment", "no comment!"); // == db request
    // about 3 more of these
}
Aktau asked Nov 05 '22 17:11


1 Answer

At first glance it looks like you have two conflicting goals: query speed, and always using up-to-date data. You should therefore fall back on your actual requirements to decide between them.

1) Your database is nearly static compared to the use of the application. In this case use the second option under your approach (1) and preload all the data. If there's a slim chance that the data may change underneath, just give the user an option to refresh the cache (fully or for a particular subset of data). This way the slow access is in the hands of the user.
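A rough sketch of such a preloaded cache with both a full and a partial refresh, using only the standard library. The class `AttributeCache` and its method names are made up for illustration, and values are kept as strings for brevity; a Qt version would more likely store `QVariant`.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical client-side cache for preloaded attributes, keyed by
// (object id, attribute name).
class AttributeCache {
public:
    void store(int id, const std::string& attribute, const std::string& value) {
        values_[std::make_pair(id, attribute)] = value;
    }

    // Returns true and fills `out` on a hit. On a miss the caller has to
    // query the database and would normally store the result afterwards.
    bool lookup(int id, const std::string& attribute, std::string& out) const {
        auto it = values_.find(std::make_pair(id, attribute));
        if (it == values_.end()) return false;
        out = it->second;
        return true;
    }

    // Partial refresh: forget everything cached for one object.
    void invalidate(int id) {
        // The empty string sorts before every attribute name, so this finds
        // the first entry belonging to `id`, if there is one.
        auto first = values_.lower_bound(std::make_pair(id, std::string()));
        auto last = first;
        while (last != values_.end() && last->first.first == id) ++last;
        values_.erase(first, last);
    }

    // Full refresh: forget everything.
    void clear() { values_.clear(); }

    std::size_t size() const { return values_.size(); }

private:
    std::map<std::pair<int, std::string>, std::string> values_;
};
```

A "refresh" button would call `clear()` (or `invalidate(id)` for a single object) and then preload again, so the slow network access happens only when the user asks for it.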

2) The database is changing fairly frequently. In this case an SQL database perhaps isn't right for your needs. You may need a higher-performance dynamic database that pushes updates rather than requiring a pull. That way your application would get notified when the underlying data changes and you would be able to respond quickly. If that doesn't work, however, you want to construct your queries to minimize the number of DB library and I/O calls. For example, if you execute a sequence of select statements in one call, your results should contain all the appropriate data in the order you requested it. You just have to keep track of which select statement each result corresponds to. Alternatively, if you can use looser query criteria so that a single simple query returns more than one row, that ought to help performance as well.

Mark B answered Nov 12 '22 19:11