Can someone give me a relative idea of when it makes more sense to hit the database many times for small query results vs caching a large number of rows and querying that?
For example, if I have a query returning 2,000 results. And then I have additional queries on those results that take maybe 10-20 items, would it be better to cache the 2000 results or hit the database every time for each set of 10 or 20 results?
Other answers here are correct -- the RDBMS and your data are key factors. However, another key factor is how much time it will take to sort and/or index your data in memory versus in the database. We have one application where, for performance, we added code to grab about 10,000 records into an in-memory DataSet
and then do subqueries on that. As it turns out, keeping that data up to date and selecting out subsets is actually slower than just leaving all the data in the database.
So my advice is: do it the simplest possible way first, then profile it and see if you need to optimize for performance.
It depends on a variety of things. I will list some points that come to mind:
If you have a .Net web app that is caching data in the client, you do not want to pull 2k rows.
If you have a web service, they are almost always better Chunky than Chatty because of the added overhead of XML on the transport.
In a fairly decently normalized and optimized database, there really should be very few times that you have to pull 2k rows out at a time unless you are doing reports.
If the underlying data is changing at a rapid pace, then you should really be careful caching it on the middle tier or the presentation layer because what you present will you will be out of date.
Reports (any DSS) will pull and chomp through much larger data sets, but since they are not interactive, we denormalize and let them have their fun.
In cases of cascading dropdowns and such, AJAX techniques will prove to be more efficient and effective.
I guess I'm not really giving you one answer to your question. "It depends" is the best I can do.
Unless there is a big performance problem (e.g. a highly latent db connection), I'd stick with leaving the data in the database and letting the db take care of things for you. A lot of things are done efficiently on the database level, for example
There are a lot of potential issues you may run into if you do your own caching. You need to have a very good performance reason befor starting to take care of all that complexity.
So, the short answer: It depends, but unless you have some good reasons, this smells like premature optimizaton to me.
in general, network round trip latency is several orders of magnitude greater than the capacity of a database to generate and feed data onto the network, and the capacity of a client box to consume it from a network connection.
But look at the width of your network bus ( Bits/sec ) and compare that to the average round trip time for a database call...
On 100baseT ethernet, for example you are about 12 MBytes / sec data transfer rate. If your average round trip time is say, 200 ms, then your network bus can deliver 3 MBytes in each 200 ms round trip call..
If you're on gigabit ethernet, that number jumps to 30 Mbytes per round trip...
So if you split up a request for data into two round trips, well that's 400 ms, and each query would have to be over 3Mb (or 30Mb for gigibit ) before that would be faster...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With