Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the pros and cons of performing calculations in sql vs. in your application

People also ask

Is SQL good for calculations?

The power and expressivity of SQL is heavily underestimated. Since the introduction of window functions, a lot of non-strictly set-oriented calculations can be performed very easily and elegantly in the database.

Can a database perform calculations?

Field calculations can be performed in a database by adding a formula field. Field calculations allow you to perform addition, subtraction, multiplication, and division operations on any numeric fields in your database.

How is working with data in spreadsheets and with SQL different How are they similar?

As an example, Excel cells let you store strings of information like “January 3, 2020,” whereas relational databases using SQL would divide the date into three separate cells. Excel is more visual than SQL—instead of writing commands, users get to directly work with and manipulate their data in tables.


It depends on a lot of factors - but most crucially:

  • complexity of calculations (prefer doing complex crunching on an app-server, since that scales out; rather than a db server, which scales up)
  • volume of data (if you need to access/aggregate a lot of data, doing it at the db server will save bandwidth, and disk io if the aggregates can be done inside indexes)
  • convenience (sql is not the best language for complex work - especially not great for procedural work, but very good for set-based work; lousy error-handling, though)

As always, if you do bring the data back to the app-server, minimising the columns and rows will be to your advantage. Making sure the query is tuned and appropriately indexed will help either scenario.

Re your note:

and then loop through the records

Looping through records is almost always the wrong thing to do in sql - writing a set-based operation is preferred.

As a general rule, I prefer to keep the database's job to a minimum "store this data, fetch this data" - however, there are always examples of scenarios where an elegant query at the server can save a lot of bandwidth.

Also consider: if this is computationally expensive, can it be cached somewhere?

If you want an accurate "which is better"; code it both ways and compare it (noting that a first draft of either is likely not 100% tuned). But factor in typical usage to that: if, in reality, it is being called 5 times (separately) at once, then simulate that: don't compare just a single "1 of these vs 1 of those".


Let me use a metaphor: if you want to buy a golden necklace in Paris, the goldsmith could sit in Cape Town or Paris, that is a matter of skill and taste. But you would never ship tons of gold ore from South Africa to France for that. The ore is processed at the mining site (or at least in the general area), only the gold gets shipped. The same should be true for apps and databases.

As far as PostgreSQL is concerned, you can do almost anything on the server, quite efficiently. The RDBMS excels at complex queries. For procedural needs you can choose from a variety of server-side script languages: tcl, python, perl and many more. Mostly I use PL/pgSQL, though.

Worst case scenario would be to repeatedly go to the server for every single row of a larger set. (That would be like shipping one ton of ore a time.)

Second in line, if you send a cascade of queries, each depending on the one before, while all of it could be done in one query or procedure on the server. (That's like shipping the gold, and each of the jewels with a separate ship, sequentially.)

Going back and forth between app and server is expensive. For server and client. Try to cut down on that, and you will win - ergo: use server side procedures and / or sophisticated SQL where necessary.

We just finished a project where we packed almost all complex queries into Postgres functions. The app hands over parameters and gets the datasets it needs. Fast, clean, simple (for the app developer), I/O reduced to a minimum ... a shiny necklace with a low carbon footprint.


In this case you are probably slightly better off doing the calculation in SQL as the database engine is likely to have a more efficient decimal arithmetic routines than Java.

Generally though for row level calculations there is not much difference.

Where it does make a difference is:

  • Aggregate calculations like SUM(), AVG(),MIN(), MAX() here the database engine will be an order of magnitude faster than a Java implementation.
  • Anywhere the calculation is used to filter rows. Filtering at the DB is much more efficient than reading a row and then discarding it.

There's no black / white with respect to what parts of data access logic should be performed in SQL and what parts should be performed in your application. I like Mark Gravell's wording, distinguishing between

  • complex calculations
  • data-intensive calculations

The power and expressivity of SQL is heavily underestimated. Since the introduction of window functions, a lot of non-strictly set-oriented calculations can be performed very easily and elegantly in the database.

Three rules of thumb should always be followed, regardless of the overall application architecture:

  • keep the amount of data transferred between database and application slim (in favour of calculating stuff in the DB)
  • keep the amount of data loaded from the disk by the database slim (in favour of letting the database optimise statements to avoid unnecessary data access)
  • don't push the database to its CPU limits with complex, concurrent calculations (in favour of pulling data into application memory and performing calculations there)

In my experience, with a decent DBA and some decent knowledge about your decent database, you won't run into your DBs CPU limits very soon.

Some further reading where these things are explained:

  • 10 Common Mistakes Java Developers Make When Writing SQL
  • 10 More Common Mistakes Java Developers Make When Writing SQL

In general do things in SQL if there are chances that also other modules or component in same or other projects will need to get those results. an atomic operation done server side is also better because you just need to invoke the stored proc from any db management tool to get final values without further processing.

In some cases this does not apply but when it does it makes sense. also in general the db box has the best hardware and performances.


Whether to perform calculations at the front end or at the backend is very much decided if we can determine our goal in the business implementation. At time java code might perform better than a sql code both well written or it might be vice-versa. But still if confused you can try to determine first -

  1. If you can achieve something straightforward via database sql then better go for it as db will perform much better and do computations there and then with the result fetch. However if the actual computation requires too much calculation from here and there stuff then you can go with the application code. Why? Because scenario's like looping in most cases are not best handled by sql wherease front end languages are better designed for these things.
  2. In case similar calculation is required from many places then obviously placing the calculation code at the db end will be better to keep things at the same place.
  3. If there are lots of calculations to be done to attain the final result via many different queries then also go for db end as you can place the same code in a stored procedure to perform better than retrieving results from backend and then computing them at the front end.

There are many other aspects which you can think before you decide where to place the code. One perception is totally wrong - Everything can be done best in Java (app code) and/or everything is best to be done by the db (sql code).


If you are writing on top of ORM or writing casual low-performance applications, use whatever pattern simplifies the application. If you are writing a high performance application and thinking carefully about scale, you will win by moving processing to data. I strongly advocate moving the processing to the data.

Let's think about this in two steps: (1) OLTP (small number of record) transactions. (2) OLAP (long scans of many records).

In the OLTP case, if you want to be fast (10k - 100k transactions per second), you must remove latch, lock and dead lock contention from the database. This means that you need to eliminate long stalls in transactions: round trips from client to DB to move processing to the client are one such long stall. You can't have long lived transactions (to make read/update atomic) and have very high throughput.

Re: horizontal scaling. Modern databases scale horizontally. Those systems implement HA and fault tolerance already. Leverage that and try to simplify your application space.

Let's look at OLAP -- in this case it should be obvious that dragging possibly terrabytes of data back to the application is a horrible idea. These systems are built specifically to operate extremely efficiently against compressed, pre-organized columnar data. Modern OLAP systems also scale horizontally and have sophisticated query planners that disperse work horizontally (internally moving processing to data).