Modularizing SQL even if only syntactic sugar

Is there a way to modularize SQL code so that it is more readable and testable?

My SQL code often becomes a long, complicated series of nested joins, inner joins, etc. that is hard to write and hard to debug. By contrast, in a procedural language like JavaScript or Java, one would pinch off discrete elements as separate functions and call them by name.

Yes, one could write each piece as an entirely separate query stored in the database, or as a stored procedure, but often I don't want to change or clutter the database; just querying it is fine, especially if the DBA doesn't wish to grant write permissions to all users.

For instance, conceptually a complex query might be easily described in pseudocode like this:

(getCustomerProfile) 
left join 
(getSummarizedCustomerTransactionHistory) 
using (customerId) 
left join
(getGeographicalSummaries) 
using (region, vendor)
...

I realize that a lot has been written on the topic from a theoretical vantage (a few links below), but I'm just looking for a way to make the code easier to write correctly and easier to read once written. Perhaps just syntactic sugar that abstracts the complexity from sight, if not from execution, and compiles down to the literal SQL I'm trying not to look at. By analogy...

  • Stylus: CSS ::
  • CoffeeScript : JavaScript ::
  • SAS Macro language: SAS language ::
  • ? : SQL

And if the specific SQL flavor matters, most of my work is in PostgreSQL.

http://lambda-the-ultimate.org/node/2440

Code reuse and modularity in SQL

Are Databases and Functional Programming at odds?

asked Feb 04 '13 18:02 by prototype


1 Answer

In most databases, you can do what you want using CTEs (Common Table Expressions):

with CustomerProfile as (
      getCustomerProfile
     ),
     SummarizedCustomerTransactionHistory as (
      getSummarizedCustomerTransactionHistory
     ),
     GeographicalSummaries as (
      getGeographicalSummaries
     )
select <whatever>

This works for a single query. It has the advantage that you can define a CTE once, but use it multiple times. Also, I often define a CTE called const that has constant values.
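As a rough illustration of the CTE approach, here is a minimal sketch that fills in the placeholders with made-up queries. Everything concrete in it is an assumption for demonstration only: the `customers` and `transactions` tables, their columns, the sample rows, and the `minTotal` threshold are hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for PostgreSQL so that the example runs without a server. The `with` query itself uses only syntax that PostgreSQL also accepts.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customerId integer primary key, name text, region text);
    create table transactions (customerId integer, amount real);
    insert into customers values (1, 'Ann', 'east'), (2, 'Bob', 'west'), (3, 'Cy', 'east');
    insert into transactions values (1, 10.0), (1, 30.0), (2, 5.0);
""")

query = """
with const as (
       -- constants live in one place instead of being scattered as literals
       select 20.0 as minTotal
     ),
     CustomerProfile as (
       select customerId, name, region from customers
     ),
     SummarizedCustomerTransactionHistory as (
       select customerId, sum(amount) as total
       from transactions
       group by customerId
     )
select p.name,
       coalesce(h.total, 0) as total,
       coalesce(h.total, 0) >= c.minTotal as isBigSpender,
       -- the same CTE is referenced a second time here
       (select max(total) from SummarizedCustomerTransactionHistory) as bestTotal
from CustomerProfile p
     left join SummarizedCustomerTransactionHistory h using (customerId)
     cross join const c
order by p.customerId
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each CTE reads like a named function body, and the final `select` reads much like the pseudocode in the question. Nothing is written to the schema by the `with` query, so it needs only read access.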

The next step is to take these constructs and create views from them. This is especially useful when sharing code among multiple modules, to ensure consistent definitions. In some databases, you can put indexes on the views to "instantiate" (materialize) them, further optimizing processing.
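A sketch of the view step, under similar assumptions: the `customers` and `transactions` tables and their contents are hypothetical, and SQLite via Python's `sqlite3` module stands in for PostgreSQL. The views are created as `temp` views, which is my own choice here rather than part of the advice above; it keeps them private to the session, and PostgreSQL has an equivalent `create temporary view`.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customerId integer primary key, name text, region text);
    create table transactions (customerId integer, amount real);
    insert into customers values (1, 'Ann', 'east'), (2, 'Bob', 'west');
    insert into transactions values (1, 10.0), (1, 30.0), (2, 5.0);

    -- Each building block becomes a named view that later queries can reuse.
    create temp view CustomerProfile as
        select customerId, name, region from customers;

    create temp view SummarizedCustomerTransactionHistory as
        select customerId, sum(amount) as total
        from transactions
        group by customerId;
""")

# Two independent queries share the same definitions.
by_customer = conn.execute("""
    select name, total
    from CustomerProfile
         left join SummarizedCustomerTransactionHistory using (customerId)
    order by name
""").fetchall()

by_region = conn.execute("""
    select region, sum(total) as regionTotal
    from CustomerProfile
         left join SummarizedCustomerTransactionHistory using (customerId)
    group by region
    order by region
""").fetchall()

print(by_customer)
print(by_region)
```

Unlike a CTE, a view outlives the statement that defines it, so the definition of "summarized transaction history" is written once and every query that joins against it stays consistent.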

Finally, I recommend wrapping inserts/updates/deletes in stored procedures. This allows you to have a consistent framework.

Two more comments, though. First, SQL is often used for transactional or reporting systems. Often, once you get the data in the right format for the purpose, the data speaks for itself. Your example might just be asking for a data mart that has three tables devoted to those three subject areas, which get populated once per week or once per day.

Second, SQL is not an ideal language for abstraction. With good practice, naming conventions, and indentation style, you can make it useful. I sorely miss certain things from "real" languages, such as macros, error handling (why data errors are so hard to identify and handle is beyond me), consistent methods for common functionality (can someone say group string concatenation?), and some other features. That said, because it is data-centric and readily parallelizable, it is more useful for me than most other languages.

answered Sep 25 '22 17:09 by Gordon Linoff