When developing stored procedure, the code is typically far from clean. It is often impractical to break down the stored procedure code into single responsibility (CTE's, long select lists, breaking down adds even more functions/procedures and complexity). PL/SQL provide packages, but there is not an equivalent in SQL Server and I minimal experience with those to weigh in on how well they help with the Clean Code problem.
One of my projects relies heavily on stored procedure, with a great many of these having more than 500 lines of code.
After looking at a presentation on Clean Code, I was wondering if there is any advice on applying those concepts to the database.
Succinctly, what are some means (tools, methods) used to make database code more readable and maintainable given its tendency to demand large masses of code and when the project is vested in a database heavy model?
Personally I would not worry about Clean Code in database terms. What you need to worry about is data integrity and performance and security.
I believe as you get more familiar with the database model, it will become easier to understand those long procs. Some things I do though to help troubleshoot later are: If the proc has dynamic sql always write it with the debug variable mode that you can use to grab the SQL code that is built. This will save hours of troubleshooting time.
If the proc does a lot of work make sure to put it in transactions with Try Catch blocks so that everything gets rolled back if there is a problem. You can also take advantage of the fact that rollbacks don't affect table variables and store debugging and error info you will need in an exception table after the rollback. That will make it much easier on prod to see what happens when it fails.
Try to be consistent in your spacing and structure. There are various tools you can use to format. And consistent aliasing helps too. Alias every field referenced because 6 months later you may not remember which one it came from easily. Personally I find it much easier to understand long procs if all the inner joins are before the left joins. This lets me easily see what tables must have data and if the results are wrong it is often one of these that is to blame because it should be a left join.
Be consistent in your parameter data types. Make them match the database datatype of the field you will reference with that parameter.
Store everything in source control and ruthlessly delete all commented out code. If a parameter is not being used, get rid of it. If something is hardcoded, you may want to convert it to a parameter particularly if it is being used more than once.
If you have some business logic which might not be easy to figure out later, put in a comment explaining it.
Try to build new procs in sections, so that you can easily test the results as you go. So for instance suppose you are doing a financial report of all returned orders. You might first write a cte or temp table or table variable to get all the returns. Then you might join to that to get the financial records for just those returns and aggregate them. Then finally you would join all that to other details that you need in the report. So you can check by running just the first part and seeing that there are 2098 returns in the period. After you have aggregated the financial information, you should have one record for each of those 2098 returns. Then at then end you can see of you still have all the needed records. This helps to make sure you didn't lose any along the way by joining unless the data is such that you should have and you understand why you did. You can also see if you are getting extra records from joins to one to many tables which may or may not be appropriate (which is why understanding the data model is crucial).
If you are going to write complex sql, you need to read some books on performance tuning (learning which constructs like correlated subqueries and cursors and what to use instead) to avoid using and how to make sure your where clauses are sargable. There are several books available for SQL Server performance tuning. You also need to read the book on SQL Antipatterns which will be of much more help than the clean code books: http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?ie=UTF8&qid=1406570810&sr=8-1&keywords=sql+antipatterns
In my opinion you will have to find a balance between clean/readable and performance code. A lot of other things also matters. Writing code in individually is not the same as working in teams therefor find a naming and code format rules that applies to all of the developers.
Taking these things into actions below helped a lot.
Use clear variable names - so "too" much documentation is avoided. Fx
DECLARE @N VARCHAR(10) = 'John' This variable does not directly tell you what it is used for.
However this below is much more self explained.
DECLARE @Name VARCHAR(10) = 'JOHN'
--One word:
DECLARE @Machine VARCHAR(10)
--two Words with uppercase(O and S).
DECLARE @OperatingSystem VARCHAR(10)
Divide your code into functions(mostly table-inline functions because of performance) to reduce code.
There is a T-SQL format tool (Poor Man's SQL) that is a life saver. It is a tool that helps clean your code by alligning it etc.
Performance is always the first priority after security when you are working with large datasets.
I hope this helps you and others.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With