 

Managing very large SQL queries

I'm looking for some ideas for managing very large SQL queries in Oracle.

My employer is looking to build very wide reports (150 - 200 columns of data per report). Each item is a sub-query or an element from a view. The data has to be real time, so DW-style batch processing is not an option. We also don't use any BI tools, just a Java app that generates Excel (it's a requirement to output the data in Excel).

The query also contains unions that bring in feeds from other systems. The queries result in very large SQL (about 1,500 lines) that is very difficult to manage.

What strategies can I employ to make the work more manageable?

It is also not a performance problem: I was able to optimize the query to be very efficient. It's mostly the width of the query; managing 200 columns is a challenge in itself.

asked Aug 07 '14 by JavaHead



2 Answers

I deal with queries of this length daily, and here is some of what helps me maintain them:

First, alias every single one of those columns. When you are building the query you may know where each one came from, but when it is time to make a change, it is really helpful to know exactly where each column came from. This applies to join conditions, group by and where conditions as well as the select columns.
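For illustration, a minimal sketch of that habit (all table and column names here are made up), with every reference qualified by an alias:

    SELECT ord.order_id       AS order_id,
           ord.order_date     AS order_date,
           cust.customer_name AS customer_name,
           MAX(pay.amount)    AS largest_payment   -- aggregated over the child payment rows
    FROM   orders ord
           JOIN customers cust
             ON cust.customer_id = ord.customer_id
           LEFT JOIN payments pay
             ON pay.order_id = ord.order_id
    WHERE  ord.order_date >= DATE '2014-01-01'
    GROUP BY ord.order_id, ord.order_date, cust.customer_name;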

Organize in easily understandable and testable chunks. I use temp tables to pull together things that belong together, and so I can see the results before the final query while in test mode.

This brings me to test mode. If I have chunks of data, I design the proc with a test mode and then query individual temp tables when in test mode, so I can see where the data went wrong if there is a bug. Not sure how Oracle works but in SQL Server, I make this the last parameter and give it a default value, so that it doesn't need to be passed in by the application.
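A rough sketch of that pattern in SQL Server syntax (the procedure, table, and parameter names are invented for the example):

    CREATE PROCEDURE dbo.BigReport
        @StartDate date,
        @EndDate   date,
        @TestMode  bit = 0     -- last parameter with a default, so the app never passes it
    AS
    BEGIN
        -- Chunk 1: one logical piece of the report, testable on its own
        SELECT ord.order_id, ord.customer_id, ord.order_date
        INTO   #orders
        FROM   dbo.orders ord
        WHERE  ord.order_date BETWEEN @StartDate AND @EndDate;

        -- In test mode, expose the intermediate results so you can see where the data went wrong
        IF @TestMode = 1
            SELECT * FROM #orders;

        -- Final query assembled from the tested chunks
        SELECT o.order_id, o.order_date   -- ...plus the remaining report columns
        FROM   #orders o;
    END;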

Consider logging the execution details and the values of passed-in parameters, and certainly log any error messages. This will help tremendously when you have to troubleshoot why this report that has functioned perfectly for six years doesn't work for this one user.
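One possible shape for that logging, again in SQL Server syntax and assuming a hypothetical dbo.report_log table:

    BEGIN TRY
        INSERT INTO dbo.report_log (proc_name, run_at, parameters)
        VALUES ('BigReport', SYSDATETIME(),
                CONCAT('@StartDate=', @StartDate, ' @EndDate=', @EndDate));
        -- ...the report query runs here...
    END TRY
    BEGIN CATCH
        INSERT INTO dbo.report_log (proc_name, run_at, error_message)
        VALUES ('BigReport', SYSDATETIME(), ERROR_MESSAGE());
        THROW;   -- re-raise so the caller still sees the failure
    END CATCH;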

Put each column on a separate line, and do the same for where clauses. At times you may have to troubleshoot by commenting out joins until you find the one that is causing the problem. It is easier if you can easily comment out the associated fields as well.
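Leading commas make that style of commenting painless; a tiny made-up example:

    SELECT ord.order_id
         , ord.order_date
      -- , shp.ship_date                 -- commented out along with its join below
    FROM   orders ord
      -- JOIN shipments shp ON shp.order_id = ord.order_id
    WHERE  ord.order_date >= DATE '2014-01-01';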

If you don't have a technical design document, then at least use comments to explain your thought process. You want the comments to capture the whys, not the hows. This stuff is hard to come back to later and understand, even when you wrote it. Give your future self some help.

In developing from scratch, I put the select list in and then comment out all but the first item. Then I build the query only until I get that value, testing until I am sure what I got is correct. Then I add the next one and whatever joins or where conditions I might need to get it. Test again, making sure it is right. (Oops, why did that go from 1,000 records to 20,000 when I added that? Hmm, maybe there is something I need to handle there, or is that right?) By adding only one thing at a time, you will find an error in the logic much faster and be much more confident of your results. It will also take you less time than trying to build a massive query in one go.

Finally, there is no substitute for understanding your data. There are plenty of complex queries that run but do not give the correct answer. Know whether you need an inner join or a left join. Know what where conditions you need to get the records you want. Know how to handle the records when you have a one-to-many relationship (this may require pushing back on the requirements): should you have three rows (one for each child record), should you put that data in a comma-delimited list, or should you pick only one of the many records and produce one row using aggregation? If the latter, what is the criterion for choosing the record you want to keep?
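In Oracle (the asker's platform), the comma-delimited option and the pick-one-record option might look like this, with hypothetical names:

    -- Option 1: collapse the child rows into one delimited value (LISTAGG, Oracle 11g+)
    SELECT ord.order_id,
           LISTAGG(itm.product_code, ', ')
               WITHIN GROUP (ORDER BY itm.line_no) AS product_list
    FROM   orders ord
           JOIN order_items itm ON itm.order_id = ord.order_id
    GROUP BY ord.order_id;

    -- Option 2: keep exactly one child row per parent, chosen by an explicit rule
    SELECT order_id, product_code
    FROM  (SELECT ord.order_id,
                  itm.product_code,
                  ROW_NUMBER() OVER (PARTITION BY ord.order_id
                                     ORDER BY itm.line_no) AS rn
           FROM   orders ord
                  JOIN order_items itm ON itm.order_id = ord.order_id)
    WHERE  rn = 1;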

answered Oct 04 '22 by HLGEM


Without seeing the specifics of your problem, here are a couple of ideas that immediately come to mind:

  • If you are looking purely for manageability, I might suggest organizing your subqueries as a number of views and then referencing those views in your final query (a sketch follows this list).

  • For performance, on the other hand, you may want to consider creating temp tables or even materialized views (views whose result sets are physically stored and refreshed) to break up the heavier parts of your process.

  • If your queries require an enormous amount of subquerying in order to gain usable data, you might need to rethink your database design and possibly create a number of datamarts to easily access reporting data. Think of these as mini-warehouses sans the multi-year trended data.

  • Finally, I know you said you don't use any BI tools, but this problem certainly seems like one that might be better served by organizing your data into "cubes" or Business Objects "universes". It might be worthwhile to at least weigh the cost of bringing on a BI tool against the programming hours to support the current setup.
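To make the first two bullets concrete, here is a rough Oracle sketch with invented names: each logical chunk becomes a named (or materialized) view, and the final report reads as a join of those pieces.

    -- A plain view per logical chunk of the report
    CREATE OR REPLACE VIEW v_customer_totals AS
    SELECT customer_id, SUM(amount) AS total_paid
    FROM   payments
    GROUP BY customer_id;

    -- A materialized view for a heavy piece; its results are stored and refreshed on demand
    CREATE MATERIALIZED VIEW mv_feed_summary
    REFRESH COMPLETE ON DEMAND AS
    SELECT feed_id, COUNT(*) AS feed_rows
    FROM   external_feed
    GROUP BY feed_id;

    -- The final report then becomes a readable join of named pieces
    SELECT cust.customer_id,
           tot.total_paid,
           fs.feed_rows
    FROM   customers cust
           JOIN v_customer_totals tot ON tot.customer_id = cust.customer_id
           LEFT JOIN mv_feed_summary fs ON fs.feed_id = cust.feed_id;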

answered Oct 04 '22 by DanK