 

Managing very large SQL queries

I'm looking for some ideas for managing very large SQL queries in Oracle.

My employer is looking to build very wide reports (150 - 200 columns of data per report). Each item is a sub-query or an element from a view. The data has to be real time, so DW-style batch processing is not an option. We also don't use any BI tools, just a Java app that generates Excel (it's a requirement to output the data in Excel).

The query also contains unions that bring in feeds from other systems. The queries result in very large SQL (about 1,500 lines) that is very difficult to manage.

What strategies can I employ to make the work more manageable?

It is also not a performance problem: I was able to optimize the query to be very efficient. It's mostly the width of the query; managing 200 columns is a challenge in itself.

asked Aug 07 '14 by JavaHead



2 Answers

I deal with queries of this length daily, and here is some of what helps me maintain them:

First, alias every single one of those columns. When you are building the query you may know where each one came from, but when it is time to make a change, it is really helpful to know exactly where each column came from. This applies to join conditions, group by and where conditions as well as the select columns.
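For illustration, a minimal sketch of that habit (all table and column names here are made up), with every reference qualified by an alias:

    SELECT ord.order_id       AS order_id,
           ord.order_date     AS order_date,
           cust.customer_name AS customer_name,
           MAX(pay.amount)    AS largest_payment   -- aggregated over the child payment rows
    FROM   orders ord
           JOIN customers cust
             ON cust.customer_id = ord.customer_id
           LEFT JOIN payments pay
             ON pay.order_id = ord.order_id
    WHERE  ord.order_date >= DATE '2014-01-01'
    GROUP BY ord.order_id, ord.order_date, cust.customer_name;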

Organize in easily understandable and testable chunks. I use temp tables to pull together things that belong together, and so I can see the results before the final query while in test mode.

This brings me to test mode. If I have chunks of data, I design the proc with a test mode and then query individual temp tables when in test mode, so I can see where the data went wrong if there is a bug. Not sure how Oracle works but in SQL Server, I make this the last parameter and give it a default value, so that it doesn't need to be passed in by the application.
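A rough sketch of that pattern in SQL Server syntax (the procedure, table, and parameter names are invented for the example):

    CREATE PROCEDURE dbo.BigReport
        @StartDate date,
        @EndDate   date,
        @TestMode  bit = 0     -- last parameter with a default, so the app never passes it
    AS
    BEGIN
        -- Chunk 1: one logical piece of the report, testable on its own
        SELECT ord.order_id, ord.customer_id, ord.order_date
        INTO   #orders
        FROM   dbo.orders ord
        WHERE  ord.order_date BETWEEN @StartDate AND @EndDate;

        -- In test mode, expose the intermediate results so you can see where the data went wrong
        IF @TestMode = 1
            SELECT * FROM #orders;

        -- Final query assembled from the tested chunks
        SELECT o.order_id, o.order_date   -- ...plus the remaining report columns
        FROM   #orders o;
    END;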

Consider logging the execution details and the values of passed-in parameters, and certainly log any error messages. This will help tremendously when you have to troubleshoot why this report that has functioned perfectly for six years doesn't work for this one user.
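One possible shape for that logging, again in SQL Server syntax and assuming a hypothetical dbo.report_log table:

    BEGIN TRY
        INSERT INTO dbo.report_log (proc_name, run_at, parameters)
        VALUES ('BigReport', SYSDATETIME(),
                CONCAT('@StartDate=', @StartDate, ' @EndDate=', @EndDate));
        -- ...the report query runs here...
    END TRY
    BEGIN CATCH
        INSERT INTO dbo.report_log (proc_name, run_at, error_message)
        VALUES ('BigReport', SYSDATETIME(), ERROR_MESSAGE());
        THROW;   -- re-raise so the caller still sees the failure
    END CATCH;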

Put each column on a separate line, and do the same for where clauses. At times you may have to troubleshoot by commenting out joins until you find the one that is causing the problem. It is easier if you can easily comment out the associated fields as well.
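Leading commas make that style of commenting painless; a tiny made-up example:

    SELECT ord.order_id
         , ord.order_date
      -- , shp.ship_date                 -- commented out along with its join below
    FROM   orders ord
      -- JOIN shipments shp ON shp.order_id = ord.order_id
    WHERE  ord.order_date >= DATE '2014-01-01';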

If you don't have a technical design document, then at least use comments to explain your thought process. You want the comments to capture the whys, not the hows. This stuff is hard to come back to later and understand, even when you wrote it. Give your future self some help.

In developing from scratch, I put the select list in and then comment out all but the first item. Then I build the query only until I get that value, testing until I am sure what I got is correct. Then I add the next one and whatever joins or where conditions I might need to get it. Test again, making sure it is right. (Oops, why did that go from 1,000 records to 20,000 when I added that? Hmm, maybe there is something I need to handle there, or is that right?) By adding only one thing at a time, you will find an error in the logic much faster and be much more confident of your results. It will also take you less time than trying to build a massive query in one go.

Finally, there is no substitute for understanding your data. There are plenty of complex queries that run but do not give the correct answer. Know whether you need an inner join or a left join. Know what where conditions you need to get the records you want. Know how to handle the records when you have a one-to-many relationship (this may require pushing back on the requirements): should you have three rows (one for each child record), should you put that data in a comma-delimited list, or should you pick only one of the many records and produce one row using aggregation? If the latter, what is the criterion for choosing the record you want to keep?
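In Oracle (the asker's platform), the comma-delimited option and the pick-one-record option might look like this, with hypothetical names:

    -- Option 1: collapse the child rows into one delimited value (LISTAGG, Oracle 11g+)
    SELECT ord.order_id,
           LISTAGG(itm.product_code, ', ')
               WITHIN GROUP (ORDER BY itm.line_no) AS product_list
    FROM   orders ord
           JOIN order_items itm ON itm.order_id = ord.order_id
    GROUP BY ord.order_id;

    -- Option 2: keep exactly one child row per parent, chosen by an explicit rule
    SELECT order_id, product_code
    FROM  (SELECT ord.order_id,
                  itm.product_code,
                  ROW_NUMBER() OVER (PARTITION BY ord.order_id
                                     ORDER BY itm.line_no) AS rn
           FROM   orders ord
                  JOIN order_items itm ON itm.order_id = ord.order_id)
    WHERE  rn = 1;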

answered Oct 04 '22 by HLGEM


Without seeing the specifics of your problem, here are a couple of ideas that immediately come to mind:

  • If you are looking purely for manageability, I might suggest organizing your subqueries as a number of views and then referencing those views in your final query (a sketch follows this list).

  • For performance, on the other hand, you may want to consider creating temp tables or even materialized views (views whose result sets are physically stored and refreshed) to break up the heavier parts of your process.

  • If your queries require an enormous amount of subquerying in order to gain usable data, you might need to rethink your database design and possibly create a number of datamarts to easily access reporting data. Think of these as mini-warehouses sans the multi-year trended data.

  • Finally, I know you said you don't use any BI tools, but this problem certainly seems like one that might be better served by organizing your data into "cubes" or Business Objects "universes". It might be worthwhile to at least weigh the cost of bringing on a BI tool against the programming hours to support the current setup.
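To make the first two bullets concrete, here is a rough Oracle sketch with invented names: each logical chunk becomes a named (or materialized) view, and the final report reads as a join of those pieces.

    -- A plain view per logical chunk of the report
    CREATE OR REPLACE VIEW v_customer_totals AS
    SELECT customer_id, SUM(amount) AS total_paid
    FROM   payments
    GROUP BY customer_id;

    -- A materialized view for a heavy piece; its results are stored and refreshed on demand
    CREATE MATERIALIZED VIEW mv_feed_summary
    REFRESH COMPLETE ON DEMAND AS
    SELECT feed_id, COUNT(*) AS feed_rows
    FROM   external_feed
    GROUP BY feed_id;

    -- The final report then becomes a readable join of named pieces
    SELECT cust.customer_id,
           tot.total_paid,
           fs.feed_rows
    FROM   customers cust
           JOIN v_customer_totals tot ON tot.customer_id = cust.customer_id
           LEFT JOIN mv_feed_summary fs ON fs.feed_id = cust.feed_id;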

answered Oct 04 '22 by DanK