I have a database containing tables with more than 600 million records and a set of stored procedures that perform complex search operations on the database. The stored procedures are very slow, even with suitable indexes on the tables. The database has a normal relational design. I want to change the design to be multidimensional and use MDX queries instead of the traditional T-SQL queries, but the question is: are MDX queries faster than traditional T-SQL queries? And if so, to what extent will that improve query performance?
Thanks for any help.
MDX is a query language designed for OLAP databases, just as SQL is a query language for relational databases. Multidimensional Expressions (MDX) queries access data stored in a SQL Server Analysis Services cube by bringing back facts related to dimensions. While its SELECT/FROM/WHERE syntax looks superficially like SQL, it is a distinct language for querying and scripting multidimensional data. It is also a calculation language, with syntax similar to spreadsheet formulas.
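To give a feel for the syntax, here is a minimal, illustrative MDX query; the cube, measure, and dimension names are hypothetical, not from any particular database. It returns one measure broken down by year, filtered to one product category:

SELECT
    { [Measures].[Sales Amount] } ON COLUMNS,
    { [Date].[Calendar Year].Members } ON ROWS
FROM [Sales]
WHERE ( [Product].[Category].[Bikes] )

Note that SELECT/FROM/WHERE look like SQL, but the query places sets of dimension members on the axes of a cube rather than selecting rows and columns from tables.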
As for the relational engine itself: it must be fast and perform to the needs of all types of workloads. SQL Server 2017 builds on the performance work done in SQL Server 2016 (see the blog series It Just Runs Faster) and sets the standard when it comes to speed and performance.
Apples and oranges: an Analysis Services OLAP cube is a fundamentally different type of storage than a SQL Server database, and they are designed to do different things. Technically, MDX is not "faster" than T-SQL, or vice versa -- they are just languages, designed for different needs.
Having said that, a cube is usually what works best for numeric analysis of static data, such as aggregating large numbers of sales/transactions/whatever records over time. In contrast, a traditional relational database generally works just fine for search, provided the schema and indexes are well constructed. A simple way to judge: if your SQL queries have to do a lot of
select grock, sum(foo), min(foo), max(foo), avg(foo)
from bar
group by grock -- ideal Analysis Services problem
then a cube may help: it is designed for aggregate math functions -- sum() and group by. On the other hand, if your queries do a lot of
select cols
from foo
where <complicated search> -- Not so much
then a cube probably will not help, and I would focus instead on tuning the schema, the queries and indexing, and perhaps table partitioning if the data can be suitably partitioned.
Do you have a clustered index and covering non-clustered indexes that match the queries?
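For example, a covering non-clustered index lets the engine answer a search entirely from the index, without key lookups into the base table. A minimal sketch, with hypothetical table and column names:

-- Covers searches on Status and CreatedDate that return CustomerId and Amount
create nonclustered index IX_Orders_Status_CreatedDate
    on dbo.Orders (Status, CreatedDate)
    include (CustomerId, Amount);

The key columns should match the search predicates, and the included columns should cover the select list, so the base table is never touched.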
An MS SSAS OLAP cube can be used in several storage modes:
Relational OLAP (ROLAP) - the data and metadata stay in your DB and a few materialized views are added. May or may not be faster.
Hybrid OLAP (HOLAP) - metadata and pre-calculated aggregations are stored on a new server running an SSAS instance. This should speed up all queries that use aggregations, like "total employee hours for last year by month", but queries which drill through to specific records may perform as before.
Multi-dimensional OLAP (MOLAP) - all your data plus metadata and aggregations are copied to the SSAS server. This is usually the fastest mode, but it duplicates storage.
Before starting this, you should consider optimizing your table layout for reporting and analytics -- in other words, use a data warehouse (DW): put your data into Kimball-style star schemas of dimension and fact tables. You then load the DW periodically using ETL (SSIS) and point your reporting and analytics at the DW. It may be that you do not need to use SSAS at all -- SQL queries running against a star table layout are usually considerably faster than against a normalized operational database.
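As a minimal sketch of what a star layout looks like (the table and column names here are illustrative, not a prescription for your schema), a date dimension and a sales fact table might be:

create table dbo.DimDate (
    DateKey   int      not null primary key,  -- surrogate key, e.g. 20240131
    [Date]    date     not null,
    [Year]    smallint not null,
    [Month]   tinyint  not null
);

create table dbo.FactSales (
    DateKey    int           not null references dbo.DimDate (DateKey),
    ProductKey int           not null,  -- references DimProduct (not shown)
    Amount     decimal(18,2) not null,
    Quantity   int           not null
);

Reports then aggregate the narrow fact table and join to small dimension tables, which is far cheaper than aggregating across a web of normalized operational tables.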
If this is still too slow, build SSAS cubes on top of the DW.
Once you start loading your DW, you may be able to remove records from your operational database, making it faster for every-day use.
To summarize, my rule of thumb would be:
1. Build a DW and set up your ETL process.
2. Try T-SQL reports against the DW; it may be good enough (see the sample query after this list).
3. If still slow, build SSAS cubes (on top of the DW) in HOLAP mode and use MDX to query them.
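For step 2, a typical report against the sketch tables above is a simple star join, which the optimizer handles very well:

select d.[Year], d.[Month], sum(f.Amount) as TotalAmount
from dbo.FactSales as f
join dbo.DimDate  as d on d.DateKey = f.DateKey
group by d.[Year], d.[Month]
order by d.[Year], d.[Month];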
"The performance of the stored procedures is so slow even with suitable indexes"
I'd be surprised if the stored procedures are the real problem. Maybe the way the procedures are used is slow, but a stored procedure doesn't by itself make anything slow. Have you found out what about your procedures is slow? Have you profiled them? I would take a deep, long look down that route before redesigning my database.
Multi-dimensional databases are for OLAP. Is your database strictly an OLAP database, or is it a hybrid of OLAP and OLTP? Maybe you need to de-normalize, replicating data from your OLTP design into a de-normalized structure?
600 million records in a table is not by any means huge; it's not small, but that doesn't lead me to believe that dropping stored procedures will magically make things fast. Profile your stored procs and see where the performance bottlenecks are before jumping into a bigger project to fix the issue.
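As a starting point for that profiling (the procedure name and parameter below are hypothetical), turn on statistics for a single call, and check which procedures the server has already measured as expensive:

-- Per-call I/O and CPU cost of one suspect procedure
set statistics io on;
set statistics time on;
exec dbo.usp_SearchOrders @SearchTerm = N'example';
set statistics io off;
set statistics time off;

-- Most expensive cached procedures since the last restart
select top 10
    object_name(ps.object_id, ps.database_id) as procedure_name,
    ps.execution_count,
    ps.total_worker_time   as total_cpu_microseconds,
    ps.total_logical_reads
from sys.dm_exec_procedure_stats as ps
order by ps.total_logical_reads desc;

The actual execution plans of the worst offenders will tell you whether the indexes really are "suitable" -- look for scans, key lookups, and spills.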