
Microsoft cubes: usage boundaries and best practices

So we're thinking about using cubes in our organization.

Situation AS IS:

  • DWH (Azure MS SQL); query language: SQL
  • Microsoft Column Storage (not real cubes); query language: DAX (MDX is supported, but the implementation looks inefficient)
  • Tableau (BI system, reports); can query via SQL and MDX

Known problems:

  • When we use MDX there is an aggregation problem by date: we have to spell out the full year/month/date hierarchy in the query. There is no such problem with DAX.
  • Microsoft Column Storage calculates running totals inefficiently.
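
For context, the running total in question is the kind of calculation that, in plain SQL on the DWH, is a single window function (table and column names below are invented for illustration); the complaint is that the equivalent calculation in Column Storage / DAX performs poorly:

```sql
-- Hypothetical schema: dbo.transactions(account_id, txn_date, amount).
-- Cumulative amount per account over time, computed on the fly.
SELECT
    account_id,
    txn_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY account_id
        ORDER BY txn_date
        ROWS UNBOUNDED PRECEDING   -- frame: everything up to the current row
    ) AS running_total
FROM dbo.transactions;
```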

How we want to solve the problem right now:

  • Keep using Microsoft Column Storage with a materialized running total, but expose this kind of "cube" only to the few people who really need it, not to all reports
  • Materialize the running total in the DWH and have all Tableau reports use it
  • Store data in the DWH at daily granularity (example: a record changed on 1st, 5th, and 15th November; previously we stored 3 records in the DWH, now we'll store 15, one per day). We need this to answer "as of any date" queries really fast (essentially we're implementing our own cube this way)
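
A minimal sketch of the two materialization steps above, assuming a hypothetical `transactions` fact table and a `calendar` date dimension (all object names are made up for illustration):

```sql
-- Step 1: persist the running total so reports don't recompute it.
SELECT
    t.account_id,
    t.txn_date,
    t.amount,
    SUM(t.amount) OVER (
        PARTITION BY t.account_id
        ORDER BY t.txn_date
        ROWS UNBOUNDED PRECEDING
    ) AS running_total
INTO dbo.transactions_running    -- rebuilt on each DWH load
FROM dbo.transactions AS t;

-- Step 2: expand to daily granularity, carrying the last known state
-- forward to every calendar day (so 3 change records become 15 daily
-- rows in the 1st-15th November example above).
SELECT
    c.calendar_date,
    a.account_id,
    t.running_total
FROM dbo.calendar AS c
CROSS JOIN (SELECT DISTINCT account_id FROM dbo.transactions_running) AS a
CROSS APPLY (
    SELECT TOP (1) r.running_total
    FROM dbo.transactions_running AS r
    WHERE r.account_id = a.account_id
      AND r.txn_date <= c.calendar_date
    ORDER BY r.txn_date DESC     -- most recent state on or before this day
) AS t;
```

This is exactly the trade-off listed under the cons: the load gets slower and the table grows by roughly one row per account per day, in exchange for fast "as of any date" reads.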

Pros:

  • No one will need to learn DAX or MDX in depth
  • We don't have to refactor anything

Cons:

  • DWH uploads (updates) will take longer than they do now
  • The DWH will grow larger (a row per day for each record)
  • We have to maintain the running-total fields manually

Known alternatives:

  • Microsoft Power BI - uses DAX and MDX really efficiently
  • Microsoft Analysis Services cube (real cubes) - as far as we can tell, MDX is efficient here, unlike in Microsoft Column Storage

Questions:

  • First: if possible, I'd really like your impressions of the technologies you've used - what causes pain when you develop and maintain a solution, and why?
  • Second: any criticism of our current approach would be much appreciated - why is it bad?
  • Third: are cubes dead? Google doesn't offer its own cubes; maybe the technology itself is a dead end?
  • Last: any advice on what we should use would be great.
asked Nov 07 '22 12:11 by teo van kot

1 Answer

I'll try to answer step by step based on my experience; the question is too broad for a single technology or person.

First: if possible, I'd really like your impressions of the technologies you've used - what causes pain when you develop and maintain a solution, and why?

Warehousing, cubes, reporting, and querying are moving fast toward distributed technologies that scale horizontally on relatively cheap hardware and can scale up or down quickly on demand. Data volumes also keep growing with rising internet bandwidth, globalization, social networking, and other factors. Hadoop and the cloud initially filled this gap: distributed platforms that grow horizontally and scale up/down easily.

Running a SQL Server with high compute and high RAM for large in-memory data, MDX, and cubes is usually vertical scaling: it is costly and cannot be scaled back down as easily as a horizontally distributed system, even when the SQL Server is in the cloud.

With those advantages come the complexities of developing a big-data solution: the learning curve and maintenance are a big challenge for adopters who are not yet familiar with it.

Second: any criticism of our current approach would be much appreciated - why is it bad?

There is no silver-bullet architecture that solves every issue without introducing some of its own. Your approach is viable and has its pros and cons given your current organizational structure. I assume your team is familiar with SQL Server, MDX, cubes, and column storage, and has done a feasibility analysis. The only issue I see: as data grows, SQL Server demands more computing power and RAM, which mostly means upgrading the VM/machine. Vertical scaling is costly and always hits a limit at some point. Failover/DR on such infrastructure is also more expensive.

Third: are cubes dead? Google doesn't offer its own cubes; maybe the technology itself is a dead end?

No technology is dead as long as you can find support for it; even assembly, C, C++, and COBOL are still going strong in old projects and in cases where they fit better than the alternatives.

Last: any advice on what we should use would be great.

Do POCs (proofs of concept) for at least 3-4 types of solutions/architectures; you will be the best judge of what suits you in terms of cost, skills, and timeframe.

If you are open to cloud-based solutions, I suggest exploring alternatives such as a data lake with Azure Data Factory, and doing a proof of concept to see whether it meets your requirements.

I also recently came across an out-of-the-box solution from Microsoft that is worth a look: Azure Synapse Analytics (https://azure.microsoft.com/en-in/services/synapse-analytics/). It has built-in support for data warehousing, querying, AI/BI, streaming, data lake exploration, security, and scaling, plus support for Spark, Power BI, and various other sources, with insights and visual display.

answered Nov 12 '22 17:11 by Pranav Singh