
Microsoft cubes: usage boundaries and best practices

So we're thinking about using cubes in our organization.

Situation AS IS:

  • DWH (Azure MS SQL); query language: SQL
  • Microsoft Column Storage (not real cubes); query language: DAX (MDX is supported, but the implementation looks inefficient)
  • Tableau (BI system, reports); can query via SQL and MDX

Known problems:

  • When we use MDX there is an aggregation problem by date: we have to spell out the full year/month/date hierarchy in the query. There is no such problem with DAX.
  • Microsoft Column Storage calculates running totals inefficiently.
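
For context, the running total in question is the kind of calculation that, in plain SQL on the DWH, is a single window function (table and column names below are invented for illustration); the complaint is that the equivalent calculation in Column Storage / DAX performs poorly:

```sql
-- Hypothetical schema: dbo.transactions(account_id, txn_date, amount).
-- Cumulative amount per account over time, computed on the fly.
SELECT
    account_id,
    txn_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY account_id
        ORDER BY txn_date
        ROWS UNBOUNDED PRECEDING   -- frame: everything up to the current row
    ) AS running_total
FROM dbo.transactions;
```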

How we want to solve the problem right now:

  • Keep using Microsoft Column Storage with a materialized running total, but expose this kind of "cube" only to the few people who really need it, not to all reports
  • Materialize the running total in the DWH and have all Tableau reports use it
  • Store data in the DWH at daily granularity (example: a record changed on 1st, 5th, and 15th November; previously we stored 3 records in the DWH, now we'll store 15, one per day). We need this to answer "as of any date" queries really fast (essentially we're implementing our own cube this way)
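
A minimal sketch of the two materialization steps above, assuming a hypothetical `transactions` fact table and a `calendar` date dimension (all object names are made up for illustration):

```sql
-- Step 1: persist the running total so reports don't recompute it.
SELECT
    t.account_id,
    t.txn_date,
    t.amount,
    SUM(t.amount) OVER (
        PARTITION BY t.account_id
        ORDER BY t.txn_date
        ROWS UNBOUNDED PRECEDING
    ) AS running_total
INTO dbo.transactions_running    -- rebuilt on each DWH load
FROM dbo.transactions AS t;

-- Step 2: expand to daily granularity, carrying the last known state
-- forward to every calendar day (so 3 change records become 15 daily
-- rows in the 1st-15th November example above).
SELECT
    c.calendar_date,
    a.account_id,
    t.running_total
FROM dbo.calendar AS c
CROSS JOIN (SELECT DISTINCT account_id FROM dbo.transactions_running) AS a
CROSS APPLY (
    SELECT TOP (1) r.running_total
    FROM dbo.transactions_running AS r
    WHERE r.account_id = a.account_id
      AND r.txn_date <= c.calendar_date
    ORDER BY r.txn_date DESC     -- most recent state on or before this day
) AS t;
```

This is exactly the trade-off listed under the cons: the load gets slower and the table grows by roughly one row per account per day, in exchange for fast "as of any date" reads.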

Pros:

  • No one will need to learn DAX or MDX in depth
  • We don't have to refactor anything

Cons:

  • DWH uploads (updates) will take longer than they do now
  • The DWH will grow larger (a row per day for each record)
  • We have to maintain the running-total fields manually

Known alternatives:

  • Microsoft Power BI - uses DAX and MDX really efficiently
  • Microsoft Analysis Services cube (real cubes) - as far as we can tell, MDX is efficient here, unlike in Microsoft Column Storage

Questions:

  • First: if possible, I'd really like your impressions of the technologies you've used - what causes pain when you develop and maintain a solution, and why?
  • Second: any criticism of our current approach would be much appreciated - why is it bad?
  • Third: are cubes dead? Google doesn't offer its own cubes; maybe the technology itself is a dead end?
  • Last: any advice on what we should use would be great.
asked Nov 07 '22 12:11 by teo van kot

1 Answer

I'll try to answer step by step based on my experience; the question is too broad for a single technology or person.

First: if possible, I'd really like your impressions of the technologies you've used - what causes pain when you develop and maintain a solution, and why?

Warehousing, cubes, reporting, and querying are moving fast toward distributed technologies that scale horizontally on relatively cheap hardware and can scale up or down quickly on demand. Data volumes also keep growing with rising internet bandwidth, globalization, social networking, and other factors. Hadoop and the cloud initially filled this gap: distributed platforms that grow horizontally and scale up/down easily.

Running a SQL Server with high compute and high RAM for large in-memory data, MDX, and cubes is usually vertical scaling: it is costly and cannot be scaled back down as easily as a horizontally distributed system, even when the SQL Server is in the cloud.

With those advantages come the complexities of developing a big-data solution: the learning curve and maintenance are a big challenge for adopters who are not yet familiar with it.

Second: any criticism of our current approach would be much appreciated - why is it bad?

There is no silver-bullet architecture that solves every issue without introducing some of its own. Your approach is viable and has its pros and cons given your current organizational structure. I assume your team is familiar with SQL Server, MDX, cubes, and column storage, and has done a feasibility analysis. The only issue I see: as data grows, SQL Server demands more computing power and RAM, which mostly means upgrading the VM/machine. Vertical scaling is costly and always hits a limit at some point. Failover/DR on such infrastructure is also more expensive.

Third: are cubes dead? Google doesn't offer its own cubes; maybe the technology itself is a dead end?

No technology is dead as long as you can find support for it; even assembly, C, C++, and COBOL are still going strong in old projects and in cases where they fit better than the alternatives.

Last: any advice on what we should use would be great.

Do POCs (proofs of concept) for at least 3-4 types of solutions/architectures; you will be the best judge of what suits you in terms of cost, skills, and timeframe.

If you are open to cloud-based solutions, I suggest exploring alternatives such as a data lake with Azure Data Factory, and doing a proof of concept to see whether it meets your requirements.

I also recently came across an out-of-the-box solution from Microsoft that is worth a look: Azure Synapse Analytics (https://azure.microsoft.com/en-in/services/synapse-analytics/). It has built-in support for data warehousing, querying, AI/BI, streaming, data lake exploration, security, and scaling, plus support for Spark, Power BI, and various other sources, with insights and visual display.

answered Nov 12 '22 17:11 by Pranav Singh