Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing Summarize in DAX

I have this DAX formula that gives me a count of id that appear on the fact table in a month, averaged over the year. I can put this measure is a table ad it's unpacked by row with no issues (by adding variables from dimensions)

Measure:= AVERAGEX(
    SUMMARIZE(
        CALCULATETABLE(fact_table;FILTER('Time_Dimension';'Time_Dimension'[Last_month] <> "LAST"));
         Time_Dimension[Month Name];
        "Count";DISTINCTCOUNT(fact_table[ID])
    );
    [Count]
)

But it's terrible slow (I have 3 measures like this on a single table) and the fact table is big (like 300Million rows big)

I was reading that SUMMARIZE perform really bad with aggregations and It should be replaced with SUMMARIZECOLUMNS. I wrote this formula

Measure_v2:= AVERAGEX(
    SUMMARIZECOLUMNS(
        Time_Dimension[Month Name];
        FILTER(Time_Dimension;
            Time_Dimension[Month Name]<>"LAST"
        );
        "Count";DISTINCTCOUNT(fact_table[ID])
    )
    [Count]
)

And it works when I visualize the measure as it is, but when I try to put it in a context (like the table above) it gives me the error "Can't use SUMMARIZECOLUMN and ADDMISSINGITEMS() in this context" How can I make a sustainable optimization from the original SUMMARIZE function?

like image 362
Crysis85 Avatar asked Jun 22 '26 12:06

Crysis85


1 Answers

Before optimizing SUMMARIZE, I would re-visit the overall approach. If your goal is to calculate average fact count per year-month, there is a simpler (and faster) way.

[ID Count]:=CALCULATE(COUNT('fact_table'[ID]),'Time_Dimension'[Last_month] <> "LAST") 

[Average ID Count]:=AVERAGEX( VALUES('Time_Dimension'[Year_Month]), [ID Count`])

assuming that:

  • you have year-month attribute in your time dimension;
  • IDs in your fact table are unique (and therefore, simple count is enough)

If this solution does not solve your problem, then please post your data model - it's hard to optimize without knowing the data structure.

On a side note, I would remove ID field from the fact table. It adds no value to the model, and consumes huge amounts of memory. Your objective can be achieved by simply counting rows:

[Fact Count]:=CALCULATE(COUNTROWS('fact_table'),'Time_Dimension'[Last_month] <> "LAST") 
like image 198
RADO Avatar answered Jun 28 '26 13:06

RADO



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!