Multiple Linear Regression in Power BI

Tags: excel-formula, linear-regression, powerbi, dax

Suppose I have a set of returns and I want to compute its beta values versus different market indices. Let's use the following set of data in a table named Returns for the sake of having a concrete example:

  Date       Equity  Duration  Credit  Manager
-----------------------------------------------
01/31/2017   2.907%   0.226%   1.240%   1.78%
02/28/2017   2.513%   0.493%   1.120%   3.88%
03/31/2017   1.346%  -0.046%  -0.250%   0.13%
04/30/2017   1.612%   0.695%   0.620%   1.04%
05/31/2017   2.209%   0.653%   0.480%   1.40%
06/30/2017   0.796%  -0.162%   0.350%   0.63%
07/31/2017   2.733%   0.167%   0.830%   2.06%
08/31/2017   0.401%   1.083%  -0.670%   0.29%
09/30/2017   1.880%  -0.857%   1.430%   2.04%
10/31/2017   2.151%  -0.121%   0.510%   2.33%
11/30/2017   2.020%  -0.137%  -0.020%   3.06%
12/31/2017   1.454%   0.309%   0.230%   1.28%

Now in Excel, I can just use the LINEST function to get the beta values:

= LINEST(Returns[Manager], Returns[[Equity]:[Credit]], TRUE, TRUE)

It spits out an array that looks like this:

0.077250253 -0.184974002  0.961578127 -0.001063971
0.707796954  0.60202895   0.540811546  0.008257129
0.50202386   0.009166729  #N/A         #N/A
2.688342242  8            #N/A         #N/A
0.000677695  0.000672231  #N/A         #N/A

The betas are in the top row and using them gives me the following linear estimate:

Manager = 0.962 * Equity - 0.185 * Duration + 0.077 * Credit - 0.001

The question is how can I get these values in Power BI using DAX (preferably without having to write a custom R script)?

For simple linear regression against one column, I can go back to the mathematical definition and write a least squares implementation similar to the one given in this post.

However, when more columns become involved (I need to be able to do up to 12 columns, but not always the same number), this gets messy really quickly and I'm hoping there's a better way.
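For reference, the textbook least-squares formulas behind this are short to state, even if they are awkward to spell out in DAX. The notation below (x, y, X, beta) is generic and is an assumption for illustration, not taken from the linked post:

```latex
% Simple regression of y on a single index x, with sample means \bar{x} and \bar{y}
\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}

% Multiple regression: X is the n-by-(k+1) matrix made of a column of ones
% followed by the k index columns, and \mathbf{y} holds the n manager returns
\hat{\boldsymbol{\beta}} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```

The second form needs the inverse of a (k+1) by (k+1) matrix, which is the part that becomes unwieldy as the number of indices k grows.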

asked Feb 14 '18 21:02

Alexis Olson

2 Answers

Since there is no equivalent or handy replacement for the LINEST function in Power BI (I'm sure you did enough research before posting the question), any attempt would mean rewriting the whole function in Power Query / M, which is already not that "simple" for simple linear regression, let alone for multiple variables.

Rather than reinventing the wheel, it's much easier (a one-liner) to do it with an R script in Power BI.

It's not a bad option, even though I have no prior R experience. After a few searches and some trial and error, I was able to come up with this:

# 'dataset' holds the input data for this script
# install.packages("broom") # uncomment to install if package does not exist
library(broom)

model <- lm(Manager ~ Equity + Duration + Credit, dataset)
model <- tidy(model)

lm is R's built-in linear model function, and the tidy function comes from the broom package; it tidies up the model output and returns a data frame that Power BI can use.

[Screenshot: https://i.stack.imgur.com/JdCjW.png]

The term and estimate columns should be sufficient to calculate the estimate you want.
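If installing broom is not an option, a similar table can be sketched in base R. The snippet below is only an illustration, not part of the script above: it re-enters the sample data by hand as plain numbers in percent units (inside Power BI, dataset is supplied automatically), and the names result and stats are made up for this example. The estimates should come out close to the top row of the LINEST output, though they may differ slightly if the values shown in the question are rounded, and with percent-unit inputs the intercept is expressed in percent as well.

```r
# Sample data from the question, re-entered by hand in percent units.
# In Power BI this data frame is provided as 'dataset' automatically.
dataset <- data.frame(
  Equity   = c(2.907, 2.513, 1.346, 1.612, 2.209, 0.796,
               2.733, 0.401, 1.880, 2.151, 2.020, 1.454),
  Duration = c(0.226, 0.493, -0.046, 0.695, 0.653, -0.162,
               0.167, 1.083, -0.857, -0.121, -0.137, 0.309),
  Credit   = c(1.240, 1.120, -0.250, 0.620, 0.480, 0.350,
               0.830, -0.670, 1.430, 0.510, -0.020, 0.230),
  Manager  = c(1.78, 3.88, 0.13, 1.04, 1.40, 0.63,
               2.06, 0.29, 2.04, 2.33, 3.06, 1.28)
)

model <- lm(Manager ~ Equity + Duration + Credit, data = dataset)
fit   <- summary(model)

# One row per term: coefficient and its standard error
# (the information in LINEST rows 1 and 2).
result <- data.frame(
  term      = rownames(fit$coefficients),
  estimate  = unname(fit$coefficients[, "Estimate"]),
  std_error = unname(fit$coefficients[, "Std. Error"]),
  stringsAsFactors = FALSE
)

# Model-level statistics (the information in LINEST rows 3 and 4).
stats <- data.frame(
  r_squared   = fit$r.squared,
  sigma       = fit$sigma,                        # standard error of the estimate
  f_statistic = unname(fit$fstatistic["value"]),
  df_residual = model$df.residual
)
```

Both result and stats are data frames, so each should show up as a selectable table after the R script step.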

The M Query for your reference:

let
    Source = Csv.Document(File.Contents("returns.csv"),[Delimiter=",", Columns=5, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type text}, {"Equity", Percentage.Type}, {"Duration", Percentage.Type}, {"Credit", Percentage.Type}, {"Manager", Percentage.Type}}),
    #"Run R Script" = R.Execute("# 'dataset' holds the input data for this script#(lf)# install.packages(""broom"")#(lf)library(broom)#(lf)#(lf)model <- lm(Manager ~ Equity + Duration + Credit, dataset)#(lf)model <- tidy(model)",[dataset=#"Changed Type"]),
    #"""model""" = #"Run R Script"{[Name="model"]}[Value]
in
    #"""model"""

answered Oct 23 '22 22:10

Foxan Ng

The essence:

DAX is not the way to go. Use Home > Edit Queries and then Transform > Run R Script. Insert the following R snippet to run a regression analysis using all available variables in a table:

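# Regress Manager on every other column of 'dataset'
model <- lm(Manager ~ ., dataset)
# Collect the coefficients in a data frame that Power BI can load
df <- data.frame(coef(model))
names(df)[names(df) == "coef.model."] <- "coefficients"
df["variables"] <- row.names(df)

Replace Manager with any of the other available variable names to change the dependent variable.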

The details:

Good question! Why Microsoft has not introduced more flexible solutions is beyond me. For the time being, though, you won't find a very good approach in Power BI without using R.

My suggested approach will therefore ignore this part of your request:

"The question is how can I get these values in Power BI using DAX (preferably without having to write a custom R script)?"

My answer will, however, meet this requirement:

"A good answer should generalize to more than 3 columns (probably by working on an unpivoted data table with the indices as values rather than column headers)."

Here we go:

I'm on a system that uses a comma as the decimal separator, so I'll use the following as the data source. (If you copy the numbers directly into Power BI, the column separation will not be maintained. If you first paste them into Excel, copy them again, and then paste them into Power BI, the columns will be fine.)

Date        Equity  Duration  Credit  Manager
31.01.2017  2,907   0,226     1,24    1,78
28.02.2017  2,513   0,493     1,12    3,88
31.03.2017  1,346   -0,046    -0,25   0,13
30.04.2017  1,612   0,695     0,62    1,04
31.05.2017  2,209   0,653     0,48    1,4
30.06.2017  0,796   -0,162    0,35    0,63
31.07.2017  2,733   0,167     0,83    2,06
31.08.2017  0,401   1,083     -0,67   0,29
30.09.2017  1,88    -0,857    1,43    2,04
31.10.2017  2,151   -0,121    0,51    2,33
30.11.2017  2,02    -0,137    -0,02   3,06
31.12.2017  1,454   0,309     0,23    1,28

Starting from scratch in Power BI (for reproducibility purposes) I'm inserting the data using Enter Data:

[Screenshot: https://i.stack.imgur.com/TDH4X.png]

Now, go to Edit Queries > Edit Queries and check that you have this:

[Screenshot: https://i.stack.imgur.com/j2Sdv.png]

To stay flexible with regard to the number of columns included in the analysis, I find it best to remove the Date column. The R script below uses every remaining column as an explanatory variable, so Date must not be among them; removing it has no effect on the regression itself. Simply right-click the Date column and select Remove:

[Screenshot: https://i.stack.imgur.com/ETYp8.png]
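If you would rather keep the Date column in the query, one alternative is to exclude it inside the model formula instead. This is a sketch under that assumption, not something the remaining steps depend on. The data is re-entered by hand here (dates as text, numbers in percent units), and the object names are made up for illustration:

```r
# Sample data re-entered by hand, still including the Date column as text.
dataset <- data.frame(
  Date     = c("31.01.2017", "28.02.2017", "31.03.2017", "30.04.2017",
               "31.05.2017", "30.06.2017", "31.07.2017", "31.08.2017",
               "30.09.2017", "31.10.2017", "30.11.2017", "31.12.2017"),
  Equity   = c(2.907, 2.513, 1.346, 1.612, 2.209, 0.796,
               2.733, 0.401, 1.880, 2.151, 2.020, 1.454),
  Duration = c(0.226, 0.493, -0.046, 0.695, 0.653, -0.162,
               0.167, 1.083, -0.857, -0.121, -0.137, 0.309),
  Credit   = c(1.240, 1.120, -0.250, 0.620, 0.480, 0.350,
               0.830, -0.670, 1.430, 0.510, -0.020, 0.230),
  Manager  = c(1.78, 3.88, 0.13, 1.04, 1.40, 0.63,
               2.06, 0.29, 2.04, 2.33, 3.06, 1.28),
  stringsAsFactors = FALSE
)

# Option 1: drop the column first, then use every remaining column.
without_date  <- dataset[, names(dataset) != "Date"]
model_dropped <- lm(Manager ~ ., data = without_date)

# Option 2: keep the column but subtract it from the formula.
model_excluded <- lm(Manager ~ . - Date, data = dataset)

# Both fits use Equity, Duration and Credit as the only regressors.
same_fit <- isTRUE(all.equal(coef(model_dropped), coef(model_excluded)))
```

Either way the formula still picks up any index columns added later, which is the flexibility this step is after.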

Notice that this adds a new step under Query Settings > Applied Steps:

[Screenshot: https://i.stack.imgur.com/QJUlg.png]

This is also where you'll be able to edit the few lines of R code we're going to use. Now, go to Transform > Run R Script to open this window:

[Screenshot: https://i.stack.imgur.com/SFOgv.png]

Notice the line # 'dataset' holds the input data for this script. Thankfully, your question is only about ONE input table, so things aren't going to get too complicated (for multiple input tables check out this post). The dataset variable is an R data.frame and is a good (in fact the only) starting point for further analysis.

Insert the following script:

model <- lm(Manager ~ ., dataset)
df <- data.frame(coef(model))
names(df)[names(df) == "coef.model."] <- "coefficients"
df["variables"] <- row.names(df)

[Screenshot: https://i.stack.imgur.com/erGCv.png]

Click OK, and if all goes well you should end up with this:

[Screenshot: https://i.stack.imgur.com/R9be6.png]

Click Table, and you'll get this:

[Screenshot: https://i.stack.imgur.com/mZKvF.png]

Under Applied Steps you'll see that a Run R Script step has been inserted. Click the gear icon on the right to edit it, or click on df to format the output table.

That's it, for the Edit Queries part at least.

Click Home > Close & Apply to get back to the Power BI Report section and verify that you have a new table under Visualizations > Fields:

[Screenshot: https://i.stack.imgur.com/nizVV.png]

Insert a Table or Matrix and activate Coefficients and Variables to get this:

[Screenshot: https://i.stack.imgur.com/3p8sm.png]

I hope this is what you were looking for!

Now for some details about the R script:

As far as possible, I would avoid pulling in many different R libraries. This reduces the risk of dependency issues.

The function lm() handles the regression analysis. The key to obtaining the required flexibility with regard to the number of explanatory variables lies in the Manager ~ . , dataset part. This simply says to run a regression analysis on the Manager variable in the data frame dataset, and to use all remaining columns (~ .) as explanatory variables. The coef(model) part extracts the coefficient values from the estimated model. The result is a data frame with the variable names as row names. The last line simply adds these names to the data frame itself as a regular column.
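To make this concrete, here is a small self-contained sketch. It re-enters the sample data by hand (percent units, Date already removed), runs the same four lines, and then cross-checks the coefficients against the textbook least-squares solution of the normal equations. The cross-check and the names X, y, beta_manual and max_abs_diff are additions for illustration only; none of them is needed in Power BI.

```r
# Sample data re-entered by hand; in Power BI this arrives as 'dataset'.
dataset <- data.frame(
  Equity   = c(2.907, 2.513, 1.346, 1.612, 2.209, 0.796,
               2.733, 0.401, 1.880, 2.151, 2.020, 1.454),
  Duration = c(0.226, 0.493, -0.046, 0.695, 0.653, -0.162,
               0.167, 1.083, -0.857, -0.121, -0.137, 0.309),
  Credit   = c(1.240, 1.120, -0.250, 0.620, 0.480, 0.350,
               0.830, -0.670, 1.430, 0.510, -0.020, 0.230),
  Manager  = c(1.78, 3.88, 0.13, 1.04, 1.40, 0.63,
               2.06, 0.29, 2.04, 2.33, 3.06, 1.28)
)

# The script from the answer.
model <- lm(Manager ~ ., dataset)
df <- data.frame(coef(model))
names(df)[names(df) == "coef.model."] <- "coefficients"
df["variables"] <- row.names(df)

# Cross-check: solve the normal equations (X'X) b = X'y by hand.
X <- model.matrix(model)   # intercept column plus the three index columns
y <- dataset$Manager
beta_manual <- solve(crossprod(X), crossprod(X, y))

# Largest absolute gap between lm() and the manual solution.
max_abs_diff <- max(abs(beta_manual[, 1] - df$coefficients))
```

If max_abs_diff is negligibly small, lm() and the manual solution agree, and df has one row per term with the two columns coefficients and variables.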

answered Oct 23 '22 20:10

vestland
