I'm trying to run a simple OLS regression with the restriction that the coefficients of two variables sum to 1.
I want:
Y = α + β1 * x1 + β2 * x2 + β3 * x3,
where β1 + β2 = 1
I have found how to impose a relation between coefficients such as:
β1 = 2 * β2
But I haven't found how to impose a restriction like:
β1 = 1 - β2
How would I do it in this simple example?
data <- data.frame(
  A = c(1, 2, 3, 4),
  B = c(3, 2, 2, 3),
  C = c(3, 3, 2, 3),
  D = c(5, 3, 3, 4)
)
lm(D ~ A + B + C, data = data)
Thanks!
To have β1 + β2 = 1, the model you have to fit is
fit <- lm(Y ~ offset(x1) + I(x2 - x1) + x3, data = df)
That is,
Y = α + x1 + β2 * (x2 - x1) + β3 * x3,
obtained by substituting β1 = 1 - β2 into the original model: offset(x1) fixes the coefficient of x1 at 1, and I(x2 - x1) builds the new regressor x_new = x2 - x1.
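Applied to the data frame from your question (D as the response, constraining the coefficients of A and B), a minimal sketch might look like this; fit_q, beta_A and beta_B are names introduced only for illustration:
# D = α + βA*A + βB*B + βC*C with βA + βB = 1, i.e. βA = 1 - βB
fit_q <- lm(D ~ offset(A) + I(B - A) + C, data = data)
beta_B <- coef(fit_q)["I(B - A)"]  # coefficient of B
beta_A <- 1 - beta_B               # coefficient of A, recovered from the constraint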
Similarly, to have β1 + β2 + β3 = 1, fit
fit <- lm(Y ~ offset(x1) + I(x2 - x1) + I(x3 - x1), data = df)
That is,
Y = α + x1 + β2 * (x2 - x1) + β3 * (x3 - x1),
after substituting β1 = 1 - β2 - β3.
I think the pattern is clear: you just subtract one variable, x1, from each of the remaining variables (x2, x3, ...) and fix the coefficient of that variable, x1, at 1 via offset(). (A coefficient-recovery sketch for the three-coefficient case follows the code example below.)
# Data
df <- iris[, 1:4]
colnames(df) <- c("Y", paste0("x", 1:3))

# β1 + β2 = 1
fit <- lm(Y ~ offset(x1) + I(x2 - x1) + x3, data = df)
coef_2 <- coef(fit)
beta_2 <- coef_2[2]      # β2, the coefficient of I(x2 - x1)
beta_1 <- 1 - coef_2[2]  # β1, recovered from the constraint
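For the β1 + β2 + β3 = 1 case, a minimal sketch along the same lines, using the same df as above (fit3 is just an illustrative name):
# β1 + β2 + β3 = 1
fit3 <- lm(Y ~ offset(x1) + I(x2 - x1) + I(x3 - x1), data = df)
beta_2 <- coef(fit3)["I(x2 - x1)"]
beta_3 <- coef(fit3)["I(x3 - x1)"]
beta_1 <- 1 - beta_2 - beta_3  # recovered from the constraint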
1) CVXR We can compute the coefficients using CVXR directly by specifying the objective and constraint. We assume that D is the response, the coefficients of A and B must sum to 1, b[1] is the intercept and b[2], b[3] and b[4] are the coefficients of A, B and C respectively.
library(CVXR)
b <- Variable(4)
X <- cbind(1, as.matrix(data[-4]))
obj <- Minimize(sum((data$D - X %*% b)^2))
constraints <- list(b[2] + b[3] == 1)
problem <- Problem(obj, constraints)
soln <- solve(problem)
bval <- soln$getValue(b)
bval
## [,1]
## [1,] 1.6428605
## [2,] -0.3571428
## [3,] 1.3571428
## [4,] -0.1428588
The objective is the residual sum of squares and it equals:
soln$value
## [1] 0.07142857
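As a cross-check (fit_lm is an illustrative name), the residual sum of squares from the equivalent lm() reparametrisation shown further down should agree with this value:
fit_lm <- lm(D ~ I(A - B) + C + offset(B), data = data)
sum(residuals(fit_lm)^2)  # residual sum of squares; should match soln$value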
2) pracma We can also use the pracma package to compute the coefficients. We specify the X matrix, response vector, the constraint matrix (in this case the vector given as the third argument is regarded as a one row matrix) and the right hand side of the constraint.
library(pracma)
lsqlincon(X, data$D, Aeq = c(0, 1, 1, 0), beq = 1) # X is from above
## [1] 1.6428571 -0.3571429 1.3571429 -0.1428571
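If more than one linear restriction is needed, Aeq can be given as a matrix with one row per constraint and beq as the matching right-hand-side vector. A hypothetical sketch (the second restriction, beta_C = 0, is made up purely to illustrate the interface):
# One row of Aeq and one element of beq per constraint:
# row 1: beta_A + beta_B = 1; row 2 (hypothetical): beta_C = 0
Aeq <- rbind(c(0, 1, 1, 0),
             c(0, 0, 0, 1))
beq <- c(1, 0)
lsqlincon(X, data$D, Aeq = Aeq, beq = beq)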
3) limSolve This package can also solve for the coefficients of regression problems with constraints. The arguments are the same as in (2).
library(limSolve)
lsei(X, data$D, c(0, 1, 1, 0), 1)
giving:
$X
                     A          B          C
 1.6428571 -0.3571429  1.3571429 -0.1428571
$residualNorm
[1] 0
$solutionNorm
[1] 0.07142857
$IsError
[1] FALSE
$type
[1] "lsei"
We can double check the above by using the lm approach in the other answer:
lm(D ~ I(A-B) + C + offset(B), data)
giving:
Call:
lm(formula = D ~ I(A - B) + C + offset(B), data = data)
Coefficients:
(Intercept) I(A - B) C
1.6429 -0.3571 -0.1429
The I(A - B) coefficient equals the coefficient of A in the original formulation, and one minus it is the coefficient of B. We see that all approaches lead to the same coefficients.
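As a final numeric check, a short sketch (fit_check and full are illustrative names) that rebuilds the full coefficient vector from the lm() fit and compares it with the CVXR solution bval:
fit_check <- lm(D ~ I(A - B) + C + offset(B), data = data)
cf <- coef(fit_check)
full <- c(cf["(Intercept)"],   # intercept
          cf["I(A - B)"],      # coefficient of A
          1 - cf["I(A - B)"],  # coefficient of B, from the constraint
          cf["C"])             # coefficient of C
all.equal(unname(full), c(bval), tolerance = 1e-4)  # CVXR is iterative, so allow a small tolerance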