How to reliably get dependent variable name from formula object?

Tags: r

Let's say I have the following formula:

myformula<-formula("depVar ~ Var1 + Var2")

How to reliably get dependent variable name from formula object?

I could not find a built-in function that serves this purpose. I know that as.character(myformula)[[2]] works, as does

sub("^(\\w*)\\s~\\s.*$","\\1",deparse(myformula))

These methods just seem more like hacks to me than a reliable, standard way to do it.


Does anyone happen to know exactly which method lm, for example, uses? I have looked at its code, but it is a little too cryptic for me. Here it is, quoted for your convenience:

    > lm
function (formula, data, subset, weights, na.action, method = "qr", 
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
    contrasts = NULL, offset, ...) 
{
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "weights", "na.action", 
        "offset"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- as.name("model.frame")
    mf <- eval(mf, parent.frame())
    if (method == "model.frame") 
        return(mf)
    else if (method != "qr") 
        warning(gettextf("method = '%s' is not supported. Using 'qr'", 
            method), domain = NA)
    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    w <- as.vector(model.weights(mf))
    if (!is.null(w) && !is.numeric(w)) 
        stop("'weights' must be a numeric vector")
    offset <- as.vector(model.offset(mf))
    if (!is.null(offset)) {
        if (length(offset) != NROW(y)) 
            stop(gettextf("number of offsets is %d, should equal %d (number of observations)", 
                length(offset), NROW(y)), domain = NA)
    }
    if (is.empty.model(mt)) {
        x <- NULL
        z <- list(coefficients = if (is.matrix(y)) matrix(, 0, 
            3) else numeric(), residuals = y, fitted.values = 0 * 
            y, weights = w, rank = 0L, df.residual = if (!is.null(w)) sum(w != 
            0) else if (is.matrix(y)) nrow(y) else length(y))
        if (!is.null(offset)) {
            z$fitted.values <- offset
            z$residuals <- y - offset
        }
    }
    else {
        x <- model.matrix(mt, mf, contrasts)
        z <- if (is.null(w)) 
            lm.fit(x, y, offset = offset, singular.ok = singular.ok, 
                ...)
        else lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
            ...)
    }
    class(z) <- c(if (is.matrix(y)) "mlm", "lm")
    z$na.action <- attr(mf, "na.action")
    z$offset <- offset
    z$contrasts <- attr(x, "contrasts")
    z$xlevels <- .getXlevels(mt, mf)
    z$call <- cl
    z$terms <- mt
    if (model) 
        z$model <- mf
    if (ret.x) 
        z$x <- x
    if (ret.y) 
        z$y <- y
    if (!qr) 
        z$qr <- NULL
    z
}
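
For readers who also find the listing above cryptic, here is a minimal sketch of the few lines that deal with the response. It assumes a toy data frame `d` invented for this illustration, and the names `resp_idx` and `resp_name` are hypothetical. It follows the same steps as the quoted code (`model.frame`, the `"terms"` attribute, `model.response`) but is not a substitute for reading the real source.

```r
# Toy data; the column names are made up for this illustration.
d <- data.frame(depVar = c(1.2, 2.3, 2.9, 4.1),
                Var1   = c(1, 2, 3, 4),
                Var2   = c(0, 1, 0, 1))

# lm() first builds a model frame from the formula and the data ...
mf <- model.frame(depVar ~ Var1 + Var2, data = d)

# ... then reads the terms object attached to it.
mt <- attr(mf, "terms")

# The "response" attribute is the position of the response among the
# model variables (0 when the formula has no left-hand side).
resp_idx  <- attr(mt, "response")
resp_name <- names(mf)[resp_idx]

# model.response() uses the same attribute to pull out the response data.
y <- model.response(mf, "numeric")

print(resp_name)
```

So lm never needs the name as a string; it works with the position recorded in the terms object, and the name falls out of the model frame's column names.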
Adam Ryczkowski asked Nov 04 '12 09:11



3 Answers

Try using all.vars:

all.vars(myformula)[1]
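
A short sketch of where this works and where it can surprise you. The formulas `f1`, `f2` and `f3` are made up for illustration: `all.vars` returns bare variable names in order of appearance, so a transformed response loses its wrapper and a one-sided formula yields the first right-hand-side variable instead.

```r
f1 <- depVar ~ Var1 + Var2
f2 <- log(depVar) ~ Var1 + Var2
f3 <- ~ Var1 + Var2              # one-sided: no response at all

r1 <- all.vars(f1)[1]            # "depVar"
r2 <- all.vars(f2)[1]            # "depVar": the log() wrapper is dropped
r3 <- all.vars(f3)[1]            # "Var1": a predictor, not a response

# If the full left-hand-side expression is wanted, deparse it instead.
lhs2 <- deparse(f2[[2]])         # "log(depVar)"

# A cheap guard: terms() records 0 when there is no response.
has_response <- attr(terms(f3), "response") != 0

print(c(r1, r2, r3, lhs2))
```

Checking the `"response"` attribute first avoids mistaking a predictor for the dependent variable.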
seancarmody answered Sep 26 '22 13:09


I suppose you could also cook your own function to work with terms():

getResponse <- function(formula) {
    tt <- terms(formula)
    vars <- as.character(attr(tt, "variables"))[-1] ## [1] is the list call
    response <- attr(tt, "response") # index of response var
    vars[response] 
}

R> myformula <- formula("depVar ~ Var1 + Var2")
R> getResponse(myformula)
[1] "depVar"

It is just as hacky as as.character(myformula)[[2]], but you have the assurance that you get the correct variable, as the ordering of the call parse tree isn't going to change any time soon.

This isn't so good with multiple dependent variables:

R> myformula <- formula("depVar1 + depVar2 ~ Var1 + Var2")
R> getResponse(myformula)
[1] "depVar1 + depVar2"

as they'll need further processing.
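
One possible form of that further processing, as a rough sketch: apply `all.vars` to the left-hand side only. The helper name `lhsVars` is hypothetical, and it relies on a two-sided formula being a call of length 3 whose second element is the left-hand side.

```r
lhsVars <- function(formula) {
    # A one-sided formula such as ~ x has length 2 and no response.
    if (length(formula) < 3L) return(character(0))
    # Element 2 of a two-sided formula is the left-hand-side expression.
    all.vars(formula[[2L]])
}

multi  <- lhsVars(depVar1 + depVar2 ~ Var1 + Var2)
single <- lhsVars(log(depVar) ~ Var1)
none   <- lhsVars(~ Var1 + Var2)

print(multi)
```

Note that this returns variable names only, so `log(depVar)` comes back as `"depVar"`.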

Gavin Simpson answered Sep 23 '22 13:09


I found a useful package, 'formula.tools', which is well suited to this task.

Code example:

library(formula.tools) # provides lhs.vars() and rhs.vars()

f <- as.formula(a1 + a2 ~ a3 + a4)

lhs.vars(f) # get dependent variables

[1] "a1" "a2"

rhs.vars(f) # get independent variables

[1] "a3" "a4"

Gao Hao answered Sep 23 '22 13:09