 

Conditional assignment of one variable to the value of one of two other variables

Tags:

r

I want to create a new variable that is equal to the value of one of two other variables, conditional on the values of still other variables. Here's a toy example with fake data.

Each row of the data frame represents a student. Each student can be studying up to two subjects (subj1 and subj2), and can be pursuing a degree ("BA") or a minor ("MN") in each subject. My real data includes thousands of students, several types of degree, about 50 subjects, and students can have up to five majors/minors.

   ID subj1 degree1 subj2 degree2
1   1   BUS      BA  <NA>    <NA>
2   2   SCI      BA   ENG      BA
3   3   BUS      MN   ENG      BA
4   4   SCI      MN   BUS      BA
5   5   ENG      BA   BUS      MN
6   6   SCI      MN  <NA>    <NA>
7   7   ENG      MN   SCI      BA
8   8   BUS      BA   ENG      MN
...

Now I want to create a sixth variable, df$major, that equals the value of subj1 if subj1 is the student's primary major, or the value of subj2 if subj2 is the primary major. The primary major is the first subject with degree equal to "BA". I tried the following code:

df$major[df$degree1 == "BA"] = df$subj1
df$major[df$degree1 != "BA" & df$degree2 == "BA"] = df$subj2

Unfortunately, I got an error message:

> df$major[df$degree1 == "BA"] = df$subj1
Error in df$major[df$degree1 == "BA"] = df$subj1 :
  NAs are not allowed in subscripted assignments

I assume this means that a vectorized assignment can't be used if the logical index evaluates to NA for at least one row.

I feel like I must be missing something basic here, but the code above seemed like the obvious thing to do and I haven't been able to come up with an alternative.

In case it would be helpful in writing an answer, here's sample data, created using dput(), in the same format as the fake data listed above:

structure(list(ID = 1:20, subj1 = structure(c(3L, NA, 1L, 2L,  2L, 3L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 3L, 3L, 1L, 2L, 1L ), .Label = c("BUS", "ENG", "SCI"), class = "factor"), degree1 = structure(c(2L,  NA, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,  1L, 1L, 1L), .Label = c("BA", "MN"), class = "factor"), subj2 = structure(c(1L,  2L, NA, NA, 1L, NA, 3L, 2L, NA, 2L, 2L, 1L, 3L, NA, 2L, 1L, 1L,  NA, 2L, 2L), .Label = c("BUS", "ENG", "SCI"), class = "factor"),      degree2 = structure(c(2L, 2L, NA, NA, 2L, NA, 1L, 2L, NA,      2L, 1L, 1L, 2L, NA, 1L, 2L, 2L, NA, 1L, 2L), .Label = c("BA",      "MN"), class = "factor")), .Names = c("ID", "subj1", "degree1",  "subj2", "degree2"), row.names = c(NA, -20L), class = "data.frame") 
asked May 07 '12 at 21:05 by eipi10





2 Answers

Your original method of assignment is failing for at least two reasons.

1) A problem with the subscripted assignment df$major[df$degree1 == "BA"] <-. Using == can produce NA, which is what prompted the error. From ?"[<-": "When replacing (that is using indexing on the lhs of an assignment) NA does not select any element to be replaced. As there is ambiguity as to whether an element of the rhs should be used or not, this is only allowed if the rhs value is of length one (so the two interpretations would have the same outcome)." There are many ways to get around this, but I prefer using which:

df$major[which(df$degree1 == "BA")] <- 

The difference is that == returns TRUE, FALSE and NA, while which() returns the indices of the elements that are TRUE, silently dropping any NA:

> df$degree1 == "BA"
 [1] FALSE    NA  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> which(df$degree1 == "BA")
 [1]  3  4  5  8  9 10 11 12 13 14 15 16 17 18 19 20
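The rule quoted from ?"[<-" can be seen on a toy vector. This is a minimal sketch; the names x, y, z and idx are made up for illustration. A logical index containing NA is accepted when the right-hand side has length one, rejected when it is longer, and wrapping the index in which() removes the ambiguity.

```r
x   <- 1:5
idx <- c(TRUE, NA, FALSE, TRUE, FALSE)  # like the result of a comparison that met a missing value

# Length-one right-hand side: allowed, the NA position is simply skipped.
x[idx] <- 0L
print(x)  # 0 2 3 0 5

# Longer right-hand side: R cannot tell whether the NA should consume a value, so it errors.
msg <- tryCatch({
  y <- 1:5
  y[idx] <- c(10L, 20L)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)  # "NAs are not allowed in subscripted assignments"

# which() keeps only the TRUE positions (1 and 4), so the same assignment succeeds.
z <- 1:5
z[which(idx)] <- c(10L, 20L)
print(z)  # 10 2 3 20 5
```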

2) When you perform a subscripted assignment, the right-hand side needs to fit into the left-hand side sensibly (this is the way I think of it). Here that means left- and right-hand sides of equal length, which is what your example seems to imply; if the right-hand side is longer, R uses only its first elements (with a warning), so values end up in the wrong rows. Therefore, you need to subset the right-hand side of the assignment as well:

df$major[which(df$degree1 == "BA")] <- df$subj1[which(df$degree1 == "BA")] 
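To illustrate point 2, here is a small sketch on made-up character vectors (subj, degree, wrong and right are illustrative names, not from the question's data). Assigning the full-length vector pairs rows with the wrong subjects, while subsetting both sides with the same index keeps them aligned.

```r
subj   <- c("BUS", "SCI", "ENG", "BUS")
degree <- c("MN",  "BA",  "MN",  "BA")
idx    <- which(degree == "BA")  # rows 2 and 4

# Full-length right-hand side: R warns ("number of items to replace is not a
# multiple of replacement length") and uses only the first two elements,
# so rows 2 and 4 receive subj[1] and subj[2].
wrong <- rep(NA_character_, 4)
wrong[idx] <- subj

# Subsetting both sides with the same index pairs each row with its own subject.
right <- rep(NA_character_, 4)
right[idx] <- subj[idx]

print(wrong)  # NA "BUS" NA "SCI"
print(right)  # NA "SCI" NA "BUS"
```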

I hope that clarifies why your original attempt produced an error.

Using ifelse, as suggested by @DavidRobinson, is a good way of doing this type of assignment. My take on it:

df$major2 <- ifelse(df$degree1 == "BA", df$subj1, ifelse(df$degree2 == "BA",   df$subj2,NA)) 

This is equivalent to

df$major[which(df$degree1 == "BA")] <- df$subj1[which(df$degree1 == "BA")]
df$major[which(df$degree1 != "BA" & df$degree2 == "BA")] <-
    df$subj2[which(df$degree1 != "BA" & df$degree2 == "BA")]

Depending on the depth of the nested ifelse statements, another approach might be better for your real data.
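For the real data (up to five subject/degree pairs), one option is to loop over the pairs instead of nesting ifelse() five levels deep. This is a sketch under stated assumptions: the helper primary_major() is hypothetical, and it assumes the columns follow the question's naming pattern subj1..subjN and degree1..degreeN. Each pass fills only rows that are still missing a major, so the first "BA" wins.

```r
# Hypothetical helper: assumes columns named subj1..subjN and degree1..degreeN.
primary_major <- function(df, n_subj) {
  major <- rep(NA_character_, nrow(df))
  for (k in seq_len(n_subj)) {
    subj <- as.character(df[[paste0("subj", k)]])
    deg  <- as.character(df[[paste0("degree", k)]])
    # Rows that are "BA" in slot k and have no major yet; which() drops NAs.
    hit <- which(deg == "BA" & is.na(major))
    major[hit] <- subj[hit]
  }
  major
}

# The first eight rows of the toy data from the question.
toy <- data.frame(
  ID      = 1:8,
  subj1   = c("BUS", "SCI", "BUS", "SCI", "ENG", "SCI", "ENG", "BUS"),
  degree1 = c("BA",  "BA",  "MN",  "MN",  "BA",  "MN",  "MN",  "BA"),
  subj2   = c(NA,    "ENG", "ENG", "BUS", "BUS", NA,    "SCI", "ENG"),
  degree2 = c(NA,    "BA",  "BA",  "BA",  "MN",  NA,    "BA",  "MN"),
  stringsAsFactors = TRUE
)
toy$major <- primary_major(toy, 2)
print(toy$major)
```

With five pairs the call would be primary_major(df, 5); the cost grows linearly with the number of pairs rather than with the depth of the nesting.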


EDIT:

I was going to write a third reason for the original code failing (namely that df$major wasn't yet assigned), but it works for me without having to do that. This was a problem I remember having in the past, though. What version of R are you running? (2.15.0 for me.) This step is not necessary if you use the ifelse() approach. Your solution is fine when using [, although I would have chosen

df$major <- NA 

To get the character values of the subjects, instead of the factor level index, use as.character() (which for factors is equivalent to and calls levels(x)[x]):

df$major[which(df$degree1 == "BA")] <- as.character(df$subj1)[which(df$degree1 == "BA")]
df$major[which(df$degree1 != "BA" & df$degree2 == "BA")] <-
    as.character(df$subj2)[which(df$degree1 != "BA" & df$degree2 == "BA")]

Same for the ifelse() way:

df$major2 <- ifelse(df$degree1 == "BA", as.character(df$subj1),   ifelse(df$degree2 == "BA", as.character(df$subj2), NA)) 
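The difference between the factor codes and the labels can be checked on a small factor built like the subj columns in the dput() output (the vector subj here is made up). Indexing the levels by the factor uses the integer codes, which is why levels(x)[x] and as.character(x) agree.

```r
subj <- factor(c("SCI", NA, "BUS", "ENG"), levels = c("BUS", "ENG", "SCI"))

codes  <- as.integer(subj)    # level indices: 3 NA 1 2
labels <- as.character(subj)  # subject names: "SCI" NA "BUS" "ENG"
lookup <- levels(subj)[subj]  # indexing by a factor uses its integer codes

print(codes)
print(labels)
print(identical(labels, lookup))
```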
answered Oct 14 '22 at 14:10 by BenBarnes



In general, the ifelse function is the right choice for these situations, something like:

df$major = ifelse((!is.na(df$degree1) & df$degree1 == "BA") & (is.na(df$degree2) | df$degree2 != "BA"), df$subj1, df$subj2)

However, its precise use depends on what you do if both df$degree1 and df$degree2 are "BA".
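One way to settle the "both are BA" case in favour of the first subject is to nest the tests in order. The sketch below swaps in %in% (a technique not used in either answer) because x %in% "BA" returns FALSE rather than NA for a missing degree, so no explicit is.na() checks are needed. Note the design choice this implies: a student whose degree1 is missing but whose degree2 is "BA" gets subj2 as the major. The four-row data frame is made up for illustration.

```r
df <- data.frame(
  subj1   = c("BUS", "SCI", "BUS", NA),
  degree1 = c("BA",  "BA",  "MN",  NA),
  subj2   = c(NA,    "ENG", "ENG", "ENG"),
  degree2 = c(NA,    "BA",  "BA",  "BA"),
  stringsAsFactors = TRUE
)

# %in% treats a missing degree as "not BA" instead of returning NA.
is_ba1 <- df$degree1 %in% "BA"
is_ba2 <- df$degree2 %in% "BA"

# Testing degree1 first means the first "BA" wins when both are "BA".
df$major <- ifelse(is_ba1, as.character(df$subj1),
                   ifelse(is_ba2, as.character(df$subj2), NA))
print(df$major)
```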

answered Oct 14 '22 at 13:10 by David Robinson
