
Applying the same factor levels to multiple variables in an R data frame

Tags:

r

I am working with a dataset that includes 16 questions where the response set is identical (Yes, No, Unknown or Missing). I am processing the data using R and I want to turn each of the variables into a factor. For a single variable, I could use the following construction:

df <- read.csv("thedata.csv")
df$q1 <- factor(x = df$q1, levels = c(-9, 0, 1),
                labels = c("Unknown or Missing", "No", "Yes"))

I'd like to avoid typing that 16 times. I could do it with a for() loop, but I was wondering if there is a clearer, more idiomatic R way to do it. Some sample data:

structure(list(q1 = c(0, 0, 0, -9, 0), q2 = c(0, 0, 1, 0, 0),
               q3 = c(0, 0, 1, 0, 0), q4 = c(1, 1, 0, 0, 0),
               q5 = c(0, 1, 1, 1, 1), q6 = c(1, 1, 1, 0, 0),
               q7 = c(0, 0, 0, 1, 0), q8 = c(0, 0, 1, 1, 1),
               q9 = c(1, 0, -9, 1, 0), q10 = c(1, 0, 0, 0, 0),
               q11 = c(0, 1, 1, 0, 0), q12 = c(1, 1, 0, 0, 0),
               q13 = c(1, -9, 1, 0, 0), q14 = c(0, 0, 0, 1, 1),
               q15 = c(1, 0, 1, 1, 0), q16 = c(1, 1, 1, 1, 1)),
               .Names = c("q1", "q2", "q3", "q4", "q5", "q6", "q7",
                          "q8", "q9", "q10", "q11", "q12", "q13",
                          "q14", "q15", "q16"),
               row.names = c(NA, -5L), class = "data.frame")
TARehman asked Mar 18 '13 18:03


2 Answers

df[] <- lapply(df, factor, 
              levels=c(-9, 0, 1), 
              labels = c("Unknown or Missing", "No", "Yes"))
str(df)

This is likely to be faster than apply or sapply, which need a data.frame() call to rebuild and reclass their results. The trick is that using [] on the left-hand side of the assignment preserves the structure of the target (R "knows" what its class and dimensions are), so there is no need to wrap the list returned by lapply in data.frame(). If you want to do this only for selected columns, with colnums holding their names or positions, you could do this:

df[colnums] <- lapply(df[colnums], factor,
                      levels = c(-9, 0, 1),
                      labels = c("Unknown or Missing", "No", "Yes"))
str(df)
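
A minimal sketch of the difference, assuming a small made-up data frame (the three columns and the contents of colnums below are illustrative, not taken from the question's file): lapply on its own returns a plain list, whereas assigning into df[colnums] keeps the data frame and only touches the selected columns.

```r
# Illustrative data only, using the same -9/0/1 coding as the question.
df <- data.frame(q1 = c(0, 0, -9), q2 = c(0, 1, 0), q3 = c(1, 0, 0))
labs <- c("Unknown or Missing", "No", "Yes")

# lapply on its own returns a plain list, not a data frame.
as_list <- lapply(df, factor, levels = c(-9, 0, 1), labels = labs)
class(as_list)

# Hypothetical selection: convert only q1 and q3, leave q2 numeric.
colnums <- c("q1", "q3")
df[colnums] <- lapply(df[colnums], factor,
                      levels = c(-9, 0, 1), labels = labs)

class(df)          # the target is still a data frame
sapply(df, class)  # q1 and q3 are factors, q2 is untouched
levels(df$q1)      # all three labels, in the order given
```

Because the levels are passed explicitly, every converted column carries all three labels even when a value such as "Yes" never occurs in that column.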
IRTFM answered Nov 09 '22 00:11


A base R solution using apply:

 data.frame(apply(df, 2, factor, 
                 levels=c(-9, 0, 1), 
                 labels = c("Unknown or Missing", "No", "Yes")))

Using sapply:

data.frame(sapply(df, factor, levels=c(-9, 0, 1), 
         labels = c("Unknown or Missing", "No", "Yes")))
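
One thing worth checking with the apply and sapply versions: both simplify their results to a matrix, and a matrix cannot hold factors, so the labels come back as character strings. The sketch below uses made-up data (not the question's file) to inspect that intermediate result. What data.frame() then does with the character matrix depends on the stringsAsFactors default of the R version in use, so the level order given in levels/labels may not survive. Treat this as something to verify on your own setup.

```r
# Illustrative data only, using the same -9/0/1 coding as the question.
df <- data.frame(q1 = c(0, 0, -9), q2 = c(0, 1, 0))
labs <- c("Unknown or Missing", "No", "Yes")

# sapply and apply both simplify the per-column factors to a matrix.
m <- sapply(df, factor, levels = c(-9, 0, 1), labels = labs)
a <- apply(df, 2, factor, levels = c(-9, 0, 1), labels = labs)
is.matrix(m)
typeof(m)   # the labels are stored as character strings

# Wrapping the matrix in data.frame() rebuilds the columns from those strings.
res <- data.frame(m)
str(res)

# For comparison: the lapply version keeps the factor and its level order.
ref <- df
ref[] <- lapply(df, factor, levels = c(-9, 0, 1), labels = labs)
levels(ref$q1)
```

If the columns in res turn out not to be factors with the intended levels, the df[] <- lapply(...) form from the other answer avoids the round trip through a matrix.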
Jilber Urbina answered Nov 09 '22 02:11