dummy variables to single categorical variable (factor) in R

Tags:

r

factors

I have a set of variables coded as binary (0/1) dummy variables.

   Pre VALUE_1 VALUE_2 VALUE_3 VALUE_4 VALUE_5 VALUE_6 VALUE_7 VALUE_8 
1   1       0       0       0       0       0       1       0       0       
2   1       0       0       0       0       1       0       0       0       
3   1       0       0       0       0       1       0       0       0       
4   1       0       0       0       0       1       0       0       0

I would like to merge the variables (VALUE_1, VALUE_2...VALUE_8) into one single ordered factor, while keeping the column (Pre) as is, such that the data would look like this:

  Pre VALUE
1  1  VALUE_6
2  1  VALUE_5
3  1  VALUE_5

Or even better:

  Pre VALUE
1  1  6
2  1  5
3  1  5

I am aware that this exists: Recoding dummy variable to ordered factor

But when I try the code used in that post, I receive the following error:

PA2$Factor = factor(apply(PA2, 1, function(x) which(x == 1)), labels = colnames(PA2)) 

Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Any help would be appreciated.


asked Apr 25 '15 21:04

Sky

1 Answer

A quick solution would be something like

Res <- cbind(df[1], VALUE = factor(max.col(df[-1]), ordered = TRUE))
Res
#   Pre VALUE
# 1   1     6
# 2   1     5
# 3   1     5
# 4   1     5

str(Res)
# 'data.frame':  4 obs. of  2 variables:
# $ Pre  : int  1 1 1 1
# $ VALUE: Ord.factor w/ 2 levels "5"<"6": 2 1 1 1
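
Note that the factor above only has the two observed levels, "5" and "6". If all eight categories should be kept as levels, they can be passed explicitly. This is a minimal sketch, assuming the data frame is called df and rebuilding it from the example in the question. The ties.method = "first" argument is my addition: max.col breaks ties at random by default, which matters if a row ever contains no 1 or more than one.

```r
# Rebuild the example data from the question (the name `df` is assumed)
vals <- matrix(0L, nrow = 4, ncol = 8,
               dimnames = list(NULL, paste0("VALUE_", 1:8)))
vals[cbind(1:4, c(6L, 5L, 5L, 5L))] <- 1L   # exactly one 1 per row
df <- data.frame(Pre = 1L, vals)

# Column position of the 1 in each row; "first" makes tie-breaking deterministic
pos <- max.col(df[-1], ties.method = "first")

# Pass all possible positions as levels so unobserved categories are kept
Res <- cbind(df[1],
             VALUE = factor(pos, levels = seq_along(df[-1]), ordered = TRUE))
```

With the levels set to 1 through 8, as.integer(Res$VALUE) gives the column position back as a plain number.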

OR if you want the actual names of the columns (as pointed out by @BondedDust), you can use the same approach to extract them

factor(names(df)[1 + max.col(df[-1])], ordered = TRUE)
# [1] VALUE_6 VALUE_5 VALUE_5 VALUE_5
# Levels: VALUE_5 < VALUE_6
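
One thing to watch with the name-based version: by default factor() sorts the levels as character strings, so with ten or more dummy columns a name like VALUE_10 would typically sort before VALUE_2. A sketch that sidesteps this by taking the level order from the column order, again assuming the example data in a data frame called df (value_cols is a name I made up for illustration):

```r
# Rebuild the example data from the question (the name `df` is assumed)
vals <- matrix(0L, nrow = 4, ncol = 8,
               dimnames = list(NULL, paste0("VALUE_", 1:8)))
vals[cbind(1:4, c(6L, 5L, 5L, 5L))] <- 1L   # exactly one 1 per row
df <- data.frame(Pre = 1L, vals)

value_cols <- names(df)[-1]                  # dummy column names, in column order

# Look up the name of the column holding the 1, and use the column order
# (not alphabetical order) as the order of the factor levels
VALUE <- factor(value_cols[max.col(df[-1], ties.method = "first")],
                levels = value_cols, ordered = TRUE)
```

Here levels(VALUE) contains all eight column names in their original order, whether or not they occur in the data.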

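OR you can use your own which strategy in the following way (btw, which is vectorized, so there is no need to use apply with a margin of 1 on it). Note that which(..., arr.ind = TRUE) returns its matches in column-major order, so they have to be put back into row order first

idx <- which(df[-1] == 1, arr.ind = TRUE)
cbind(df[1], VALUE = factor(idx[order(idx[, 1]), 2], ordered = TRUE))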

OR you can do matrix multiplication (contributed by @akrun)

cbind(df[1], VALUE = factor(as.matrix(df[-1]) %*% seq_along(df[-1]), ordered = TRUE))
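
All of the approaches above assume that every row has exactly one 1. If that does not hold, max.col still returns some column, the matrix product returns 0 or a sum of positions, and apply with which returns a list, which is the likely cause of the sort.list error in the question. A small sketch of checking this first and marking the problem rows as NA; the data frame bad and the helper names are hypothetical and only serve as an illustration:

```r
# Hypothetical data with problem rows: row 2 has no 1, row 3 has two
bad <- data.frame(Pre = 1L,
                  VALUE_1 = c(0L, 0L, 1L),
                  VALUE_2 = c(1L, 0L, 0L),
                  VALUE_3 = c(0L, 0L, 1L))
value_cols <- names(bad)[-1]

# Number of 1s in each row (names dropped to keep the result a plain vector)
n_ones <- unname(rowSums(bad[value_cols] == 1))

# Keep the column position only where exactly one dummy is set
pos <- ifelse(n_ones == 1,
              max.col(bad[value_cols], ties.method = "first"),
              NA_integer_)

bad$VALUE <- factor(pos, levels = seq_along(value_cols), ordered = TRUE)
```

Running which(n_ones != 1) afterwards lists the rows that need a closer look.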

answered Oct 11 '22 00:10

David Arenburg
