Index a data frame row-by-row using column names selected from a variable

Tags:

r

Consider the following data frame:

TEST <- structure(list(Value = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
  Select = structure(c(2L, 1L, 3L, 2L, 2L, 1L, 1L,
  2L, 1L, 1L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"),
  A = c(5L, 5L, 4L, 3L, 4L, 3L, 5L, 3L, 3L, 4L, 5L, 4L), 
  B = c(10L, 8L, 7L, 6L, 3L, 8L, 8L, 7L, 8L, 9L, 11L, 8L), 
  C = c(0L, 1L, 3L, 2L, 0L, 3L, 0L, 2L, 0L, 1L, 1L, 0L)), 
  .Names = c("Value", "Select", "A", "B", "C"), 
  row.names = c(NA, -12L), 
  class = "data.frame")

I want to efficiently assign the Value column, on a row-by-row basis, from the set of columns A, B and C based on the Select column.

For example, in row 1 I want Value to equal the element in column B, i.e. Value[1] = 10.

My current method is to use a for loop:

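# seq_len() is safe even when the data frame has zero rows
for (idx in seq_len(nrow(TEST))) {
  # Look up the column named by Select for this row
  TEST$Value[idx] <- TEST[idx, as.character(TEST$Select[idx])]
}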

This produces the desired output:

    Value Select A  B C
 1     10      B 5 10 0
 2      5      A 5  8 1
 3      3      C 4  7 3
 4      6      B 3  6 2
 5      3      B 4  3 0
 6      3      A 3  8 3
 7      5      A 5  8 0
 8      7      B 3  7 2
 9      3      A 3  8 0
 10     4      A 4  9 1
 11     1      C 5 11 1
 12     0      C 4  8 0

Is there a more efficient or alternative way of doing this? I feel like this is some sort of merge() or table join type operation.

P.S. I wasn't quite sure how to describe this operation, so any suggestions for a better question title or description are also welcome.

asked Aug 05 '13 11:08

Simon

1 Answer

I would use matrix indexing and match. This approach is vectorized, so it is much faster than a for or apply loop:

L <- c("A", "B", "C")
TEST$Value <- TEST[L][cbind(seq_len(nrow(TEST)), match(TEST$Select, L))]

If you are not familiar with matrix indexing, it is documented inside ?"[":

A third form of indexing is via a numeric matrix with the one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector
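
To make the one-liner above easier to follow, here is a minimal sketch that builds the index matrix step by step. The data frame df and the names cols, col_idx, idx and value are hypothetical, chosen only for illustration; they are not part of the original question or answer.

```r
# Small hypothetical data frame (a three-row stand-in, not the TEST data)
df <- data.frame(
  Select = factor(c("B", "A", "C")),
  A = c(5L, 5L, 4L),
  B = c(10L, 8L, 7L),
  C = c(0L, 1L, 3L)
)

# Candidate columns, in the order used for the lookup
cols <- c("A", "B", "C")

# Position of each row's selection within cols
col_idx <- match(df$Select, cols)

# Two-column index matrix: one (row, column) pair per element to extract
idx <- cbind(seq_len(nrow(df)), col_idx)

# Indexing with a numeric matrix returns one element per row of idx
value <- df[cols][idx]
print(value)
```

Subsetting to df[cols] before indexing matters because a data frame indexed by a matrix is first converted with as.matrix; restricting it to the numeric columns keeps the result numeric rather than coercing everything to character.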

answered Sep 30 '22 19:09

flodel
