Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Index a data frame row-by-row using column names selected from a variable

Tags:

r

Consider the following data frame:

TEST <- structure(list(Value = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
  Select = structure(c(2L, 1L, 3L, 2L, 2L, 1L, 1L,
  2L, 1L, 1L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"),
  A = c(5L, 5L, 4L, 3L, 4L, 3L, 5L, 3L, 3L, 4L, 5L, 4L), 
  B = c(10L, 8L, 7L, 6L, 3L, 8L, 8L, 7L, 8L, 9L, 11L, 8L), 
  C = c(0L, 1L, 3L, 2L, 0L, 3L, 0L, 2L, 0L, 1L, 1L, 0L)), 
  .Names = c("Value", "Select", "A", "B", "C"), 
  row.names = c(NA, -12L), 
  class = "data.frame")

I want to efficiently assign the Value column, on a row-by-row basis, from the set of columns A, B and C based on the Select column.

For example, in row 1 I want Value to be equal to the element in column B - i.e. Value[1]=10.

My current method is to use a for loop:

for( idx in 1:nrow(TEST) ) {
  TEST$Value[idx] <- TEST[ idx, as.character(TEST$Select[idx]) ]
}

Which results in the desired output:

    Value Select A  B C
 1     10      B 5 10 0
 2      5      A 5  8 1
 3      3      C 4  7 3
 4      6      B 3  6 2
 5      3      B 4  3 0
 6      3      A 3  8 3
 7      5      A 5  8 0
 8      7      B 3  7 2
 9      3      A 3  8 0
 10     4      A 4  9 1
 11     1      C 5 11 1
 12     0      C 4  8 0

Is there a more efficient or alternative way of doing this? I feel like this is some sort of merge() or table join type operation.

P.S. I wasn't quite sure how to describe this operation - any suggestions for a better question/description also welcome.

like image 929
Simon Avatar asked Aug 05 '13 11:08

Simon


People also ask

Which method is used to access the rows or columns of a DataFrame at particular positions in the index?

iloc | Select Column Indexes & Rows Index Positions. DataFrame provides indexing label iloc for accessing the column and rows by index positions i.e. It selects the columns and rows from DataFrame by index position specified in range.


1 Answers

I would use matrix indexing and match. That approach is vectorized, hence much faster than a for or apply loop would give you:

L <- c("A", "B", "C")
TEST$Value <- TEST[L][cbind(seq_len(nrow(TEST)), match(TEST$Select, L))]

If you are not familiar with matrix indexing, it is documented inside ?"[":

A third form of indexing is via a numeric matrix with the one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector

like image 145
flodel Avatar answered Sep 30 '22 19:09

flodel