Use string to select column per row in dplyr (or base R)

Tags: r, dplyr

I have a column filled with the names of other columns. For each row, I want to get the value from the column named in that row.

# three columns with values and one "key" column
library(dplyr)
data = data.frame(
  x = runif(10),
  y = runif(10),
  z = runif(10),
  key = sample(c('x', 'y', 'z'), 10, replace=TRUE)
)

# now get the value named in 'key'
data = data %>% mutate(value = VALUE_AT_COLUMN(key))

I'm pretty sure the answer has something to do with the lazy eval version of mutate, but I can't for the life of me figure it out.

Any help would be appreciated.

sharoz asked Jan 28 '16 14:01



3 Answers

One option is data.table. Convert the 'data.frame' to a 'data.table' with setDT(data), group by the sequence of rows, and use .SD to select the column named in 'key' for each row.

 library(data.table)
 setDT(data)[,  .SD[, key[[1L]], with=FALSE] ,1:nrow(data)]

Another option is get, applied after converting 'key' to character class (it is a factor when the data frame is built with stringsAsFactors = TRUE, the default before R 4.0.0), again grouping by the sequence of rows as in the previous case.

 setDT(data)[, get(as.character(key)), 1:nrow(data)]

Here is one option with do from dplyr, again grouping by row number:

 library(dplyr)
 data %>% 
    group_by(rn = row_number()) %>%
    do(data.frame(., value= .[[.$key]]))
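
When the set of candidate columns is small and known in advance, a vectorised alternative to the row-wise approaches above can be sketched with case_when. This is only an illustration under that assumption: the small data frame below uses made-up values in place of the question's random data, and each column has to be listed by hand.

```r
library(dplyr)

# Small deterministic stand-in for the question's data (values are made up)
data <- data.frame(
  x = c(1, 2, 3),
  y = c(10, 20, 30),
  z = c(100, 200, 300),
  key = c("y", "x", "z"),
  stringsAsFactors = FALSE
)

# One branch per candidate column; each row takes the value from the
# column whose name matches its 'key'
result <- data %>%
  mutate(value = case_when(
    key == "x" ~ x,
    key == "y" ~ y,
    key == "z" ~ z
  ))

print(result)
```

Rows whose key matches none of the listed names get NA, so this form fits best when the key values are fixed and few.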
akrun answered Oct 23 '22 09:10



Here's a Base R solution:

data$value = diag(as.matrix(data[,data$key]))
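
The diag approach builds an n-by-n matrix for n rows, which can get large. A possible alternative in base R is to index a matrix with a two-column (row, column) matrix, so only one cell per row is read. The sketch below assumes the candidate columns are x, y and z and uses made-up values; value_cols, m and idx are names introduced here for illustration.

```r
# Small deterministic stand-in for the question's data (values are made up)
data <- data.frame(
  x = c(1, 2, 3),
  y = c(10, 20, 30),
  z = c(100, 200, 300),
  key = c("y", "x", "z"),
  stringsAsFactors = FALSE
)

value_cols <- c("x", "y", "z")             # columns that 'key' may name
m <- as.matrix(data[value_cols])           # numeric matrix, one column per candidate
idx <- cbind(seq_len(nrow(data)),          # row i ...
             match(data$key, value_cols))  # ... paired with the column named in key[i]
data$value <- m[idx]                       # matrix indexing picks one cell per row

print(data)
```

match works on the labels of a factor as well as on character values, so the same lines should apply whether or not 'key' was converted to a factor.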
Sam Dickson answered Oct 23 '22 09:10



For a memory-efficient and fast solution, you can update the original data.table (after setDT(data)) by performing a join as follows:

data[.(key2 = unique(key)), val := get(key2), on=c(key="key2"), by=.EACHI][]

For each key2, the matching rows in data$key are found. Those rows are updated with the values from the column named in key2. For example, suppose key2="x" matches rows 1,2,6,8,10 (the actual rows depend on the random sample). The corresponding values of data$x are then data$x[c(1,2,6,8,10)]. by=.EACHI ensures the expression get(key2) is executed for each value of key2.

Since this operation is performed only on unique values it should be considerably faster than running it row-wise. And since the data.table is updated by reference, it should be quite memory efficient (and that contributes to speed as well).
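
The idea of doing one operation per unique key, rather than one per row, can also be sketched in plain base R. This is an illustration of the logic only, not the data.table join above: it modifies a copy rather than updating by reference, and it uses made-up values in place of the question's random data.

```r
# Small deterministic stand-in for the question's data (values are made up)
data <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(10, 20, 30, 40),
  z = c(100, 200, 300, 400),
  key = c("x", "z", "x", "y"),
  stringsAsFactors = FALSE
)

data$val <- NA_real_
# Loop over the distinct keys only (3 iterations here, however many rows there are)
for (k in unique(as.character(data$key))) {
  rows <- which(data$key == k)       # rows whose key names column k
  data$val[rows] <- data[[k]][rows]  # copy those rows' values from column k
}

print(data)
```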

Arun answered Oct 23 '22 09:10
