I wonder if the following question has an elegant solution in dplyr.
To provide a simple reproducible example, consider the following data.frame:
df <- data.frame( a=1:5, b=2:6, c=3:7,
ref=c("a","a","b","b","c"),
stringsAsFactors = FALSE )
Here a, b, and c are regular numeric variables, while ref indicates which column holds the "main" value for that observation. For example:
a b c ref
1 1 2 3 a
2 2 3 4 a
3 3 4 5 b
4 4 5 6 b
5 5 6 7 c
For instance, for observation 3, ref == "b", so column b contains the main value, whereas for observation 1, ref == "a", so column a contains the main value.
Given this data.frame, the question is how to create a new column, main, holding the main value for each observation, using dplyr.
a b c ref main
1 1 2 3 a 1
2 2 3 4 a 2
3 3 4 5 b 4
4 4 5 6 b 5
5 5 6 7 c 7
I'll probably need to use dplyr for this, since the operation is one step in a longer dplyr %>% data transformation chain.
Here's a simple, fast way that allows you to stick with dplyr chaining:
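library(dplyr)       # provides the %>% pipe
library(data.table)
# Group by ref; within each group, get(ref) fetches the column named by ref
df %>% setDT %>% .[, main := get(ref), by = ref]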
# a b c ref main
# 1: 1 2 3 a 1
# 2: 2 3 4 a 2
# 3: 3 4 5 b 4
# 4: 4 5 6 b 5
# 5: 5 6 7 c 7
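To see why grouping by ref makes this work: within each group, ref collapses to a single column name, so get(ref) returns that column's values for the group's rows. A minimal base R sketch of the same per-group lookup, for illustration only (the explicit loop and the names plain, key, and rows are assumptions of this sketch, not part of the original answer):

```r
plain <- data.frame(a = 1:5, b = 2:6, c = 3:7,
                    ref = c("a", "a", "b", "b", "c"),
                    stringsAsFactors = FALSE)

# For each distinct value of ref, copy that column's values
# into main for the rows that belong to the group.
plain$main <- NA_integer_
for (key in unique(plain$ref)) {
  rows <- plain$ref == key
  plain$main[rows] <- plain[[key]][rows]
}

plain$main
```

This is slower and more verbose than the data.table one-liner, but it spells out the lookup that happens once per distinct value of ref.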
Thanks to @akrun for suggesting the fastest approach and for the benchmarks that demonstrate it (see his answer).
setDT modifies the class of df in place, so you won't have to convert to data.table again in future chains.
The conversion should work with any later code in the chain, but both dplyr and data.table are under active development, so to be on the safe side, one could instead use:
df %>% data.table %>% .[,main:=get(ref),by=ref]
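The difference between the two conversions can be checked directly. The following is a small sketch that calls both functions outside a pipe to keep the comparison simple; the name dt_copy is only illustrative:

```r
library(data.table)

df <- data.frame(a = 1:5, b = 2:6, c = 3:7,
                 ref = c("a", "a", "b", "b", "c"),
                 stringsAsFactors = FALSE)

# data.table() returns a new object; df itself stays a plain data.frame.
dt_copy <- data.table(df)
is.data.table(df)       # FALSE
is.data.table(dt_copy)  # TRUE

# setDT() converts df in place, so no reassignment is needed.
setDT(df)
is.data.table(df)       # TRUE
```

So data.table() leaves the original df untouched at the cost of a copy, while setDT() avoids the copy by changing df itself.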