Hi I have a data frame with different data columns (say x, y, z ) and another column which specifies which one to choose. I want to use dplyr::mutate (or similar) to make a new column that has the value corresponding to value in the column specified by "choose". But I want to keep all the columns in the first place. In my real data I also have a few other columns with metadata.
Example data:
library(dplyr)
testdf <- data.frame(x = 1:5, y = 11:15, z = 101:105, choose = c("z","y","x","y","z"))
I can make this work in my example using case_when
but in my actual script the column names and choose column are generated and may have different values, so I don't want to hardcode what names there could be.
Desired output/test
mutate(testdf, selectedValue = case_when(choose == "x" ~x,
choose == "y"~ y,
choose == "z"~ z, T~NA_integer_))
#> x y z choose selectedValue
#> 1 1 11 101 z 101
#> 2 2 12 102 y 12
#> 3 3 13 103 x 3
#> 4 4 14 104 y 14
#> 5 5 15 105 z 105
Created on 2019-09-18 by the reprex package (v0.3.0)
To pick out single or multiple columns use the select() function. The select() function expects a dataframe as it's first input ('argument', in R language), followed by the names of the columns you want to extract with a comma between each name.
3.1 Subset by Column Name Let's use the same df[] notation and subset() function to subset the data frame by column name in R. To subset columns use select argument with values as column names to subset() .
To extract columns with a particular string in column name of an R data frame, we can use grepl function for column names and then subset the data frame with single square brackets.
select_if(): Select columns based on a particular condition. One can use this function to, for example, select columns if they are numeric. Helper functions - starts_with(), ends_with(), contains(), matches(), one_of(): Select columns/variables based on their names.
Here is a data.table solution. This is a case where I don't think the dplyr solutions are more human readable than other solutions (base r and data.table specifically).
library(data.table)
testdt <- data.table(x = 1:5, y = 11:15, z = 101:105, choose = c("z","y","x","y","z"))
testdt[,selectedValue := get(choose), by = choose]
testdt
#> x y z choose selectedValue
#> 1: 1 11 101 z 101
#> 2: 2 12 102 y 12
#> 3: 3 13 103 x 3
#> 4: 4 14 104 y 14
#> 5: 5 15 105 z 105
Created on 2019-09-17 by the reprex package (v0.3.0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With