Here is my data frame:
a <- data.frame(id = c(rep("A", 2), rep("B", 2)),
                x = c(rep(2, 2), rep(3, 2)),
                p.ABC = c(1, NA, 1, 1),
                p.DEF = c(NA, 1, NA, NA),
                p.TAR = c(1, NA, 1, 1),
                p.REP = c(NA, 1, 1, NA),
                p.FAR = c(NA, NA, 1, 1))
I want to create a new character column (using mutate() from the dplyr package in R) that gives, for each row, the names of the columns holding a non-NA value (here the non-NA value is always 1). The search should only cover the columns whose names start with "p.", the names should be sorted alphabetically, and then concatenated using "_" as a separator. You can find the desired result below, under the column called "name":
data.frame(id = c(rep("A", 2), rep("B", 2)),
           x = c(rep(2, 2), rep(3, 2)),
           p.ABC = c(1, NA, 1, 1),
           p.DEF = c(NA, 1, NA, NA),
           p.TAR = c(1, NA, 1, 1),
           p.REP = c(NA, 1, 1, NA),
           p.FAR = c(NA, NA, 1, 1),
           name = c("ABC_TAR", "DEF_REP", "ABC_FAR_REP_TAR", "ABC_FAR_TAR"))
I would like to emphasize that I'm really looking for a dplyr solution: I can already do this without it, but that version is neither pretty nor fast.
Here is an option with tidyverse, where we reshape the data into 'long' format with pivot_longer, group by the row number, strip the "p." prefix from the column names, sort and paste them together with "_", and then bind that column back to the original data:
library(dplyr)
library(stringr)
library(tidyr)

a %>%
  mutate(rn = row_number()) %>%
  select(rn, starts_with("p.")) %>%
  pivot_longer(cols = -rn, values_drop_na = TRUE) %>%
  group_by(rn) %>%
  summarise(name = str_c(sort(str_remove(name, ".*\\.")), collapse = "_"),
            .groups = 'drop') %>%
  select(-rn) %>%
  bind_cols(a, .)
-output
#  id x p.ABC p.DEF p.TAR p.REP p.FAR            name
#1  A 2     1    NA     1    NA    NA         ABC_TAR
#2  A 2    NA     1    NA     1    NA         DEF_REP
#3  B 3     1    NA     1     1     1 ABC_FAR_REP_TAR
#4  B 3     1    NA     1    NA     1     ABC_FAR_TAR
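To see what the summarise() step receives, here is the intermediate 'long' data (a quick illustrative sketch, abbreviated to the first four rows):

library(dplyr)
library(tidyr)

# one row per non-NA "p." cell, keyed by the original row number 'rn'
a %>%
  mutate(rn = row_number()) %>%
  select(rn, starts_with("p.")) %>%
  pivot_longer(cols = -rn, values_drop_na = TRUE) %>%
  head(4)
# A tibble: 4 x 3
#     rn name  value
#  <int> <chr> <dbl>
#1     1 p.ABC     1
#2     1 p.TAR     1
#3     2 p.DEF     1
#4     2 p.REP     1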
Or use pmap from purrr, applied row-wise over the "p." columns:
library(purrr)

a %>%
  mutate(name = pmap_chr(select(cur_data(), starts_with('p.')), ~ {
    nm1 <- c(...)
    str_c(sort(str_remove(names(nm1)[!is.na(nm1)], '.*\\.')), collapse = "_")
  }))
#  id x p.ABC p.DEF p.TAR p.REP p.FAR            name
#1  A 2     1    NA     1    NA    NA         ABC_TAR
#2  A 2    NA     1    NA     1    NA         DEF_REP
#3  B 3     1    NA     1     1     1 ABC_FAR_REP_TAR
#4  B 3     1    NA     1    NA     1     ABC_FAR_TAR
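The trick here is that pmap_chr() loops over the selected columns in parallel and passes each row's values to the lambda as named arguments, so c(...) collects them into a named vector. A minimal standalone illustration with a hypothetical one-row data frame (not part of the data above):

library(purrr)

# the lambda receives p.ABC = 1, p.DEF = NA, p.TAR = 1 for this single row
pmap_chr(data.frame(p.ABC = 1, p.DEF = NA, p.TAR = 1), ~ {
  vals <- c(...)                                    # named vector of this row's values
  paste(names(vals)[!is.na(vals)], collapse = "_")  # keep names of the non-NA entries
})
#[1] "p.ABC_p.TAR"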
Or use apply in base R:
apply(a[-(1:2)], 1, function(x)
  paste(sort(sub(".*\\.", "", names(x)[!is.na(x)])), collapse = "_"))
#[1] "ABC_TAR"         "DEF_REP"         "ABC_FAR_REP_TAR" "ABC_FAR_TAR"