Here is my data frame:
a <- data.frame(id = c(rep("A", 2), rep("B", 2)),
                x = c(rep(2, 2), rep(3, 2)),
                p.ABC = c(1, NA, 1, 1),
                p.DEF = c(NA, 1, NA, NA),
                p.TAR = c(1, NA, 1, 1),
                p.REP = c(NA, 1, 1, NA),
                p.FAR = c(NA, NA, 1, 1))
I want to create a new character column (using mutate() from the dplyr package in R) that gives, for each row, the names of the columns holding a non-NA value (here the non-NA value is always 1). The search should only cover the columns whose names start with "p.", the names should be sorted alphabetically, and then concatenated using "_" as a separator. You can find the desired result below, under the column called "name":
data.frame(id = c(rep("A", 2), rep("B", 2)),
           x = c(rep(2, 2), rep(3, 2)),
           p.ABC = c(1, NA, 1, 1),
           p.DEF = c(NA, 1, NA, NA),
           p.TAR = c(1, NA, 1, 1),
           p.REP = c(NA, 1, 1, NA),
           p.FAR = c(NA, NA, 1, 1),
           name = c("ABC_TAR", "DEF_REP", "ABC_FAR_REP_TAR", "ABC_FAR_TAR"))
I would like to emphasize that I'm really looking for a dplyr solution: I can already do this without it, but that version is neither pretty nor fast.
Here is an option with tidyverse, where we reshape the data into 'long' format with pivot_longer, group by the row number, strip the "p." prefix from the column names, sort and paste them together with "_", and then bind that column back to the original data:
library(dplyr)
library(stringr)
library(tidyr)

a %>%
  mutate(rn = row_number()) %>%
  select(rn, starts_with("p.")) %>%
  pivot_longer(cols = -rn, values_drop_na = TRUE) %>%
  group_by(rn) %>%
  summarise(name = str_c(sort(str_remove(name, ".*\\.")), collapse = "_"),
            .groups = 'drop') %>%
  select(-rn) %>%
  bind_cols(a, .)
-output
#  id x p.ABC p.DEF p.TAR p.REP p.FAR            name
#1  A 2     1    NA     1    NA    NA         ABC_TAR
#2  A 2    NA     1    NA     1    NA         DEF_REP
#3  B 3     1    NA     1     1     1 ABC_FAR_REP_TAR
#4  B 3     1    NA     1    NA     1     ABC_FAR_TAR
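To see what the summarise() step receives, here is the intermediate 'long' data (a quick illustrative sketch, abbreviated to the first four rows):

library(dplyr)
library(tidyr)

# one row per non-NA "p." cell, keyed by the original row number 'rn'
a %>%
  mutate(rn = row_number()) %>%
  select(rn, starts_with("p.")) %>%
  pivot_longer(cols = -rn, values_drop_na = TRUE) %>%
  head(4)
# A tibble: 4 x 3
#     rn name  value
#  <int> <chr> <dbl>
#1     1 p.ABC     1
#2     1 p.TAR     1
#3     2 p.DEF     1
#4     2 p.REP     1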
Or use pmap from purrr, applied row-wise over the "p." columns:
library(purrr)

a %>%
  mutate(name = pmap_chr(select(cur_data(), starts_with('p.')), ~ {
    nm1 <- c(...)
    str_c(sort(str_remove(names(nm1)[!is.na(nm1)], '.*\\.')), collapse = "_")
  }))
#  id x p.ABC p.DEF p.TAR p.REP p.FAR            name
#1  A 2     1    NA     1    NA    NA         ABC_TAR
#2  A 2    NA     1    NA     1    NA         DEF_REP
#3  B 3     1    NA     1     1     1 ABC_FAR_REP_TAR
#4  B 3     1    NA     1    NA     1     ABC_FAR_TAR
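The trick here is that pmap_chr() loops over the selected columns in parallel and passes each row's values to the lambda as named arguments, so c(...) collects them into a named vector. A minimal standalone illustration with a hypothetical one-row data frame (not part of the data above):

library(purrr)

# the lambda receives p.ABC = 1, p.DEF = NA, p.TAR = 1 for this single row
pmap_chr(data.frame(p.ABC = 1, p.DEF = NA, p.TAR = 1), ~ {
  vals <- c(...)                                    # named vector of this row's values
  paste(names(vals)[!is.na(vals)], collapse = "_")  # keep names of the non-NA entries
})
#[1] "p.ABC_p.TAR"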
Or use apply in base R:
apply(a[-(1:2)], 1, function(x)
  paste(sort(sub(".*\\.", "", names(x)[!is.na(x)])), collapse = "_"))
#[1] "ABC_TAR"         "DEF_REP"         "ABC_FAR_REP_TAR" "ABC_FAR_TAR"