Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using the dplyr library in R to "print" the name of the non-NA columns

Tags:

r

dplyr

Here is my data frame:

a <- data.frame(id=c(rep("A",2),rep("B",2)),
                x=c(rep(2,2),rep(3,2)),
                p.ABC= c(1,NA,1,1),
                p.DEF= c(NA,1,NA,NA),
                p.TAR= c(1,NA,1,1),
                p.REP= c(NA,1,1,NA),
                p.FAR= c(NA,NA,1,1))

I Want to create a new character column (using mutate() in the dplyr library in R), which tells (by row) the name of the columns that have a non-NA value (here the non-NA value is always 1). However, it should only search among the columns that start with "p." and it should order the names by alphabetical order and then concatenate them using the expression "_" as a separator. You can find below the desired result, under the column called "name":

data.frame(id=c(rep("A",2),rep("B",2)),
                x=c(rep(2,2),rep(3,2)),
                p.ABC= c(1,NA,1,1),
                p.DEF= c(NA,1,NA,NA),
                p.TAR= c(1,NA,1,1),
                p.REP= c(NA,1,1,NA),
                p.FAR= c(NA,NA,1,1),
                name=c("ABC_TAR","DEF_REP","ABC_FAR_REP_TAR","ABC_FAR_TAR"))

I would like to emphasize that I'm really looking for a solution using dplyr, as I would be able to do it without it (but it doesn't look pretty and it's slow).

like image 607
Anthony Hauser Avatar asked Apr 17 '21 22:04

Anthony Hauser


People also ask

How do I select a column without NA in R?

There are two easy methods to select columns of an R data frame without missing values, first one results in a vector and other returns a matrix. For example, if we have a data frame called df then the first method can be used as df[,colSums(is.na(df))==0] and the second method will be used as t(na.

How do I find missing values in a column in R?

In R, the easiest way to find columns that contain missing values is by combining the power of the functions is.na() and colSums(). First, you check and count the number of NA's per column. Then, you use a function such as names() or colnames() to return the names of the columns with at least one missing value.


Video Answer


1 Answers

Here is an option with tidyverse, where we reshape the data into 'long' format with pivot_longer, grouped by row_number()), paste the column name column 'name' values after removing the prefix part and then bind that column with the original data

library(dplyr)
library(stringr)
library(tidyr)
a %>% 
    mutate(rn = row_number()) %>%
    select(-id, -x) %>%
    pivot_longer(cols = -rn, values_drop_na = TRUE) %>%
    group_by(rn) %>%
    summarise(name = str_c(str_remove(name, ".*\\."), collapse="_"), 
         .groups = 'drop') %>%
    select(-rn) %>% 
    bind_cols(a, .)

-output

# id x p.ABC p.DEF p.TAR p.REP p.FAR            name
#1  A 2     1    NA     1    NA    NA         ABC_TAR
#2  A 2    NA     1    NA     1    NA         DEF_REP
#3  B 3     1    NA     1     1     1 ABC_TAR_REP_FAR
#4  B 3     1    NA     1    NA     1     ABC_TAR_FAR

Or use pmap

library(purrr)
a %>% 
   mutate(name = pmap_chr(select(cur_data(), contains('.')), ~ {
       nm1 <- c(...)
       str_c(str_remove(names(nm1)[!is.na(nm1)], '.*\\.'), collapse="_")}))
#  id x p.ABC p.DEF p.TAR p.REP p.FAR            name
#1  A 2     1    NA     1    NA    NA         ABC_TAR
#2  A 2    NA     1    NA     1    NA         DEF_REP
#3  B 3     1    NA     1     1     1 ABC_TAR_REP_FAR
#4  B 3     1    NA     1    NA     1     ABC_TAR_FAR

Or use apply in base R

apply(a[-(1:2)], 1, function(x) paste(sub(".*\\.", "", 
        names(x)[!is.na(x)]), collapse="_"))
#[1] "ABC_TAR"         "DEF_REP"         "ABC_TAR_REP_FAR" "ABC_TAR_FAR"    
like image 197
akrun Avatar answered Mar 23 '23 12:03

akrun