Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mutate over multiple columns, skip NA values

Tags:

r

dplyr

Reprex

data <- data.frame(
  subjid = c(3,4,7,8,11,14,18,19,20,21),
  pct_pvr_3 = c(10,NA,NA,NA,20,NA,70,NA,NA,NA),
  pct_pvr_2 = c(90,NA,90,30,30,50,30,100,NA,NA),
  pct_pvr_1 = c(NA,NA,10,20,30,NA,NA,NA,NA,NA),
  pct_pvr_0 = c(NA,NA,NA,50,NA,NA,NA,NA,NA,NA)
)

Problem: I want to create a new column hscore with the following formula:

hscore = ((0*pct_pvr_0)+(1*pct_pvr_1)+(2*pct_pvr_2)+(3*pct_pvr_3))

However, as many of the columns are NA, the resulting column is full of NAs. I think I may be able to get around this issue with a bunch of if_else statements. Before going that route, I wanted to see if there were any simpler solutions using dplyr or other packages.

Current Code:

data <- data %>%
     mutate(hscore = ((0*pct_pvr_0)+(1*pct_pvr_1)+(2*pct_pvr_2)+(3*pct_pvr_3)))

Current Output:

# A tibble: 56 x 6
   subjid pct_pvr_0 pct_pvr_1 pct_pvr_2 pct_pvr_3 hscore
    <dbl>     <dbl>     <dbl>     <dbl>     <dbl>  <dbl>
 1      3        NA        NA        90        10     NA
 2      4        NA        NA        NA        NA     NA
 3      7        NA        10        90        NA     NA
 4      8        50        20        30        NA     NA
 5     11        NA        30        50        20     NA
 6     14        NA        NA       100        NA     NA
 7     18        NA        NA        30        70     NA
 8     19        NA        NA        NA        NA     NA
 9     20        NA        NA        NA        NA     NA
10     21        NA        NA        NA        NA     NA

dplyr Solution: Replaced NAs as 0

data <- data %>% 
replace(is.na(.),0) %>% 
mutate(hscore = ((0*pct_pvr_0)+(1*pct_pvr_1)+(2*pct_pvr_2)+(3*pct_pvr_3)))

data
   subjid pct_pvr_3 pct_pvr_2 pct_pvr_1 pct_pvr_0 hscore
1       3        10        90         0         0    210
2       4         0         0         0         0      0
3       7         0        90        10         0    190
4       8         0        30        20        50     80
5      11        20        30        30         0    150
6      14         0        50         0         0    100
7      18        70        30         0         0    270
8      19         0       100         0         0    200
9      20         0         0         0         0      0
10     21         0         0         0         0      0
like image 878
Adam Avatar asked Dec 01 '22 13:12

Adam


1 Answers

library(tidyverse)

data %>%
  left_join(data %>%
    pivot_longer(-subjid) %>%
    group_by(subjid) %>%
    summarise(hscore = sum(coalesce(value * parse_number(name), 0)))
  )

parse_number() extracts the ending number part of the column name. coalesce takes care of the NA's by replacing any NA with zero.

Joining, by = "subjid"
   subjid pct_pvr_3 pct_pvr_2 pct_pvr_1 pct_pvr_0 hscore
1       3        10        90        NA        NA    210
2       4        NA        NA        NA        NA      0
3       7        NA        90        10        NA    190
4       8        NA        30        20        50     80
5      11        20        30        30        NA    150
6      14        NA        50        NA        NA    100
7      18        70        30        NA        NA    270
8      19        NA       100        NA        NA    200
9      20        NA        NA        NA        NA      0
10     21        NA        NA        NA        NA      0
like image 124
Jon Spring Avatar answered Dec 04 '22 12:12

Jon Spring