How to summarise a categorical variable with missing data?

I'm trying to run a group_by() followed by summarise() on a categorical variable, a frailty score. Each subject has multiple observations, some of which are missing, e.g.:

Subject  Frailty
1        Managing well
1        NA
1        NA
2        NA
2        NA
2        Vulnerable
3        NA
3        NA
3        NA

I would like to summarise the data so that each subject gets a frailty description if one is available, and NA if not, e.g.:

Subject  Frailty
1        Managing well
2        Vulnerable 
3        NA

I tried the following two approaches which both returned errors:

Mode <- function(x) {
  ux <- na.omit(unique(x[!is.na(x)]))
  tab <- tabulate(match(x, ux)); ux[tab == max(tab)]
}

data %>% 
  group_by(Subject) %>% 
  summarise(frailty = Mode(frailty))

Error: Expecting a single value: [extent=2].

condense <- function(x){unique(x[!is.na(x)])}

data %>% 
  group_by(subject) %>% 
  summarise(frailty = condense(frailty))

Error: Column frailty must be length 1 (a summary value), not 0
lexicalgap asked Mar 04 '20 17:03


1 Answer

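One dplyr solution is to keep, for each subject, the first row where Frailty is not NA. which.min(is.na(Frailty)) returns the position of the first non-missing value and falls back to the first row when every value is NA: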

df %>%
 group_by(Subject) %>%
 slice(which.min(is.na(Frailty)))

  Subject Frailty      
    <int> <chr>        
1       1 Managing_well
2       2 Vulnerable   
3       3 <NA>        
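
As for why the two attempts in the question fail: summarise() in that dplyr version needs exactly one value per group. Mode() returns every value tied for the highest count, so a subject with two equally common descriptions presumably produces the length-2 result, and condense() returns a zero-length vector for a subject whose rows are all NA. Below is a minimal sketch of two summarise()-based alternatives. It assumes Frailty is a character column; df is rebuilt from the question's example, and mode_or_na is a hypothetical helper name, not something from dplyr.

```r
library(dplyr)

# Example data rebuilt from the question (assumed character, not factor)
df <- data.frame(
  Subject = rep(1:3, each = 3),
  Frailty = c("Managing well", NA, NA,
              NA, NA, "Vulnerable",
              NA, NA, NA),
  stringsAsFactors = FALSE
)

# Option 1: first non-missing value per subject.
# Indexing an empty vector with [1] yields NA, which covers all-NA subjects.
first_non_na <- df %>%
  group_by(Subject) %>%
  summarise(Frailty = Frailty[!is.na(Frailty)][1])

# Option 2: most common non-missing value per subject (hypothetical helper).
# which.max() keeps only the first of any tied modes, so the result has length 1.
mode_or_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

most_common <- df %>%
  group_by(Subject) %>%
  summarise(Frailty = mode_or_na(Frailty))
```

On the example data both options give "Managing well" for subject 1, "Vulnerable" for subject 2 and NA for subject 3. If a subject can have several different non-missing descriptions, the two options may disagree, so pick whichever matches the intended meaning.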
tmfmnk answered Nov 01 '22 15:11