Using dplyr first and last but ignoring NA values

Tags:

dplyr

I have the below dataframe and I need to ignore the missing value for the last value for id 9.

firstlast <- data.frame(id = as.factor(c("01", "01", "01", "01", "01", "04", "04", "05", "05", "05", "05", "05", "09", "09", "09", "09", "09")),
                 var_a = c(13, 21, 32, 33, 44, 21, 33, 35, 17, 18, 21, 22, 17, 13, 33, 32, NA))

The df looks like:

   id var_a
1  01    13
2  01    21
3  01    32
4  01    33
5  01    44
6  04    21
7  04    33
8  05    35
9  05    17
10 05    18
11 05    21
12 05    22
13 09    17
14 09    13
15 09    33
16 09    32
17 09    NA

My attempt thus far:

firstlast <- firstlast %>% group_by(id) %>%
  mutate(var_first = first(var_a)) %>%
  mutate(var_last = last(var_a)) %>%
  mutate(change = var_last - var_first)

creates:

       id var_a var_first var_last change
   (fctr) (dbl)     (dbl)    (dbl)  (dbl)
1      01    13        13       44     31
2      01    21        13       44     31
3      01    32        13       44     31
4      01    33        13       44     31
5      01    44        13       44     31
6      04    21        21       33     12
7      04    33        21       33     12
8      05    35        35       22    -13
9      05    17        35       22    -13
10     05    18        35       22    -13
11     05    21        35       22    -13
12     05    22        35       22    -13
13     09    17        17       NA     NA
14     09    13        17       NA     NA
15     09    33        17       NA     NA
16     09    32        17       NA     NA
17     09    NA        17       NA     NA

However I need to ignore the missing value for the last NA observation for id 9 and instead use the second to last (or the first observed value). When I include mutate(var_last = last(var_a), na.rm=TRUE) at that line, I then get a full column of na.rm = TRUE and not ignoring the NA value.

The end df I am hoping to achieve is...

       id var_a var_first var_last change
   (fctr) (dbl)     (dbl)    (dbl)  (dbl)
1      01    13        13       44     31
2      01    21        13       44     31
3      01    32        13       44     31
4      01    33        13       44     31
5      01    44        13       44     31
6      04    21        21       33     12
7      04    33        21       33     12
8      05    35        35       22    -13
9      05    17        35       22    -13
10     05    18        35       22    -13
11     05    21        35       22    -13
12     05    22        35       22    -13
13     09    17        17       32     15
14     09    13        17       32     15
15     09    33        17       32     15
16     09    32        17       32     15
17     09    NA        17       32     15

Thank you!

345

asked Jul 06 '16 17:07

b222

1 Answers

mutate doesn't have an na.rm option, neither does first() or last() which would be more appropriate here. You can omit missing values yourself with na.omit:

firstlast <- firstlast %>% group_by(id) %>%
    mutate(
        var_first = first(na.omit(var_a)),
        var_last = last(na.omit(var_a)),
        change = var_last - var_first
    )
# Source: local data frame [17 x 5]
# Groups: id [4]
# 
#        id var_a var_first var_last change
#    (fctr) (dbl)     (dbl)    (dbl)  (dbl)
# 1      01    13        13       44     31
# 2      01    21        13       44     31
# 3      01    32        13       44     31
# 4      01    33        13       44     31
# 5      01    44        13       44     31
# 6      04    21        21       33     12
# 7      04    33        21       33     12
# 8      05    35        35       22    -13
# 9      05    17        35       22    -13
# 10     05    18        35       22    -13
# 11     05    21        35       22    -13
# 12     05    22        35       22    -13
# 13     09    17        17       32     15
# 14     09    13        17       32     15
# 15     09    33        17       32     15
# 16     09    32        17       32     15
# 17     09    NA        17       32     15

As a sidenote, giving mutate an na.rm argument doesn't make much sense. What NA values would you omit? Any rows from any columns used in the calculation? Other inputs? Just the result? How would you fill the removed rows to still have the same values? It's much better to be explicit as above.

152

answered Oct 21 '22 09:10

Gregor Thomas

Related questions
                            
                                ggplot2 move facet layout
                            
                                installing R package openNLP in R
                            
                                How to convert Afex or car ANOVA models to lmer? Observed variables
                            
                                How to fix "Unable to find GhostScript executable to run checks on size reduction" error upon package check in R?
                            
                                regex with mongolite in R
                            
                                Loop through data.table and create new columns basis some condition
                            
                                Replacing each value in a vector with its rank number for a data.frame
                            
                                Disable column editing in rhandsontable?
                            
                                How do I get R to spit out the critical value for F-statistic based on ANOVA?
                            
                                Taking the transpose of square blocks in a rectangular matrix r
                            
                                How to write a table in PostgreSQL from R?
                            
                                Keep string after first number
                            
                                ggplot2: Legend does not show all categories even with drop=FALSE
                            
                                ggplot2: how to add lines and p-values on a grouped barplot?
                            
                                ggplot annotation in fixed place in the chart
                            
                                How can I convert numeric to hours and minutes in R?
                            
                                how to filter top 10 percentile of a column in a data frame group by id using dplyr
                            
                                VennDiagram without group names and with Arial font
                            
                                interpolate irregular x,y data points into regular grid for contour mapping
                            
                                round all float numbers in a string

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With