dplyr arrange() function sort by missing values

I am attempting to work through Hadley Wickham's R for Data Science and have gotten tripped up on the following question: "How could you use arrange() to sort all missing values to the start? (Hint: use is.na())" I am using the flights dataset included in the nycflights13 package. Given that arrange() sorts all unknown values to the bottom of the dataframe, I am not sure how one would do the opposite across the missing values of all variables. I realize that this question can be answered with base R code, but I am specifically interested in how this would be done using dplyr and a call to the arrange() and is.na() functions. Thanks.

asked Jun 11 '16 06:06

T. Gross

2 Answers

The simplest way is to sort on is.na() wrapped in desc(), as shown in the other answer:

arrange(flights, desc(is.na(dep_time)))

Two other shortcuts give the same ordering, because FALSE sorts before TRUE and -1 sorts before 0:

arrange(flights, !is.na(dep_time))

arrange(flights, -is.na(dep_time))
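To see why all three forms behave the same, here is a minimal sketch on a small made-up tibble (the name df and its values are hypothetical, not taken from flights). is.na() returns a logical vector, and arrange() sorts FALSE before TRUE, so each form simply flips that order in a different way:

```r
library(dplyr)

# Hypothetical toy data standing in for flights
df <- tibble(id = 1:4, dep_time = c(517L, NA, 533L, NA))

# desc() reverses the default FALSE-before-TRUE order
by_desc <- arrange(df, desc(is.na(dep_time)))

# !is.na() is FALSE for missing values, and FALSE sorts first
by_not <- arrange(df, !is.na(dep_time))

# -is.na() maps TRUE to -1 and FALSE to 0, and -1 sorts first
by_neg <- arrange(df, -is.na(dep_time))

# In each result the rows with a missing dep_time come first
head(by_desc, 2)
```

Of the three, desc(is.na(...)) probably states the intent most directly, which is presumably why the book's hint points at is.na().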

answered Sep 16 '22 18:09

Arkadiusz Choczaj

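We can wrap is.na() in desc() to sort the missing values to the start:

library(dplyr)
library(nycflights13)

# is.na() is TRUE for missing values; desc() puts TRUE ahead of FALSE
flights %>%
    arrange(desc(is.na(dep_time)),
            desc(is.na(dep_delay)),
            desc(is.na(arr_time)),
            desc(is.na(arr_delay)),
            desc(is.na(tailnum)),
            desc(is.na(air_time)))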

Only these variables contain NA values, as this check shows:

names(flights)[colSums(is.na(flights)) >0]
#[1] "dep_time"  "dep_delay" "arr_time"  "arr_delay" "tailnum"   "air_time"

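Instead of passing the variable names one at a time, we can also build the expressions as strings and use the standard-evaluation variant arrange_() (deprecated since dplyr 0.7.0):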

nm1 <- paste0("desc(is.na(", names(flights)[colSums(is.na(flights)) >0], "))")

r1 <- flights %>%
        arrange_(.dots = nm1) 

r1 %>%
   head()
#year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum
#  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>   <chr>  <int>   <chr>
#1  2013     1     2       NA           1545        NA       NA           1910        NA      AA    133    <NA>
#2  2013     1     2       NA           1601        NA       NA           1735        NA      UA    623    <NA>
#3  2013     1     3       NA            857        NA       NA           1209        NA      UA    714    <NA>
#4  2013     1     3       NA            645        NA       NA            952        NA      UA    719    <NA>
#5  2013     1     4       NA            845        NA       NA           1015        NA      9E   3405    <NA>
#6  2013     1     4       NA           1830        NA       NA           2044        NA      9E   3716    <NA>
#Variables not shown: origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
#  time_hour <time>.

Update

With newer versions of the tidyverse (dplyr_0.7.3, rlang_0.1.2), we can also use the scoped variants arrange_at(), arrange_all(), and arrange_if():

nm1 <- names(flights)[colSums(is.na(flights)) >0]
r2 <- flights %>% 
          arrange_at(vars(nm1), funs(desc(is.na(.))))

Or use arrange_if(), which selects the columns with a predicate function:

f <- rlang::as_function(~ any(is.na(.)))
r3 <- flights %>% 
          arrange_if(f, funs(desc(is.na(.))))


identical(r1, r2)
#[1] TRUE

identical(r1, r3)
#[1] TRUE
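
In current dplyr (1.0.0 and later) the scoped verbs are superseded and funs() is deprecated, so a present-day equivalent would probably use across() inside arrange(). The following is a minimal sketch under that assumption, run on a small made-up tibble rather than flights (toy, na_cols, and sorted are hypothetical names):

```r
library(dplyr)

# Hypothetical toy data with missing values in two columns
toy <- tibble(
  id       = 1:4,
  dep_time = c(517L, NA, 533L, 542L),
  tailnum  = c("N14228", NA, NA, "N619AA")
)

# Same column check as above: which columns contain any NA
na_cols <- names(toy)[colSums(is.na(toy)) > 0]

# Sort by desc(is.na()) of each of those columns, in column order
sorted <- toy %>%
  arrange(across(all_of(na_cols), ~ desc(is.na(.x))))

sorted
```

Rows missing dep_time come first, and ties are then broken by whether tailnum is missing, which mirrors the column-by-column priority of the arrange_at() call.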

answered Sep 18 '22 18:09

akrun

Tags:

sorting

r

na

dplyr