Question
Using dplyr
, how do I select the top and bottom observations/rows of grouped data in one statement?
Data & Example
Given a data frame:
df <- data.frame(id=c(1,1,1,2,2,2,3,3,3), stopId=c("a","b","c","a","b","c","a","b","c"), stopSequence=c(1,2,3,3,1,4,3,1,2))
I can get the top and bottom observations from each group using slice
, but using two separate statements:
firstStop <- df %>% group_by(id) %>% arrange(stopSequence) %>% slice(1) %>% ungroup lastStop <- df %>% group_by(id) %>% arrange(stopSequence) %>% slice(n()) %>% ungroup
Can I combine these two statements into one that selects both top and bottom observations?
First, you need to write a CTE in which you assign a number to each row within each group. To do that, you can use the ROW_NUMBER() function. In OVER() , you specify the groups into which the rows should be divided ( PARTITION BY ) and the order in which the numbers should be assigned to the rows ( ORDER BY ).
You can do that by using the function arrange from dplyr. 2. Use the dplyr filter function to get the first and the last row of each group. This is a combination of duplicates removal that leaves the first and last row at the same time.
The last n rows of the data frame can be accessed by using the in-built tail() method in R. Supposedly, N is the total number of rows in the data frame, then n <=N last rows can be extracted from the structure.
To select rows using selection symbols for character or graphic data, use the LIKE keyword in a WHERE clause, and the underscore and percent sign as selection symbols. You can create multiple row conditions, and use the AND, OR, or IN keywords to connect the conditions.
There is probably a faster way:
df %>% group_by(id) %>% arrange(stopSequence) %>% filter(row_number()==1 | row_number()==n())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With