Select first and last row from grouped data

Tags: r, dplyr

Question

Using dplyr, how do I select the top and bottom observations/rows of grouped data in one statement?

Data & Example

Given a data frame:

df <- data.frame(id=c(1,1,1,2,2,2,3,3,3),
                 stopId=c("a","b","c","a","b","c","a","b","c"),
                 stopSequence=c(1,2,3,3,1,4,3,1,2))

I can get the top and bottom observations from each group using slice, but only with two separate statements:

firstStop <- df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  slice(1) %>%
  ungroup

lastStop <- df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  slice(n()) %>%
  ungroup

Can I combine these two statements into one that selects both top and bottom observations?

Asked by tospig, Jul 21 '15 at 01:07


1 Answer

There is probably a faster way:

df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  filter(row_number()==1 | row_number()==n())
Answered by jeremycg, Sep 27 '22 at 19:09
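A possible alternative, offered only as a sketch and not as part of the answer above: slice accepts a vector of row positions, so both positions can be requested in one call. The name firstAndLast is made up for illustration. Two choices here are assumptions rather than anything the question asked for: arrange is given .by_group = TRUE so that rows are sorted within each id, and the positions are wrapped in unique() so that a group with a single row is returned once instead of twice.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  stopId = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  stopSequence = c(1, 2, 3, 3, 1, 4, 3, 1, 2)
)

# firstAndLast is a hypothetical name used only for this sketch
firstAndLast <- df %>%
  group_by(id) %>%
  # sort by stopSequence inside each id group
  arrange(stopSequence, .by_group = TRUE) %>%
  # keep row 1 and row n() of each group; unique() collapses the two
  # positions into one when a group has only a single row
  slice(unique(c(1, n()))) %>%
  ungroup()

print(firstAndLast)
```

For the example data this should give six rows, two per id: the stops with the lowest and highest stopSequence in each group.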