
Using dplyr with filter, group_by & tail?

Tags:

r

dplyr

Here's an example df:

df <- structure(list(x = 1:30, y = 101:130, g = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("x", "y", "g"), row.names = c(NA, -30L), class = "data.frame")

I would like to get the 10 lowest values of y for each group within the filtered data.

But

df2 <- df %>% filter(x>3) %>% group_by(g) %>%  tail(y, n=10)

only returns the rows for the last group (C in this case):

Source: local data frame [10 x 3]
Groups: g

    x   y g
18 21 121 C
19 22 122 C
20 23 123 C
21 24 124 C
22 25 125 C
23 26 126 C
24 27 127 C
25 28 128 C
26 29 129 C
27 30 130 C

What am I doing wrong?


asked Jul 01 '14 14:07

erc

2 Answers

You can use tail inside do.

df2 <- df %>% filter(x>3) %>% group_by(g) %>%  do(tail(., n=10))

The use of . is key for this to work. From the do help page: "You can use . to refer to the current group."
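To see the difference, here is a minimal sketch, assuming dplyr is installed and using a shorter reconstruction of the question's data. `tail` on its own knows nothing about groups and takes the last 10 rows of the whole table, while `do` calls it once per group:

```r
library(dplyr)

# Rebuild the example data from the question in a shorter form
df <- data.frame(x = 1:30, y = 101:130,
                 g = factor(rep(c("A", "B", "C"), each = 10)))

grouped <- df %>% filter(x > 3) %>% group_by(g)

# tail() is not group-aware: last 10 rows overall, all from group C
plain <- tail(grouped, n = 10)

# do() applies tail() to each group in turn; `.` is the current group
per_group <- grouped %>% do(tail(., n = 10))

# Rows per group: A has only 7 rows left after the filter
count(ungroup(per_group), g)
```

Group A contributes fewer than 10 rows because the filter removed its first three.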

Edit:

As @beginneR pointed out, I focused on how to use tail within groups in dplyr and missed that the OP asked for the 10 lowest values of y. Getting those requires adding arrange. With tail, that means sorting in descending order of y, so that the lowest values end up last in each group.

df2 <- df %>% filter(x>3) %>% group_by(g) %>%  arrange(desc(y)) %>% do(tail(., n=10))
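If a newer dplyr is available (this assumes version 1.0.0 or later, which did not exist when the question was asked), `slice_min` expresses "the 10 lowest y per group" without a separate sort. A sketch on the same example data:

```r
library(dplyr)

# Rebuild the example data from the question in a shorter form
df <- data.frame(x = 1:30, y = 101:130,
                 g = factor(rep(c("A", "B", "C"), each = 10)))

lowest <- df %>%
  filter(x > 3) %>%
  group_by(g) %>%
  slice_min(y, n = 10)  # up to 10 rows with the smallest y in each group

lowest
```

By default `slice_min` keeps ties, so a group can return more than `n` rows when y values repeat; `with_ties = FALSE` caps the result at `n`.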


answered Oct 22 '22 18:10

aosmith

Here are two other options:

df %>% filter(x>3) %>% group_by(g) %>% top_n(10, desc(y))

Here we make use of top_n but use desc(y) since we want the lowest y values instead of the largest ("top") y values.

df %>% filter(x>3) %>% group_by(g) %>% arrange(y) %>% filter(1:n() <= 10)

which is equivalent to

df %>% filter(x>3) %>% group_by(g) %>% arrange(y) %>% slice(1:10)

After the grouping, we sort by increasing y and then select the first 10 rows per group (or fewer if a group has fewer than 10 rows).

Since there was some confusion about lowest and last values to be selected: this answer selects the lowest values, not the last entries.
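A quick way to check that the `filter` and `slice` variants agree, sketched on a shorter reconstruction of the question's data and assuming dplyr is installed:

```r
library(dplyr)

# Rebuild the example data from the question in a shorter form
df <- data.frame(x = 1:30, y = 101:130,
                 g = factor(rep(c("A", "B", "C"), each = 10)))

sorted <- df %>% filter(x > 3) %>% group_by(g) %>% arrange(y)

via_filter <- sorted %>% filter(1:n() <= 10)  # keep rows whose within-group position is <= 10
via_slice  <- sorted %>% slice(1:10)          # positions past the end of a group are ignored

# Both variants select the same y values
all.equal(via_filter$y, via_slice$y)
```

Group A has only 7 rows after the filter, so both variants return 7 rows for it and 10 each for B and C.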

answered Oct 22 '22 19:10

talat
