
Ungroup after grouping by just one variable in dplyr




I have a lot of units that are each measured multiple times.

Item value  year
1     20     1990
1     20     1991
2     30     1990
2     15     1990
2     5      1991
3     10     1991
4     15     1990
5     10     1991
5      5     1991

I am trying to use dplyr to remove items that have a low number of observations. On this toy data, let's say that I want to remove items with fewer than 2 observations.

df <- df %>%
  group_by(Item) %>%
  tally() %>%
  filter(n > 1)

Item  n
1     2
2     3
5     2

The problem is that I would like to expand this back to the original rows, but with this filter applied. I tried ungroup(), but that seems to have an effect only when grouping by two variables. How can I filter by item counts and then get my original variables back, i.e. value and year? It should look like this:

Item value  year
1     20     1990
1     20     1991
2     30     1990
2     15     1990
2     5      1991
5     10     1991
5      5     1991
Alex asked Jul 28 '17 08:07


1 Answer

More simply, use dplyr's row_number() inside a grouped filter(), which keeps the original columns:


df <- read.table("clipboard", header = TRUE, stringsAsFactors = FALSE)

df %>%
  group_by(Item) %>%
  filter(max(row_number()) > 1)

# A tibble: 7 x 3
# Groups:   Item [3]
   Item value  year
  <int> <int> <int>
1     1    20  1990
2     1    20  1991
3     2    30  1990
4     2    15  1990
5     2     5  1991
6     5    10  1991
7     5     5  1991
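
Note that the result above is still grouped by Item. As a minimal sketch of two variants (not part of the original answer), the group size can also be read with n(), and ungroup() then drops the grouping. The second variant keeps the tally-style counts from the question and uses semi_join() to pull the matching rows back out of the original data. The names kept, counts, and kept_join are made up for illustration, and the data frame is rebuilt by hand from the toy data in the question.

```r
library(dplyr)

# Toy data copied from the question
df <- data.frame(
  Item  = c(1, 1, 2, 2, 2, 3, 4, 5, 5),
  value = c(20, 20, 30, 15, 5, 10, 15, 10, 5),
  year  = c(1990, 1991, 1990, 1990, 1991, 1991, 1990, 1991, 1991)
)

# Variant 1: filter on the group size, then drop the grouping
kept <- df %>%
  group_by(Item) %>%
  filter(n() > 1) %>%
  ungroup()

# Variant 2: compute the counts first, then keep only the rows of df
# whose Item appears in the filtered counts
counts <- df %>%
  count(Item) %>%
  filter(n > 1)

kept_join <- semi_join(df, counts, by = "Item")
```

Both variants should return the seven rows for items 1, 2, and 5 with the value and year columns intact. The grouped filter is shorter, while the join may be handy if the counts table is needed for something else anyway.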
r.bot answered Sep 19 '22 17:09
