I have a lot of units that are each measured repeatedly.
>df
Item value year
1 20 1990
1 20 1991
2 30 1990
2 15 1990
2 5 1991
3 10 1991
4 15 1990
5 10 1991
5 5 1991
I am trying to use dplyr to remove items that have a low number of observations. On this toy data, let's say that I want to remove items that have fewer than 2 observations.
>df <- df %>%
group_by(Item) %>%
tally() %>%
filter(n>1)
Item n
1 2
2 3
5 2
The problem is that I would like to expand this back to the original rows, but with this filter applied. I tried the ungroup command, but that seems to have an effect only when grouping by two variables. How can I filter by item counts and then get my original variables back, i.e. value and year? It should look like this:
>df
Item value year
1 20 1990
1 20 1991
2 30 1990
2 15 1990
2 5 1991
5 10 1991
5 5 1991
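Running ungroup() drops any grouping, but it does not restore rows or columns that tally() has already collapsed. Grouping can be reinstated with group_by() (older dplyr versions offered regroup(), which has since been removed).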
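As a minimal sketch of this point, using the question's toy data rebuilt by hand (the names grouped, plain, and counts are only illustrative): ungroup() removes the grouping metadata, while the value and year columns are already gone once tally() has run.

```r
library(dplyr)

# Toy data from the question, rebuilt by hand
df <- data.frame(
  Item  = c(1L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 5L),
  value = c(20L, 20L, 30L, 15L, 5L, 10L, 15L, 10L, 5L),
  year  = c(1990L, 1991L, 1990L, 1990L, 1991L, 1991L, 1990L, 1991L, 1991L)
)

grouped <- df %>% group_by(Item)
plain   <- grouped %>% ungroup()

is_grouped_df(grouped)  # grouping metadata is present
is_grouped_df(plain)    # grouping metadata is gone; rows and columns are unchanged

# tally() collapses each group to one row, so only Item and n remain,
# and ungroup() has nothing left to "expand"
counts <- df %>% group_by(Item) %>% tally() %>% ungroup()
names(counts)
```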
Splitting data into groups by factor levels can also be done in base R with split(), a built-in function that divides a vector or data frame into groups defined by a factor.
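Applied to the question, a base R sketch along these lines might look as follows. It assumes the toy data is rebuilt by hand, and the names groups, kept, and result are only illustrative.

```r
# Toy data from the question, rebuilt by hand
df <- data.frame(
  Item  = c(1L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 5L),
  value = c(20L, 20L, 30L, 15L, 5L, 10L, 15L, 10L, 5L),
  year  = c(1990L, 1991L, 1990L, 1990L, 1991L, 1991L, 1990L, 1991L, 1991L)
)

# One data frame per Item
groups <- split(df, df$Item)

# Keep only the groups with more than one row
kept <- Filter(function(g) nrow(g) > 1, groups)

# Stack the surviving groups back into a single data frame
result <- do.call(rbind, kept)
rownames(result) <- NULL
result
```

This keeps all original columns, at the cost of being more verbose than a grouped filter in dplyr.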
The group_by() function groups the rows of a data frame by the columns passed as arguments; later verbs such as filter() or tally() then operate within each group.
For comparison, DAX (not R) has a GROUPBY function that collapses a selected set of rows into summary rows by the values of one or more groupBy_columnName columns, returning one row per group. It is primarily used to perform aggregations over intermediate results from DAX table expressions.
More simply, use dplyr's row_number() inside a grouped filter(), which keeps every original column:
library(dplyr)
df <- read.table("clipboard", header = TRUE, stringsAsFactors = FALSE)
df %>%
group_by(Item) %>%
filter(max(row_number()) > 1) %>%
ungroup()
# A tibble: 7 x 3
Item value year
<int> <int> <int>
1 1 20 1990
2 1 20 1991
3 2 30 1990
4 2 15 1990
5 2 5 1991
6 5 10 1991
7 5 5 1991
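The same result can probably be reached in a few other ways. The sketch below rebuilds the toy data by hand (rather than reading the clipboard) and shows three variants that are not from the original answer: n() inside a grouped filter(), add_count(), and a semi_join() against the filtered counts. The names by_n, by_add_count, frequent, and by_join are only illustrative.

```r
library(dplyr)

# Toy data from the question, rebuilt by hand
df <- data.frame(
  Item  = c(1L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 5L),
  value = c(20L, 20L, 30L, 15L, 5L, 10L, 15L, 10L, 5L),
  year  = c(1990L, 1991L, 1990L, 1990L, 1991L, 1991L, 1990L, 1991L, 1991L)
)

# Variant 1: n() gives the group size directly
by_n <- df %>%
  group_by(Item) %>%
  filter(n() > 1) %>%
  ungroup()

# Variant 2: add_count() appends a per-Item count column n without collapsing rows
by_add_count <- df %>%
  add_count(Item) %>%
  filter(n > 1) %>%
  select(-n)

# Variant 3: keep the counting step from the question, then use it as a lookup
frequent <- df %>%
  count(Item) %>%
  filter(n > 1)
by_join <- df %>%
  semi_join(frequent, by = "Item")
```

The third variant is closest to the attempt in the question: the tally is kept as a separate table and only used to decide which rows of the original data to retain.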