
Removing Rows Based on Not Enough Repeated Data in a Large Data Set in R

I want to compute a 4-day rolling average over a large data set. The problem is that some individuals have fewer than 4 observations, so I get an error saying that k <= n is not TRUE.

Is there a way to remove any individual that does not have enough data in the data set?

Here is an example of how the data would look:

     Name  variable.1
1     Kim   64.703950
2     Kim  926.339849
3     Kim  128.662977
4     Kim  290.888594
5     Kim  869.418523
6     Bob  594.973849
7     Bob  408.159544
8     Bob  609.140928
9  Joseph  496.779712
10 Joseph  444.028668
11 Joseph -213.375635
12 Joseph  -76.728981
13 Joseph  265.642784
14   Hank  -91.646728
15   Hank  170.209746
16   Hank   97.889889
17   Hank   12.069074
18   Hank  402.361731
19   Earl  721.941796
20   Earl    4.823148
21   Earl  696.299627
user3585829 asked May 06 '15 19:05



1 Answer

If your data frame is df, you can remove all names that occur fewer than 4 times with dplyr:

library(dplyr)

df %>%
  group_by(Name) %>%   # one group per individual
  filter(n() >= 4)     # n() is the group's row count; keep groups with 4+ rows
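If you would rather not depend on dplyr, the same filtering can be sketched in base R. This is only an illustration: the data frame below copies the question's sample with values rounded to one decimal, and the names group_size, df_kept and roll4 are made up for the example. The trailing 4-row mean uses stats::filter as a stand-in for whatever rolling function you are actually calling.

```r
# Sample data mirroring the question (values rounded for brevity)
df <- data.frame(
  Name = rep(c("Kim", "Bob", "Joseph", "Hank", "Earl"), times = c(5, 3, 5, 5, 3)),
  variable.1 = c(64.7, 926.3, 128.7, 290.9, 869.4,
                 595.0, 408.2, 609.1,
                 496.8, 444.0, -213.4, -76.7, 265.6,
                 -91.6, 170.2, 97.9, 12.1, 402.4,
                 721.9, 4.8, 696.3),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the number of rows in that row's Name group
group_size <- ave(seq_along(df$Name), df$Name, FUN = length)

# Keep only the individuals with at least 4 observations
df_kept <- df[group_size >= 4, ]

# Trailing 4-row mean within each remaining individual;
# the first 3 rows of each group are NA because the window is incomplete
df_kept$roll4 <- ave(
  df_kept$variable.1, df_kept$Name,
  FUN = function(x) as.numeric(stats::filter(x, rep(1 / 4, 4), sides = 1))
)

unique(df_kept$Name)
```

With the sample data, Bob and Earl (3 rows each) are dropped before the rolling mean is computed, so the window size never exceeds the group size.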
davechilders answered Sep 23 '22 04:09