I am looking to do a 4 day rolling average over a large set of data. The problem is that some individuals do not have 4 cases and thus I get an error indicating that k <= n is not TRUE. Is there a way to remove any individual that does not have enough data in the data set?

Here is an example of how the data would look:

     Name  variable.1
1     Kim   64.703950
2     Kim  926.339849
3     Kim  128.662977
4     Kim  290.888594
5     Kim  869.418523
6     Bob  594.973849
7     Bob  408.159544
8     Bob  609.140928
9  Joseph  496.779712
10 Joseph  444.028668
11 Joseph -213.375635
12 Joseph  -76.728981
13 Joseph  265.642784
14   Hank  -91.646728
15   Hank  170.209746
16   Hank   97.889889
17   Hank   12.069074
18   Hank  402.361731
19   Earl  721.941796
20   Earl    4.823148
21   Earl  696.299627

Removing Rows Based on Not Enough Repeated Data in a Large Data Set in R

Q: How do I remove rows based on a condition in R?

In base R, the subset() function keeps the rows that satisfy a condition, so rows are dropped by writing the condition for the rows you want to keep. If you prefer the tidyverse, dplyr's filter() function does the same job: it selects (or, with a negated condition, removes) rows based on the values in a column.
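As a minimal sketch of the point above, the following uses a small made-up data frame (the `score` column and its values are hypothetical, not from the question) and drops rows in base R only. The dplyr equivalent is shown as a comment so the block runs without extra packages.

```r
# Hypothetical example data, not taken from the question
df <- data.frame(
  Name  = c("Kim", "Bob", "Earl"),
  score = c(10, 25, 40),
  stringsAsFactors = FALSE
)

# Keep rows where score is at most 25 (i.e. drop rows where score > 25)
kept_subset <- subset(df, score <= 25)

# The same selection with bracket indexing
kept_bracket <- df[df$score <= 25, ]

# With dplyr installed, the equivalent call would be:
# dplyr::filter(df, score <= 25)

print(kept_subset)
```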

Q: How do you omit certain values in R?

The na.omit() function in R removes cases that contain missing values (NA) from a data frame, matrix, or vector. Its main argument, object, is the data frame, matrix, or vector to clean.

Q: How do I remove rows with values greater than n in R?

To delete a row from an R data frame when any value in that row is greater than n, subset the data frame with single square brackets and the negation operator (!).
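One way to sketch this, assuming an all-numeric data frame with no missing values (the columns `a` and `b` and the threshold `n` are made up for illustration):

```r
# Hypothetical numeric data frame and threshold
df <- data.frame(a = c(1, 5, 3), b = c(2, 1, 9))
n <- 4

# TRUE for every row that has at least one value greater than n
exceeds <- rowSums(df > n) > 0

# Negate the flag inside single square brackets to drop those rows
kept <- df[!exceeds, ]

print(kept)
```

If the data can contain NA, the comparison `df > n` yields NA as well, so the flag would need explicit handling (for example `rowSums(df > n, na.rm = TRUE)`).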

Q: How do I remove rows with missing (NA) values in R?

To remove all rows that contain NA, use the na.omit() function. For example, if a data frame called df contains some NA values, the command na.omit(df) removes every row that has at least one NA.
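A short illustration with made-up values (the columns `x` and `y` are hypothetical); complete.cases() is shown alongside as a base R alternative that selects the same rows:

```r
# Hypothetical data frame with missing values in both columns
df <- data.frame(
  x = c(1, NA, 3),
  y = c("a", "b", NA),
  stringsAsFactors = FALSE
)

# Drop every row that has at least one NA
clean <- na.omit(df)

# Same rows selected with a logical index
same <- df[complete.cases(df), ]

# na.omit() also works on a plain vector
v <- na.omit(c(1, NA, 3))

print(clean)
```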

asked May 06 '15 19:05

user3585829

1 Answer

If your data frame is df, you can remove all names that occur fewer than 4 times with dplyr:

library(dplyr)

df %>%
  group_by(Name) %>%
  filter(n() >= 4)
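For readers without dplyr, the same idea can be sketched in base R. This is an alternative, not the answerer's method: it rebuilds the example data from the question, counts rows per Name with ave(), and keeps groups with at least 4 rows. The names `group_size`, `kept`, and `rolling` are illustrative. The question does not say which rolling-average function raised the error, so the trailing 4-point mean via stats::filter() at the end is only a stand-in.

```r
# Example data from the question
df <- data.frame(
  Name = rep(c("Kim", "Bob", "Joseph", "Hank", "Earl"),
             times = c(5, 3, 5, 5, 3)),
  variable.1 = c(
    64.703950, 926.339849, 128.662977, 290.888594, 869.418523,
    594.973849, 408.159544, 609.140928,
    496.779712, 444.028668, -213.375635, -76.728981, 265.642784,
    -91.646728, 170.209746, 97.889889, 12.069074, 402.361731,
    721.941796, 4.823148, 696.299627
  ),
  stringsAsFactors = FALSE
)

# Number of rows in each row's Name group
group_size <- ave(seq_along(df$Name), df$Name, FUN = length)

# Keep only individuals with at least 4 rows (drops Bob and Earl)
kept <- df[group_size >= 4, ]

# Stand-in rolling step: trailing 4-point mean within each Name;
# the first 3 values of every group are NA
rolling <- ave(kept$variable.1, kept$Name, FUN = function(v) {
  as.numeric(stats::filter(v, rep(1 / 4, 4), sides = 1))
})

print(unique(kept$Name))
```

Note that the dplyr pipeline above returns a grouped data frame, which is convenient if the rolling average is the next step; add ungroup() if the grouping is no longer wanted.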

answered Sep 23 '22 04:09

davechilders

Tags: dataframe, r, dataset
