
Using R - delete rows when a value is repeated fewer than 3 times

Tags: dataframe, r, row

I have a data frame with 10 rows and 3 columns:

    a   b c
1   1 201 1
2   2 202 1
3   3 203 1
4   4 204 1
5   5 205 4
6   6 206 5
7   7 207 4
8   8 208 4
9   9 209 8
10 10 210 5

I want to delete all rows whose value in column "c" occurs fewer than 3 times in that column. In this example I want to remove rows 6, 9 and 10. (My real data.frame has 5000 rows and 25 columns.) I tried to do it using the function rle, but I keep getting the wrong result. Any help? Thanks!

Claudia asked Oct 12 '10 21:10



2 Answers

Here is a solution using ave:

Data[ave(Data$c, Data$c, FUN = length) > 2, ]

or using ave with subset:

subset(Data, ave(c, c, FUN = length) > 2)
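To try this on the data from the question, here is a minimal sketch. The data frame is rebuilt by hand from the table in the question, and the names n_per_row, kept and runs are only illustrative. It also shows why rle is probably not the right tool here: rle counts consecutive runs, while ave with length counts every occurrence of a value in the whole column.

```r
# Rebuild the example data frame from the question
Data <- data.frame(a = 1:10, b = 201:210,
                   c = c(1, 1, 1, 1, 4, 5, 4, 4, 8, 5))

# For every row, the number of rows that share its value of c
n_per_row <- ave(Data$c, Data$c, FUN = length)

# Keep only rows whose value of c occurs at least 3 times
kept <- Data[n_per_row > 2, ]

# rle() only sees consecutive runs, so the scattered 4s (rows 5, 7, 8)
# show up as separate runs of length 1 and 2, never as a count of 3
runs <- rle(Data$c)
```

Here kept holds rows 1, 2, 3, 4, 5, 7 and 8, which matches the result asked for in the question.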
G. Grothendieck answered Oct 07 '22 01:10


Building on Joshua's answer:

Data[Data$c %in% names(which(table(Data$c) > 2)), ]
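Broken into steps, the one-liner might look like the sketch below. The data frame is rebuilt by hand from the question, and counts, frequent and kept are illustrative names. Note that table() labels its counts with character names, so the %in% comparison relies on the values of c being converted to matching strings; that works for the whole numbers used here, but it is an assumption worth checking for other column types.

```r
# Rebuild the example data frame from the question
Data <- data.frame(a = 1:10, b = 201:210,
                   c = c(1, 1, 1, 1, 4, 5, 4, 4, 8, 5))

# Frequency of each distinct value of c
counts <- table(Data$c)

# Values that occur at least 3 times, returned as character labels
frequent <- names(counts[counts > 2])

# Keep rows whose c is one of those frequent values
kept <- Data[Data$c %in% frequent, ]
```

With the example data, frequent contains "1" and "4", so rows 6, 9 and 10 are dropped.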
Erik Iverson answered Oct 06 '22 23:10