I am trying to figure out how to remove all unique rows from a data frame while keeping every row that has a duplicate. For example, I want all rows from this data frame whose col1 value is shared with at least one other row:
df<-data.frame(col1=c(rep("a",3),"b","c",rep("d",3)),col2=c("A","B","C",rep("A",3),"B","C"),col3=c(3,3,1,4,4,3,2,1))
df
  col1 col2 col3
1    a    A    3
2    a    B    3
3    a    C    1
4    b    A    4
5    c    A    4
6    d    A    3
7    d    B    2
8    d    C    1
subset(df,duplicated(col1))
  col1 col2 col3
2    a    B    3
3    a    C    1
7    d    B    2
8    d    C    1
But I want rows 1, 2, 3, 6, 7 and 8, since each of them shares its col1 value with at least one other row. How do I get rows 1 and 6 included? Or, conversely, how do I remove the rows whose col1 value has no duplicate?
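Another option is to combine duplicated() with a second call that scans from the end (fromLast=TRUE), so that the first occurrence of each repeated value is kept as well: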
subset(df,duplicated(col1) | duplicated(col1, fromLast=TRUE))
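Why this works: duplicated() flags only the second and later occurrences of a value, which is why rows 1 and 6 are missing from the first attempt. With fromLast = TRUE it scans backwards and flags every occurrence except the last, so OR-ing the two vectors marks each row whose value appears at least twice. The sketch below prints both vectors for the example data and wraps the idea in a small helper; the name keep_repeated and its cols argument are hypothetical, not part of base R.

```r
df <- data.frame(col1 = c(rep("a", 3), "b", "c", rep("d", 3)),
                 col2 = c("A", "B", "C", rep("A", 3), "B", "C"),
                 col3 = c(3, 3, 1, 4, 4, 3, 2, 1))

# Forwards: TRUE for the 2nd and later occurrence of each value
print(duplicated(df$col1))
# FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE

# Backwards: TRUE for every occurrence except the last one
print(duplicated(df$col1, fromLast = TRUE))
# TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE

# Hypothetical helper: keep rows whose values in `cols` occur more than once.
# duplicated() on a data frame compares whole rows, so several key columns work too.
keep_repeated <- function(data, cols) {
  key <- data[, cols, drop = FALSE]
  data[duplicated(key) | duplicated(key, fromLast = TRUE), , drop = FALSE]
}

res <- keep_repeated(df, "col1")
print(rownames(res))
# "1" "2" "3" "6" "7" "8"
```

Passing several columns, for example keep_repeated(df, c("col1", "col3")), keeps only rows whose combination of values is repeated; in the example data that is just rows 1 and 2.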
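Try counting how often each col1 value occurs, then keeping only the rows whose value occurs more than once:
> tdf <- table(df$col1)
> tdf
a b c d
3 1 1 3
> df[df$col1 %in% names(tdf)[tdf > 1], ]
  col1 col2 col3
1    a    A    3
2    a    B    3
3    a    C    1
6    d    A    3
7    d    B    2
8    d    C    1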
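The table() approach counts each col1 value once and then looks the counts up by name. A variation on the same counting idea, swapped in here rather than taken from either answer, uses base R's ave() to attach each group's size to every row directly, so the filter becomes a single comparison. A minimal sketch, assuming the same df as in the question:

```r
df <- data.frame(col1 = c(rep("a", 3), "b", "c", rep("d", 3)),
                 col2 = c("A", "B", "C", rep("A", 3), "B", "C"),
                 col3 = c(3, 3, 1, 4, 4, 3, 2, 1))

# For every row, the number of rows that share its col1 value
group_size <- ave(seq_along(df$col1), df$col1, FUN = length)
print(group_size)
# 3 3 3 1 1 3 3 3

# Keep rows whose col1 value appears more than once
res <- df[group_size > 1, ]
print(res)
```

Because ave() returns a vector the same length as the data, the counts are already aligned with the rows and no name lookup is needed.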