Find all records which have multiple values in a column in R

For a sample dataframe:

df <- structure(list(code = c("a1", "a1", "b2", "v4", "f5", "f5", "h7", 
       "a1"), name = c("katie", "katie", "sally", "tom", "amy", "amy", 
       "ash", "james"), number = c(3.5, 3.5, 2, 6, 4, 4, 7, 3)), .Names = c("code", 
       "name", "number"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
       -8L), spec = structure(list(cols = structure(list(code = structure(list(), class = c("collector_character", 
       "collector")), name = structure(list(), class = c("collector_character", 
       "collector")), number = structure(list(), class = c("collector_double", 
       "collector"))), .Names = c("code", "name", "number")), default = structure(list(), class = c("collector_guess", 
       "collector"))), .Names = c("cols", "default"), class = "col_spec"))

I want to highlight all the records that share a 'code' value with at least one other record. I know I could use:

df[duplicated(df$code), ]

But this only returns the second and later occurrences of each value. I want every record whose code value is duplicated (i.e. all 3 a1s and both f5s).

Any ideas?

KT_1 asked Jun 21 '18 08:06



1 Answer

df[duplicated(df$code) | duplicated(df$code, fromLast=TRUE), ]
  code  name number
1   a1 katie    3.5
2   a1 katie    3.5
5   f5   amy    4.0
6   f5   amy    4.0
8   a1 james    3.0
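
To see why combining the two calls works, it may help to look at the two logical vectors separately. The following is a minimal sketch that uses a plain character vector, `codes` (a stand-in name holding the same values as `df$code`), rather than the data frame itself:

```r
# Stand-in for df$code from the question (illustrative name)
codes <- c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1")

# TRUE for every occurrence of a value after its first
fwd <- duplicated(codes)

# TRUE for every occurrence of a value before its last
bwd <- duplicated(codes, fromLast = TRUE)

# A value that occurs more than once is flagged by at least one of the two,
# so OR-ing them marks every member of a duplicated group
keep <- fwd | bwd
which(keep)
```

The positions returned by `which(keep)` correspond to the row names in the output above.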

Another solution inspired by Alok VS:

ta <- table(df$code)
df[df$code %in% names(ta)[ta > 1], ]
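
The intermediate steps of the table approach can be inspected the same way. This is a small sketch, again assuming a stand-in vector `codes` with the values of `df$code`. `dup_codes` is an illustrative name, not part of the original answer:

```r
# Stand-in for df$code from the question (illustrative name)
codes <- c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1")

# Count how often each distinct value occurs
ta <- table(codes)

# Keep only the values that occur more than once
dup_codes <- names(ta)[ta > 1]

# Membership test against those values selects all matching positions
which(codes %in% dup_codes)
```

Unlike the `duplicated()` version, this also leaves the counts in `ta`, which could be handy if a threshold other than "more than once" is needed.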

Edit: If you are OK with going beyond base R, gdata::duplicated2() makes this more concise.

library(gdata)
df[duplicated2(df$code), ]
sindri_baldur answered Oct 31 '22 11:10
