Identifying Duplicate/Unique Teams (and Restructuring Data) in R

Tags: r

I have a data set that looks like this:

 Person Team
   1     30
   2     30
   3     30
   4     30
   11    40
   22    40
   1     50
   2     50
   3     50
   4     50
   15    60
   16    60
   17    60
   1     70
   2     70
   3     70
   4     70
   11    80
   22    80
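
For reproducibility, the same data can be built as a data frame. This is only a convenience sketch; the name `df` is chosen here for illustration.

```r
# Example data from the table above: one row per (Person, Team) membership
df <- data.frame(
  Person = c(1, 2, 3, 4, 11, 22, 1, 2, 3, 4, 15, 16, 17, 1, 2, 3, 4, 11, 22),
  Team   = rep(c(30, 40, 50, 60, 70, 80), times = c(4, 2, 4, 3, 4, 2))
)
```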

My overall goal is to organize the team identification codes so that it is easy to see which teams are duplicates of one another and which teams are unique. I want to summarize the data so that it looks like this:

 Team   Duplicate1  Duplicate2
  30        50          70
  40        80  
  60        

As you can see, teams 30, 50, and 70 have identical members, so they share a row. Similarly, teams 40 and 80 have identical members, so they share a row. Only team 60 (in this example) is unique.

In situations where teams are duplicated, I don't care which team id goes in which column. Also, there may be more than 2 duplicates of a team. Teams range in size from 2 members to 8 members.

waxattax asked Dec 19 '22 08:12


1 Answer

This answer gives essentially the output you asked for, except that I left the duplicate teams in a single comma-separated variable, because I think that is a better way to handle an arbitrary number of duplicates.

library(dplyr)

df %>%
  arrange(Team, Person) %>%   # sort so that identical rosters produce identical strings
  group_by(Team) %>%
  summarize(players = paste0(Person, collapse = ",")) %>%
  group_by(players) %>%
  summarize(teams = paste0(Team, collapse = ",")) %>%
  mutate(
    # sub() is vectorized, so each row is split at its own first comma
    # (indexing gregexpr() with [[1]] would reuse the first row's comma position for every row)
    original_team = sub(",.*$", "", teams),
    dup_teams = ifelse(grepl(",", teams), sub("^[^,]*,", "", teams), NA)
  )

The result:

Source: local data frame [3 x 4]

   players    teams original_team dup_teams
1  1,2,3,4 30,50,70            30     50,70
2    11,22    40,80            40        80
3 15,16,17       60            60        NA
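
If you do want one column per duplicate, as in the question, a base R sketch along these lines might work. It is not part of the dplyr pipeline above. It assumes the example data sits in a data frame called `df`, and the names `key`, `groups`, `width`, and `wide` are made up for illustration.

```r
# Example data from the question (assumed to live in a data frame called df)
df <- data.frame(
  Person = c(1, 2, 3, 4, 11, 22, 1, 2, 3, 4, 15, 16, 17, 1, 2, 3, 4, 11, 22),
  Team   = rep(c(30, 40, 50, 60, 70, 80), times = c(4, 2, 4, 3, 4, 2))
)

# One roster string per team: sorted member ids joined by commas
key <- vapply(split(df$Person, df$Team),
              function(p) paste(sort(unique(p)), collapse = ","),
              character(1))

# Teams with the same roster string land in the same group;
# groups are then ordered by their smallest team id
groups <- split(as.numeric(names(key)), key)
groups <- unname(groups[order(vapply(groups, min, numeric(1)))])

# Pad every group with NA to the width of the largest one, then stack the rows
width <- max(lengths(groups))
wide <- do.call(rbind, lapply(groups, function(g) c(sort(g), rep(NA, width - length(g)))))
colnames(wide) <- c("Team", if (width > 1) paste0("Duplicate", seq_len(width - 1)))
wide <- as.data.frame(wide)
wide
```

Within each row the smallest team id goes in `Team` and the rest fill the `Duplicate` columns in increasing order, with `NA` where a team has fewer duplicates than the widest group.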

rsoren answered Mar 06 '23 22:03