Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Count unique rows for each unique name per year

Tags:

r

This is my first question on stackoverflow. I searched for similar questions but I didn't find an answer.

I know that the question in the title isn't clear but I hope you are going to understand what I want as output.

I have a dataframe that looks like this:

   ID      Name Year
1   1      Anas 2018
2   1      Carl 2018
3   1 Catherine 2018
4   2      Anas 2018
5   2      Carl 2018
6   3 Catherine 2018
7   3    Julien 2018
8   4      Raul 2018
9   4     Ahmed 2018
10  4  Laurence 2018
11  4      Carl 2018
12  5      Anas 2019
13  5   Georges 2019
14  5     Arman 2019
15  6      Anas 2019
16  6    Pietro 2019
17  7    Pietro 2019
18  8     Diego 2019

if the names in the column "Name" have the same ID, it means that are collaborators in a project.

I want to add a column with the number of UNIQUE collaborators per year for each name (by including each name in the count of his collborators)

The output should look like this: (I added the last column to explain how to count-Not needed)

ID      Name Year Unique_Coll                                      explication
1   1      Anas 2018           3                          (Anas, Carl, Catherine)
2   1      Carl 2018           6  (Carle, Anas, Catherine, Laurence, Ahmed, Raul)
3   1 Catherine 2018           4                  (Catherine, Carl, Anas, Julien)
4   2      Anas 2018           3                          (Anas, Carl, Catherine)
5   2      Carl 2018           6  (Carle, Anas, Catherine, Laurence, Ahmed, Raul)
6   3 Catherine 2018           4                  (Catherine, Carl, Anas, Julien)
7   3    Julien 2018           2                              (Julien, Catherine)
8   4      Raul 2018           4                    (Raul, Ahmed, Laurence, Carl)
9   4     Ahmed 2018           4                    (Ahmed, Raul, Laurence, Carl)
10  4  Laurence 2018           4                    (Laurence, Raul, Ahmed, Carl)
11  4      Carl 2018           6  (Carle, Anas, Catherine, Laurence, Ahmed, Raul)
12  5      Anas 2019           4                   (Anas, Georges, Arman, Pietro)
13  5   Georges 2019           3                           (Georges, Anas, Arman)
14  5     Arman 2019           3                            (Arman Anas, Georges)
15  6      Anas 2019           4                   (Anas, Georges, Arman, Pietro)
16  6    Pietro 2019           2                                   (Pietro, Anas)
17  7    Pietro 2019           2                                   (Pietro, Anas)
18  8     Diego 2019           1                                        (Diego)  

Thank you

like image 553
farina27 Avatar asked Nov 30 '25 06:11

farina27


1 Answers

You could construct a variable that would be a list of names and count the number of unique names in the following way:

library(dplyr)
df = df %>% 
group_by(ID) %>% 
mutate(group = list(Name)) %>% 
group_by(Year,Name) %>% 
mutate(n = n_distinct(unlist(list(group)))) %>%
select(-group)

# A tibble: 18 x 4
# Groups:   Year, Name [12]
      ID Name       Year     n
   <int> <chr>     <int> <int>
 1     1 Anas       2018     3
 2     1 Carl       2018     6
 3     1 Catherine  2018     4
 4     2 Anas       2018     3
 5     2 Carl       2018     6
 6     3 Catherine  2018     4
 7     3 Julien     2018     2
 8     4 Raul       2018     4
 9     4 Ahmed      2018     4
10     4 Laurence   2018     4
11     4 Carl       2018     6
12     5 Anas       2019     4
13     5 Georges    2019     3
14     5 Arman      2019     3
15     6 Anas       2019     4
16     6 Pietro     2019     2
17     7 Pietro     2019     2
18     8 Diego      2019     1
like image 104
Lamia Avatar answered Dec 01 '25 21:12

Lamia