Assign unique ID based on two columns [duplicate]

Tags: r, multiple-columns

I have a dataframe (df) that looks like this:

School Student  Year  
A         10    1999
A         10    2000
A         20    1999
A         20    2000
A         20    2001
B         10    1999
B         10    2000

And I would like to create a person ID column so that df looks like this:

ID School Student  Year  
1   A         10    1999
1   A         10    2000
2   A         20    1999
2   A         20    2000
2   A         20    2001
3   B         10    1999
3   B         10    2000

In other words, the ID variable indicates which person it is in the dataset, accounting for both Student number and School membership (here we have 3 students total).

I did df$ID <- df$Student and then tried to add 1 to the value whenever the c("School", "Student") combination was unique. It isn't working. Help appreciated.

asked Mar 21 '17 08:03

iPlexpen

2 Answers

We can do this in base R without any group-by operation:

df$ID <- cumsum(!duplicated(df[1:2]))
df
#   School Student Year ID
#1      A      10 1999  1
#2      A      10 2000  1
#3      A      20 1999  2
#4      A      20 2000  2
#5      A      20 2001  2
#6      B      10 1999  3
#7      B      10 2000  3

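NOTE: This assumes the rows are sorted by 'School' and 'Student' (more precisely, that all rows for a given School/Student pair are adjacent). Otherwise the running count gives repeated pairs the wrong ID.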
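
To illustrate the ordering caveat in the note above, here is a minimal base R sketch on made-up data: the same School/Student pairs as in the question, but with the rows interleaved. The `match()`/`unique()` approach is a swapped-in alternative, not part of the original answer, and it assumes the separator `"_"` never occurs in the School or Student values.

```r
# Toy data (an assumption for illustration): same pairs as the question,
# but rows for each School/Student pair are no longer adjacent
df <- data.frame(
  School  = c("A", "B", "A", "A", "B", "A", "A"),
  Student = c(10, 10, 20, 10, 10, 20, 20),
  Year    = c(1999, 1999, 1999, 2000, 2000, 2000, 2001)
)

# cumsum(!duplicated()) is a running count of first occurrences, so on
# unsorted rows every later repeat just inherits the current total
naive_id <- cumsum(!duplicated(df[1:2]))
naive_id
# 1 2 3 3 3 3 3   (wrong: rows 4 and 5 should be 1 and 2)

# Order-independent alternative: build one key per row, then look up its
# position among the distinct keys. IDs follow order of first appearance.
key <- paste(df$School, df$Student, sep = "_")
df$ID <- match(key, unique(key))
df$ID
# 1 2 3 1 2 3 3
```

Sorting first with `df <- df[order(df$School, df$Student), ]` and then applying the `cumsum(!duplicated(...))` line should work as well, at the cost of reordering the rows.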

Or using the tidyverse (dplyr):

library(dplyr)
df %>% 
    mutate(ID = group_indices_(df, .dots=c("School", "Student"))) 
#  School Student Year ID
#1      A      10 1999  1
#2      A      10 2000  1
#3      A      20 1999  2
#4      A      20 2000  2
#5      A      20 2001  2
#6      B      10 1999  3
#7      B      10 2000  3

As @radek mentioned, in more recent versions (dplyr_0.8.0) group_indices_ is deprecated and triggers a notification; use group_indices instead:

df %>% 
   mutate(ID = group_indices(., School, Student))
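
In newer dplyr releases (1.0.0 and later, as far as I can tell), passing the grouping columns directly to `group_indices()` is itself deprecated in favor of grouping first and calling `cur_group_id()`. A minimal sketch under that assumption, with the question's data rebuilt by hand as a data frame (the name `df_with_id` is just for illustration):

```r
library(dplyr)

# The question's data, typed in by hand
df <- data.frame(
  School  = c("A", "A", "A", "A", "A", "B", "B"),
  Student = c(10, 10, 20, 20, 20, 10, 10),
  Year    = c(1999, 2000, 1999, 2000, 2001, 1999, 2000)
)

# Group by both columns, give each group its index, then drop the grouping
# so later operations are not silently applied per group
df_with_id <- df %>%
  group_by(School, Student) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()

df_with_id$ID
# 1 1 2 2 2 3 3
```

Here the IDs follow the sorted order of the groups rather than the order of first appearance, so the result does not depend on how the rows are arranged.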

answered Oct 03 '22 23:10

akrun

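Group by School and Student, then assign the group number (.GRP) to the ID variable. df must be a data.table here; see Data below, or convert an existing data frame in place with setDT(df).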

library('data.table')
df[, ID := .GRP, by = .(School, Student)]

#    School Student Year ID
# 1:      A      10 1999  1
# 2:      A      10 2000  1
# 3:      A      20 1999  2
# 4:      A      20 2000  2
# 5:      A      20 2001  2
# 6:      B      10 1999  3
# 7:      B      10 2000  3

Data:

df <- fread('School Student  Year
A         10    1999
A         10    2000
A         20    1999
A         20    2000
A         20    2001
B         10    1999
B         10    2000')

answered Oct 04 '22 00:10

Sathish
