
Disambiguate non-unique elements in a character vector

Tags: string, r

Given a vector of non-unique patient initials:

init = c("AA", "AB", "AB", "AB", "AC")

Looking for disambiguation as follows:

init1 = c("AA", "AB01", "AB02", "AB03", "AC")

That is, unique initials should be left unchanged, and non-unique initials should be disambiguated by appending two-digit numbers.

Dieter Menne asked Sep 25 '22 13:09


1 Answer

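Use the function below with ave:

# Leave a group of length 1 unchanged; otherwise append a zero-padded running index.
uniquify <- function(x) if (length(x) == 1) x else sprintf("%s%02d", x, seq_along(x))
# ave applies uniquify to each group of identical values and keeps the original order.
ave(init, init, FUN = uniquify)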
## [1] "AA"   "AB01" "AB02" "AB03" "AC"  
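
To see what ave is doing per group, the same result can be built by hand with split, lapply and unsplit. This is only an illustrative sketch; the names groups, renamed and by_hand are made up for this example.

```r
init <- c("AA", "AB", "AB", "AB", "AC")
uniquify <- function(x) if (length(x) == 1) x else sprintf("%s%02d", x, seq_along(x))

# Split the vector into groups of identical values.
groups <- split(init, init)
# Apply uniquify to each group separately.
renamed <- lapply(groups, uniquify)
# Put the renamed values back into their original positions.
by_hand <- unsplit(renamed, init)

by_hand
## [1] "AA"   "AB01" "AB02" "AB03" "AC"
```

Because uniquify only ever sees one group at a time, length(x) == 1 is exactly the test for "this value is unique in the input".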

If the only requirement is unique output, then make.unique(x) or make.unique(x, sep = "0"), as discussed in another answer and a comment, are more concise. They do not, however, reproduce the output requested in the question, and with 10 or more duplicates their output diverges from it even further. The solution here still produces the requested format in that case. Here is a further example illustrating 10 or more duplicates.

xx <- rep(c("A", "B", "C"), c(1, 10, 2))
ave(xx, xx, FUN = uniquify)
## [1] "A"   "B01" "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B09" "B10" "C01" "C02"
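
For comparison, this is what the two make.unique calls return. The first occurrence is left bare and numbering starts at the second one, and once the counter reaches 10 the suffix with sep = "0" grows to three characters. The 12-element input is a made-up example chosen to show that.

```r
init <- c("AA", "AB", "AB", "AB", "AC")

make.unique(init)
## [1] "AA"   "AB"   "AB.1" "AB.2" "AC"
make.unique(init, sep = "0")
## [1] "AA"   "AB"   "AB01" "AB02" "AC"

# With 12 copies, the last two repeats get the suffixes 010 and 011.
tail(make.unique(rep("B", 12), sep = "0"), 3)
## [1] "B09"  "B010" "B011"
```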

The make.unique solution could be rescued like this:
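
One possible way to do this is sketched below. It is not necessarily the code originally posted: the function name rescue_make_unique is hypothetical, and the sketch assumes that the separator (a dot by default) does not already occur in the input values.

```r
# Hypothetical helper: post-process make.unique output into the requested format.
# Assumption: `sep` does not appear in any element of x.
rescue_make_unique <- function(x, sep = ".") {
  u <- make.unique(x, sep = sep)
  # make.unique leaves the first occurrence bare and numbers repeats 1, 2, ...
  suffix <- substring(u, nchar(x) + nchar(sep) + 1L)
  # Position within the group, counting from 0 (the empty suffix means first occurrence).
  k <- ifelse(suffix == "", 0L, as.integer(suffix))
  # Only values that occur more than once get a two-digit index.
  dup <- x %in% x[duplicated(x)]
  ifelse(dup, sprintf("%s%02d", x, k + 1L), x)
}

init <- c("AA", "AB", "AB", "AB", "AC")
rescue_make_unique(init)
## [1] "AA"   "AB01" "AB02" "AB03" "AC"
```

Here make.unique only supplies the running count; the formatting is still done by sprintf, so the ave approach above remains the shorter route.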

G. Grothendieck answered Nov 15 '22 05:11