R code to generate unique ID with prefix? [duplicate]

Tags:

r

I have a table called "pipel" that contains more than 10,000 rows. I would like to add an ID column that assigns a unique ID to each row. Each ID must be 30 characters long and start with "AKM_CC_Test_". I used the code below as a starting point, but I am not sure how to add the prefix and pad the ID to 30 characters.

id <- rownames(pipel)
pipel <- cbind(id=id, pipel)

For example, the first row's ID needs to look like this: AKM_CC_Test_000000000000000001

Curious asked Feb 05 '23 23:02


2 Answers

You could use sprintf(). This creates 30-character strings that begin with "AKM_CC_Test_" and end with the values of 1:nrow(pipel), padded with leading zeros.

x <- "AKM_CC_Test_"
sprintf("%s%0*d", x, 30 - nchar(x), 1:nrow(pipel))
  • %s inserts x into the string
  • %0*d appends each value of 1:nrow(pipel) after x, zero-padded to the width given by *. The * takes that width from the next argument, 30 - nchar(x) (I computed it programmatically; you could write %018d directly if you prefer)

An example on a simple length-5 vector (1:5) would be

x <- "AKM_CC_Test_"
sprintf("%s%0*d", x, 30 - nchar(x), 1:5)
# [1] "AKM_CC_Test_000000000000000001" "AKM_CC_Test_000000000000000002"
# [3] "AKM_CC_Test_000000000000000003" "AKM_CC_Test_000000000000000004"
# [5] "AKM_CC_Test_000000000000000005"
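
To tie this back to the question, here is a minimal sketch of attaching the generated IDs as the first column of the table. The small data frame and its value column are hypothetical stand-ins for the real pipel, and seq_len() is swapped in for 1:nrow() only so that an empty table produces no IDs.

```r
# Hypothetical stand-in for the asker's table; the real pipel has 10,000+ rows.
pipel <- data.frame(value = c(10.5, 20.1, 30.7))

prefix <- "AKM_CC_Test_"

# Pad the row number so that prefix + digits is exactly 30 characters.
# seq_len() is used instead of 1:nrow() so a zero-row table yields no IDs.
id <- sprintf("%s%0*d", prefix, 30L - nchar(prefix), seq_len(nrow(pipel)))

# Put the ID column first, mirroring the cbind() call in the question.
pipel <- cbind(id = id, pipel)
```

Because the IDs are built from the row position, they stay unique only as long as they are generated once for the whole table rather than regenerated after rows are reordered or removed.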
Rich Scriven answered Feb 08 '23 12:02


You can use : or seq for sequences, and you can prepend your leading text with paste or paste0. The heart of the question is padding the number with leading zeros.

Your options are:

  1. stri_pad from stringi (more intuitive)
  2. str_pad from stringr (more intuitive)
  3. sprintf (no packages needed)
  4. formatC (good if you're familiar with C's printf)

Note that some cases, though not this particular one, require disabling scientific notation for the numbers in the sequence. This can be done with options(scipen = ...) or with with_options (originally in devtools, now provided by the withr package).
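
As a rough illustration of that note, the sketch below pads whole-valued doubles, which can be converted to text in exponent form (for example 1e+05) unless fixed notation is requested. The vector n and the manual padding with strrep() are assumptions made for the example and are not part of the original answer.

```r
# Whole-valued doubles rather than integers (hypothetical example values).
n <- c(1, 100000, 2500000)

# Option A: raise the penalty against scientific notation globally,
# then restore the previous setting.
old <- options(scipen = 999)
fixed_a <- format(n, trim = TRUE)
options(old)

# Option B: request fixed notation for this one call only.
fixed_b <- format(n, scientific = FALSE, trim = TRUE)

# Left-pad each number to 18 digits and prepend the prefix.
prefix <- "AKM_CC_Test_"
ids <- paste0(prefix, strrep("0", 18 - nchar(fixed_b)), fixed_b)
```

Option B keeps the change local to one call, which avoids surprising other code that relies on the session's default number formatting.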

Please see this popular post for examples of each.

Using formatC:

uid <- paste0("AKM_CC_Test_", formatC(1:10000, width = 18, format = "d", flag = "0"))
head(uid)
[1] "AKM_CC_Test_000000000000000001" "AKM_CC_Test_000000000000000002" "AKM_CC_Test_000000000000000003" "AKM_CC_Test_000000000000000004"
[5] "AKM_CC_Test_000000000000000005" "AKM_CC_Test_000000000000000006"

Using the stringr package:

library(stringr)
uid <- paste0("AKM_CC_Test_", str_pad(1:10000, 18, pad = "0"))
head(uid)
[1] "AKM_CC_Test_000000000000000001" "AKM_CC_Test_000000000000000002" "AKM_CC_Test_000000000000000003" "AKM_CC_Test_000000000000000004"
[5] "AKM_CC_Test_000000000000000005" "AKM_CC_Test_000000000000000006"

Using sprintf:

head(sprintf("%s%0*d", "AKM_CC_Test_", 18, 1:10000))
[1] "AKM_CC_Test_000000000000000001" "AKM_CC_Test_000000000000000002" "AKM_CC_Test_000000000000000003" "AKM_CC_Test_000000000000000004"
[5] "AKM_CC_Test_000000000000000005" "AKM_CC_Test_000000000000000006"

Using stri_pad from the package stringi:

library(stringi)
uid <- paste0("AKM_CC_Test_", stri_pad(1:10000, 18, pad = "0"))
head(uid)
[1] "AKM_CC_Test_000000000000000001" "AKM_CC_Test_000000000000000002" "AKM_CC_Test_000000000000000003" "AKM_CC_Test_000000000000000004"
[5] "AKM_CC_Test_000000000000000005" "AKM_CC_Test_000000000000000006"
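
Whichever approach is used, it may be worth confirming that the result meets the stated requirements. The following is a small sketch, using only base R, that compares the formatC and sprintf variants and checks length and uniqueness; the variable names are illustrative.

```r
n <- 10000L
prefix <- "AKM_CC_Test_"

# Two of the base R approaches shown above.
uid_formatC <- paste0(prefix, formatC(seq_len(n), width = 18, format = "d", flag = "0"))
uid_sprintf <- sprintf("%s%018d", prefix, seq_len(n))

# Both approaches should agree, every ID should be 30 characters long,
# and no ID should appear twice.
stopifnot(
  identical(uid_formatC, uid_sprintf),
  all(nchar(uid_sprintf) == 30),
  anyDuplicated(uid_sprintf) == 0
)
```

stopifnot() stays silent when every check passes and raises an error naming the first failing condition otherwise.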
Hack-R answered Feb 08 '23 11:02