
How to generate auto-incrementing ID in R

Tags:

r

I am looking for an efficient way to create unique, numeric IDs for some synthetic data I'm generating.

Right now, I simply have a function that emits and increments a value from a global variable (see demo code below). However, this is messy because I must initialize the idCounter variable first, and I'd rather not use global variables if possible.

# Emit SSN
idCounter = 0
emitID = function(){
  # Turn into a formatted string
  id = formatC(idCounter,width=9,flag=0,format="d")

  # Increment id counter
  idCounter <<- idCounter+1

  return(id)
}
record$id = emitID()

The uuid package provides functionality close to what I want, but I need IDs to be integers only. Any suggestions? Perhaps a way to convert the UUID value into a numeric value of some sort? Obviously some collisions would occur but that'd probably be ok. I think, at most, I'd need 1 billion values.

Thanks for any suggestions!

-Rob

Rob asked Sep 18 '14 00:09



2 Answers

A non-global version of the counter uses lexical scope to encapsulate idCounter together with the increment function:

emitID <- local({
    idCounter <- -1L
    function(){
        idCounter <<- idCounter + 1L                     # increment
        formatC(idCounter, width=9, flag=0, format="d")  # format & return
    }
})

and then

> emitID()
[1] "000000000"
> emitID()
[1] "000000001"
> idCounter <- 123   ## global variable, not locally scoped idCounter
> emitID()
[1] "000000002"
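
The last call still returns "000000002" because the global assignment never touches the counter captured by the closure. A minimal sketch of how one might check this, with the local() definition restated so the block runs on its own (flag is written as the string "0", as documented for formatC; the variable names first, second, captured and third are only for illustration):

```r
# Closure-based counter, restated from the answer above
emitID <- local({
    idCounter <- -1L
    function() {
        idCounter <<- idCounter + 1L                       # increment the captured counter
        formatC(idCounter, width = 9, flag = "0", format = "d")
    }
})

first  <- emitID()
second <- emitID()

# A global variable with the same name does not affect the closure
idCounter <- 123

# The real counter lives in the function's enclosing environment
captured <- environment(emitID)$idCounter
third <- emitID()
```

Here environment(emitID) is the environment created by local(), which is where <<- finds and updates idCounter.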

A fun alternative is to use a 'factory' pattern to create independent counters. Your question implies that you'll call this function a billion (hmm, not sure where I got that impression...) times, so maybe it makes sense to vectorize the call to formatC by creating a buffer of ids?

idFactory <- function(buf_n=1000000) {
    curr <- 0L
    last <- -1L
    val <- NULL
    function() {
        if ((curr %% buf_n) == 0L) {
            val <<- formatC(last + seq_len(buf_n), width=9, flag=0, format="d")
            last <<- last + buf_n
            curr <<- 0L
        }
        val[curr <<- curr + 1L]
    }
}
emitID2 <- idFactory()

and then, where emitID1 is an instance of the local() version above and emitID2 is the buffered factory counter:

> library(microbenchmark)
> microbenchmark(emitID1(), emitID2(), times=100000)
Unit: microseconds
      expr    min     lq median     uq      max neval
 emitID1() 66.363 70.614 72.310 73.603 13753.96 1e+05
 emitID2()  2.240  2.982  4.138  4.676 49593.03 1e+05
> emitID1()
[1] "000100000"
> emitID2()
[1] "000100000"

(The proto solution from the other answer is about 3x slower than emitID1, though speed is not everything.)
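
To see that counters made by the factory are independent and that the buffer refills correctly, here is a small sketch. It restates idFactory so the block is self-contained (with flag written as the string "0") and uses a deliberately tiny buffer; the names small, other, ids_small and ids_other are only for illustration:

```r
# idFactory restated from the answer above
idFactory <- function(buf_n = 1000000) {
    curr <- 0L
    last <- -1L
    val <- NULL
    function() {
        if ((curr %% buf_n) == 0L) {
            # refill the buffer with the next buf_n formatted ids
            val <<- formatC(last + seq_len(buf_n), width = 9, flag = "0", format = "d")
            last <<- last + buf_n
            curr <<- 0L
        }
        val[curr <<- curr + 1L]
    }
}

small <- idFactory(buf_n = 3L)   # tiny buffer to force several refills
other <- idFactory(buf_n = 3L)   # second counter with its own state

# Seven calls cross two buffer boundaries; two calls on the other counter
ids_small <- vapply(1:7, function(i) small(), character(1))
ids_other <- vapply(1:2, function(i) other(), character(1))
```

Each call to idFactory() creates a fresh environment holding curr, last and val, so the two counters do not interfere with each other.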

Martin Morgan answered Oct 21 '22 14:10


I like to use the proto package for small OO programming. Under the hood, it uses environments in a similar fashion to what Martin Morgan illustrated.

# this defines your class
library(proto)
Counter <- proto(idCounter = 0L)
Counter$emitID <- function(self = .) {
   id <- formatC(self$idCounter, width = 9, flag = 0, format = "d")
   self$idCounter <- self$idCounter + 1L
   return(id)
}

# This creates an instance (or you can use `Counter` directly as a singleton)
mycounter <- Counter$proto()

# use it:
mycounter$emitID()
# [1] "000000000"
mycounter$emitID()
# [1] "000000001"
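
Since proto builds on environments, the same reference semantics can be illustrated in base R alone. This is only a rough sketch of the idea, not how proto is actually implemented, and newCounter is a hypothetical helper name:

```r
# Hypothetical helper: a counter whose state lives in its own environment
newCounter <- function() {
    self <- new.env()
    self$idCounter <- 0L
    self$emitID <- function() {
        id <- formatC(self$idCounter, width = 9, flag = "0", format = "d")
        self$idCounter <- self$idCounter + 1L   # environments are modified in place
        id
    }
    self
}

mycounter <- newCounter()
id1 <- mycounter$emitID()
id2 <- mycounter$emitID()
```

Because environments are not copied on assignment, the update inside emitID() persists between calls without any global variable.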

flodel answered Oct 21 '22 15:10