
Create N random integers with no gaps

Tags:

r

For a clustering algorithm that I'm implementing, I would like to initialize the cluster assignments at random. However, I need there to be no gaps, i.e. every label from 1 up to the number of distinct labels must be used. That is, this is not OK:

set.seed(2)
K <- 10 # initial number of clusters
N <- 20 # number of data points
z_init <- sample(K,N, replace=TRUE) # initial assignments
z_init
#  [1]  2  8  6  2 10 10  2  9  5  6  6  3  8  2  5  9 10  3  5  1
sort(unique(z_init))
# [1]  1  2  3  5  6  8  9 10

where labels 4 and 7 have not been used.

Instead, I would like this vector to be:

#  [1] 2 6 5 2 8 8 2 7 4 5 5 3 6 2 4 7 8 3 4 1

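where label 5 has become 4, and so forth, so that the used labels are shifted down to fill the empty lower ones (each label is replaced by its rank among the distinct labels).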
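A minimal sketch of that relabelling as an explicit lookup table (old label to new label). It hard-codes the vector printed above, because `sample()` output depends on the R version and RNG settings, so re-running `set.seed(2)` may not reproduce it. The names `used`, `lookup` and `z_new` are illustrative only.

```r
# Assignments as printed in the question (hard-coded rather than regenerated)
z_init <- c(2, 8, 6, 2, 10, 10, 2, 9, 5, 6, 6, 3, 8, 2, 5, 9, 10, 3, 5, 1)

# Labels actually used, in increasing order: 1 2 3 5 6 8 9 10
used <- sort(unique(z_init))

# Named lookup vector: names are the old labels, values the new labels 1..8
lookup <- setNames(seq_along(used), used)

# Look up each old label by name to get its gap-free replacement
z_new <- unname(lookup[as.character(z_init)])
z_new
# 2 6 5 2 8 8 2 7 4 5 5 3 6 2 4 7 8 3 4 1
```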

More examples:

  • The vector 1 2 3 5 6 8 should become 1 2 3 4 5 6
  • The vector 15 5 7 7 10 should become 4 1 2 2 3

Can it be done without for loops? I don't need it to be fast; I would prefer it to be elegant and short, since I'm doing it only once in the code (for label initialization).
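One loop-free option, sketched here as an assumption rather than taken from the thread: `match()` each label against the sorted distinct labels. The position it returns is exactly the gap-free label, and every element stays where it was. The helper name `relabel` is hypothetical.

```r
# Hypothetical helper: map the used labels onto 1, 2, ..., k
# while keeping each element in its original position.
relabel <- function(z) {
  # For each element of z, match() returns its index among the sorted
  # distinct labels, i.e. its rank with ties sharing a rank
  match(z, sort(unique(z)))
}

relabel(c(1, 2, 3, 5, 6, 8))     # 1 2 3 4 5 6
relabel(c(15, 5, 7, 7, 10))      # 4 1 2 2 3
relabel(c(3, 2, 1, 3, 3, 7, 9))  # 3 2 1 3 3 4 5
```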

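My solution, using a for loop:

z_init <- c(3,2,1,3,3,7,9)

z_orig <- z_init                # keep the original labels for the comparisons
idx <- order(z_orig)            # positions in increasing label order
z_init[idx[1]] <- 1             # the smallest label becomes 1
for (i in seq_along(idx)[-1]) { # seq_along() also copes with length-1 input
  if (z_orig[idx[i]] > z_orig[idx[i-1]]) {
    # a new distinct label: one more than the previous relabelled value
    z_init[idx[i]] <- z_init[idx[i-1]] + 1
  } else {
    # same label as the previous element in sorted order
    z_init[idx[i]] <- z_init[idx[i-1]]
  }
}

z_init
# 3 2 1 3 3 4 5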
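The loop above walks the labels in sorted order and increases the label whenever the value changes. A sketch of the same idea without a loop, combining `order()` with the `cumsum()`/`diff()` trick that also appears in the answer; the variable names are illustrative.

```r
z_init <- c(3, 2, 1, 3, 3, 7, 9)

idx <- order(z_init)               # positions in increasing label order
s <- z_init[idx]                   # the labels, sorted
# Start at 1, then add 1 every time the sorted label increases
codes <- cumsum(c(1, diff(s) > 0))
z_new <- integer(length(z_init))
z_new[idx] <- as.integer(codes)    # scatter back to the original positions
z_new
# 3 2 1 3 3 4 5
```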
asked Feb 01 '16 21:02 by alberto


1 Answer

Edit: @GregSnow came up with the current shortest answer. I'm 100% convinced that this is the shortest possible way.

For fun, I decided to golf the code, i.e. write it as short as possible:

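z <- c(3, 8, 4, 4, 8, 2, 3, 9, 5, 1, 4)
# solution by hand: 1 2 3 3 4 4 4 5 6 6 7

sort(c(factor(z))) # 18 bytes, as proposed by @GregSnow in the comments
# [1] 1 2 3 3 4 4 4 5 6 6 7
# (output from R < 4.1.0; since R 4.1.0, c() on a factor returns a factor,
#  so this prints the original values plus a "Levels:" line instead)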
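Because the result of `c()` on a factor depends on the R version, a sketch that asks for the integer codes explicitly may be more robust. Dropping `sort()` also keeps each element in its original position, which is what the question's examples ask for; the name `codes` is illustrative.

```r
z <- c(3, 8, 4, 4, 8, 2, 3, 9, 5, 1, 4)

# Integer codes of the factor; its levels are the sorted distinct values of z
codes <- as.integer(factor(z))
codes          # original order: 3 6 4 4 6 2 3 7 5 1 4
sort(codes)    # sorted, as in the answer: 1 2 3 3 4 4 4 5 6 6 7
```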

Some other (functioning) attempts:

y=table(z);rep(seq(y),y) # 24 bytes
sort(unclass(factor(z))) # 24 bytes, based on @GregSnow's answer
diffinv(diff(sort(z))>0)+1 # 26 bytes
sort(as.numeric(factor(z))) # 27 bytes, @GregSnow's original answer
rep(seq(unique(z)),table(z)) # 28 bytes
cumsum(c(1,diff(sort(z))>0)) # 28 bytes
y=rle(sort(z))$l;rep(seq(y),y) # 30 bytes
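Since these variants return different types (integer vs double) and some rely on shortcuts such as the partial match `$l`, a small sanity check that several of them agree on random input may be useful. This is a sketch with an arbitrary seed and sample size, not part of the original answer; the list names are illustrative.

```r
set.seed(1)
z <- sample(1:10, 50, replace = TRUE)

# A few of the variants above, written out with full names
variants <- list(
  factor_codes = sort(as.integer(factor(z))),
  table_rep    = { y <- table(z); rep(seq_along(y), as.integer(y)) },
  cumsum_diff  = cumsum(c(1, diff(sort(z)) > 0)),
  rle_rep      = { y <- rle(sort(z))$lengths; rep(seq_along(y), y) }
)

# Compare everything as plain numeric vectors, since the types differ
ref <- as.numeric(variants$factor_codes)
all_agree <- all(vapply(variants,
                        function(v) isTRUE(all.equal(as.numeric(v), ref)),
                        logical(1)))
all_agree
```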

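Edit 2: Just to show that byte count isn't everything, here are microbenchmark timings (100 evaluations each). The sort()/diff()/rle() versions are several times faster than the factor()/table() versions, and the gap grows with input size (roughly 3.5 to 4.5 times at 10,000 values, 6 to 9 times at 100,000 values):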

z <- sample(1:10, 10000, replace = TRUE)
Unit: microseconds
                                      expr      min        lq      mean    median        uq      max neval
                        sort(c(factor(z))) 2550.128 2572.2340 2681.4950 2646.6460 2729.7425 3140.288   100
   {     y = table(z)     rep(seq(y), y) } 2436.438 2485.3885 2580.9861 2556.4440 2618.4215 3070.812   100
                  sort(unclass(factor(z))) 2535.127 2578.9450 2654.7463 2623.9470 2708.6230 3167.922   100
            diffinv(diff(sort(z)) > 0) + 1  551.871  572.2000  628.6268  626.0845  666.3495  940.311   100
               sort(as.numeric(factor(z))) 2603.814 2672.3050 2762.2030 2717.5050 2790.7320 3558.336   100
             rep(seq(unique(z)), table(z)) 2541.049 2586.0505 2733.5200 2674.0815 2760.7305 5765.815   100
           cumsum(c(1, diff(sort(z)) > 0))  530.159  545.5545  602.1348  592.3325  632.0060  844.385   100
{  y = rle(sort(z))$l     rep(seq(y), y) }  661.218  684.3115  727.4502  724.1820  758.3280  857.412   100

z <- sample(1:100000, replace = TRUE)
Unit: milliseconds
                                      expr       min        lq     mean    median       uq       max neval
                        sort(c(factor(z))) 84.501189 87.227377 92.13182 89.733291 94.16700 150.08327   100
   {     y = table(z)     rep(seq(y), y) } 78.951701 82.102845 85.54975 83.935108 87.70365 106.05766   100
                  sort(unclass(factor(z))) 84.958711 87.273366 90.84612 89.317415 91.85155 121.99082   100
            diffinv(diff(sort(z)) > 0) + 1  9.784041  9.963853 10.37807 10.090965 10.34381  17.26034   100
               sort(as.numeric(factor(z))) 85.917969 88.660145 93.42664 91.542263 95.53720 118.44512   100
             rep(seq(unique(z)), table(z)) 86.568528 88.300325 93.01369 90.577281 94.74137 118.03852   100
           cumsum(c(1, diff(sort(z)) > 0))  9.680615  9.834175 10.11518  9.963261 10.16735  14.40427   100
 { y = rle(sort(z))$l     rep(seq(y), y) } 12.842614 13.033085 14.73063 13.294019 13.66371 133.16243   100
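The timings above appear to come from the microbenchmark package; the calls themselves are not shown. If that package is not available, a rough base-R comparison along these lines can be made with `system.time()`. `time_it` is a hypothetical helper, and the absolute numbers will differ by machine and are not comparable to the table above.

```r
set.seed(42)
z <- sample(1:10, 10000, replace = TRUE)

# Total elapsed seconds for running f() `reps` times (base R only)
time_it <- function(f, reps = 100) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]]
}

timings <- c(
  factor_codes = time_it(function() sort(as.integer(factor(z)))),
  cumsum_diff  = time_it(function() cumsum(c(1, diff(sort(z)) > 0)))
)
timings  # seconds per 100 runs; machine-dependent
```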
answered Sep 28 '22 01:09 by slamballais