Sample with a max

Tags:

r

sampling

If I want to sample numbers to create a vector I do:

set.seed(123)
x <- sample(1:100,200, replace = TRUE)
sum(x)
# [1] 10228

What if I want to sample 20 random numbers that sum to 100, and then 30 numbers that still sum to 100? I imagine this will be more of a challenge than it seems. ?sample and searching Google have not given me a clue, and a loop that samples and then rejects the result if it is not close enough (e.g. within 5) to the desired sum would, I guess, take some time.

Is there a better way to achieve this?

An example would be:

foo(10,100) # ten random numbers that sum to 100. (not including zeros)
# 10,10,20,7,8,9,4,10,2,20

asked Feb 04 '13 10:02

user1320502

1 Answer

An attempt using R

# Config
n <- 20L
target <- 100L
vec <- seq(100)
set.seed(123)

# R repeat loop
sumto_repeat <- function(vec,n,target) {
  res <- integer()
  repeat {
    # cat("begin:",sum(res),length(res),"\n")
    res <- c( res, sample(vec,1) )
    # One short of n and still under target: fill in the remainder as the last element
    if( sum(res)<target && length(res)==(n-1) ) {
      res[length(res)+1] <- target - sum(res)
    }
    # cat("mid:",sum(res),length(res),"\n")
    # Overshot the target: drop the last element
    if(sum(res)>target) res <- res[-length(res)]
    # Too many elements, or hit the target too early: drop a random element
    if( length(res)>n || (length(res)<n && sum(res)==target) ) {
      res <- res[-sample(seq(length(res)),1)]
    }
    # cat("end:",sum(res),length(res),"\n")
    # cat(dput(res),"\n")
    if( sum(res)==target && length(res)==n ) break
  }
  res
}

test <- sumto_repeat(vec=vec,n=n,target=target)
sum(test)
# [1] 100
length(test)
# [1] 20

Also, I'd give some thought to what distribution you'd like to be drawing from. I think that there are a few different ways of getting it to sum to exactly target with n elements (for instance, you could make the last element always be target - sum(res)) that may or may not have different distributional implications.
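One way to get an exact sum with a well-defined distribution is the "stars and bars" construction. This is a swapped-in technique, not what the loop above does: choose n - 1 distinct cut points from 1..target-1, sort them, and take the gaps between them. Every vector of n positive integers summing to target corresponds to exactly one set of cut points, so the result is uniform over all such vectors. Below is a minimal C++ sketch in the same spirit as the Rcpp version; the function name sum_to_uniform is hypothetical, and std::mt19937 stands in for R's RNG. In R the same idea is roughly diff(c(0, sort(sample(target - 1, n - 1)), target)).

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: n positive integers summing to target, drawn
// uniformly over all such ordered vectors ("stars and bars").
std::vector<int> sum_to_uniform(int n, int target, std::mt19937& rng) {
    assert(n >= 1 && target >= n);  // precondition (debug builds only)
    // Candidate cut points 1, 2, ..., target - 1
    std::vector<int> cuts(target - 1);
    std::iota(cuts.begin(), cuts.end(), 1);
    // Partial Fisher-Yates shuffle: cuts[0..n-2] becomes a uniform
    // random subset of size n - 1
    for (int i = 0; i < n - 1; ++i) {
        std::uniform_int_distribution<int> pick(i, target - 2);
        std::swap(cuts[i], cuts[pick(rng)]);
    }
    cuts.resize(n - 1);
    std::sort(cuts.begin(), cuts.end());
    // The gaps between consecutive cuts (with 0 and target at the ends)
    // are the parts; each gap is >= 1 because the cuts are distinct
    std::vector<int> parts;
    parts.reserve(n);
    int prev = 0;
    for (int c : cuts) {
        parts.push_back(c - prev);
        prev = c;
    }
    parts.push_back(target - prev);
    return parts;
}
```

For example, sum_to_uniform(20, 100, rng) returns 20 positive integers summing to 100. Individual values are not capped, apart from the implicit bound target - n + 1.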

A very similar algorithm to the R repeat loop, in Rcpp, for speeeeed!

cpp_src <- '
Rcpp::IntegerVector xa = clone(x); // Vector to be sampled
Rcpp::IntegerVector na(n); // Number of elements in solution
Rcpp::IntegerVector sa(s); // Sum of solution

int nsampled;
int currentSum;
int dropRandomIndex;
int numZeroes;
Rcpp::IntegerVector remainingQuantity(1);
int maxAttempts = 100;

// Create container for our results
Rcpp::IntegerVector res(maxAttempts);
std::fill( res.begin(), res.end(), NA_INTEGER );

// Find the max of x; it is used below to decide whether the remainder is a valid last element
Rcpp::IntegerVector::iterator mx = std::max_element(xa.begin(), xa.end()) ;
std::cout << "mx = " << *mx << std::endl;

// Now draw repeatedly
nsampled = 0;
for( int i = 0; i < maxAttempts; i++ ) {
  std::cout << "\\n" << i;
  int r = rand() % xa.size(); // random position in x, so every element of x can be drawn
  res[i] = xa[r];
  // Calculate n and s for current loop iteration
  numZeroes = 0;
  for( int j = 0; j < maxAttempts; j++) 
    if(res[j]==0) numZeroes++;
  std::cout << " nz= " << numZeroes ;
  nsampled = maxAttempts - sum( is_na(res) ) - numZeroes - 1;
  currentSum = std::accumulate(res.begin(),res.begin()+i,0); // Cant just use Rcpp sugar sum() here because it freaks at the NAs
  std::cout << " nsamp= " << nsampled << " sum= " << currentSum;
  if(nsampled == na[0]-1) {  
    std::cout << " One element away. ";
    remainingQuantity[0] = sa[0] - currentSum;
    std::cout << "remainingQuantity = " << remainingQuantity[0];
    if( (remainingQuantity[0] > 0) && (remainingQuantity[0]) < *mx ) {
      std::cout << "Within range.  Prepare the secret (cheating) weapon!\\n";
      std::cout << sa[0] << " ";
      std::cout << currentSum << " ";
      std::cout << remainingQuantity[0] << std::endl;
      if( i != maxAttempts ) {
        std::cout << "Safe to add one last element on the end.  Doing so.\\n";
        res[i] = remainingQuantity[0];
      }
      currentSum = sa[0];
      nsampled++;
      if(nsampled == na[0] && currentSum == sa[0]) std::cout << "It should end after this...nsamp= " << nsampled << " and currentSum= " << currentSum << std::endl;
      break;
    } else {
      std::cout << "Out of striking distance.  Dropping random element\\n";
      dropRandomIndex = 0 + (rand() % (int)(i - 0 + 1));
      res[dropRandomIndex] = 0;
    }
  }
  if(nsampled == na[0] && currentSum == sa[0]) {
      std::cout << "Success!\\n";
      for(int l = 0; l <= i; l++) 
        std::cout << res[l] << " " ;
      break;
  }
  if(nsampled == na[0] && currentSum != sa[0]) {
    std::cout << "Reached number of elements but sum is ";
    if(currentSum > sa[0]) {
      std::cout << "Too high. Blitz everything and start over!\\n";
      for(int k = 0; k < res.size(); k++) {
        res[k] = NA_INTEGER;
      }
    } else {
      std::cout << "Too low.  \\n";

    }
  }
  if( nsampled < na[0] && currentSum >= sa[0] ) {
    std::cout << "Too few elements but at or above the sum cutoff.  Dropping a random element and trying again.\\n";
    dropRandomIndex = 0 + (rand() % (int)(i - 0 + 1));
    res[dropRandomIndex] = 0;
  }
}
return res;
'

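library(inline)  # cxxfunction() comes from the inline package; plugin="Rcpp" also needs Rcpp installed
sumto <- cxxfunction( signature(x="integer", n="integer", s="integer"), body=cpp_src, plugin="Rcpp", verbose=TRUE )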

testresult <- sumto(x=x, n=20L, s=1000L)
testresult <- testresult[!is.na(testresult)]
testresult <- testresult[testresult!=0]
testresult
cumsum(testresult)
length(testresult)

I tried it with a few different values, and it produces valid answers unless it runs away. There's a caveat: it cheats when it is one element short of the desired count and within "striking distance" of the target. Rather than drawing the last value, it calculates it (the target minus the current sum) whenever that number is positive and smaller than max(x).
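The "cheat" described above can also be turned into a clean rejection sampler when the values have a cap, as in the question title: draw the first n - 1 values uniformly from 1..maxVal, compute the last one as target minus their sum, and start over unless it also lies in 1..maxVal. Because every valid vector has the same probability, (1/maxVal)^(n-1), of being produced on a given attempt, the accepted results are uniform over all vectors with values in 1..maxVal that sum to target. This is a minimal C++ sketch, assuming values are drawn from 1..maxVal rather than from an arbitrary vector x; sum_to_capped, maxVal and maxAttempts are hypothetical names.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: draw n integers in [1, maxVal] that sum to target.
// The first n-1 values are drawn uniformly; the last is computed as
// target minus their sum, and the attempt is rejected unless that value
// is also in [1, maxVal]. Returns an empty vector if maxAttempts runs out.
std::vector<int> sum_to_capped(int n, int target, int maxVal,
                               std::mt19937& rng, int maxAttempts = 100000) {
    assert(n >= 1 && maxVal >= 1);  // precondition (debug builds only)
    std::uniform_int_distribution<int> draw(1, maxVal);
    std::vector<int> res(n);
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        int sum = 0;
        for (int i = 0; i < n - 1; ++i) {
            res[i] = draw(rng);
            sum += res[i];
        }
        int last = target - sum;  // the computed ("cheating") final element
        if (last >= 1 && last <= maxVal) {
            res[n - 1] = last;
            return res;
        }
    }
    return {};  // no valid vector found within maxAttempts
}
```

The acceptance rate depends heavily on maxVal. For n = 20 and target = 100, maxVal = 9 accepts a sizeable fraction of attempts, but with values up to 100 the first 19 draws almost always exceed 100 on their own, so the function would typically give up and return an empty vector. It is only practical when maxVal is reasonably close to target / n.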

Benchmarks

See the gist for the comparison code.

[Figure: benchmarks]

answered Sep 28 '22 00:09

Ari B. Friedman
