I want to sample 140 numbers between 1000 to 100000 such that the sum of these 140 numbers is around 2 million (2000000): <pre class="prettyprint"><code>sample(1000:100000,140) </code></pre> such that: <pre class="prettyprint"><code>sum(sample(1000:100000,140)) = 2000000 </code></pre> Any pointers how I can achieve this?

Here is a hit and miss approach. The basic idea is that finding 140 numbers which sum to 2000000 is the same as breaking 1:2000000 into 140 pieces, which requires 139 cutpoints. Also, note that the minimum of 1000 is somewhat annoying. Just subtract it from all the problem data and add it back in at the end: <pre class="prettyprint"><code>rand.nums <- function(a,b,n,k){ #finds n random integers in range a:b which sum to k while(TRUE){ x <- sample(1:(k - n*a),n-1, replace = TRUE) #cutpoints x <- sort(x) x <- c(x,k-n*a) - c(0,x) if(max(x) <= b-a) return(a+x) } } </code></pre> Then <code>rand.nums(1000,100000,140,2000000)</code> evaluates to 140 integers in the given range which sum to 2000000. For these choices of parameters, the function returns almost instantly. For other choices of the parameters, a solution might be either impossible or so tightly constrained that finding one by chance is effectively impossible. Thus, caution needs to be taken in using the function. It could be modified by adding a <code>maxtrials</code> parameter and returning <code>NA</code> if maxtrials is exceeded without finding a solution.

Random sampling to give an exact sum

Tags:

random

r

sampling

I want to sample 140 numbers between 1000 to 100000 such that the sum of these 140 numbers is around 2 million (2000000):

sample(1000:100000,140)

such that:

sum(sample(1000:100000,140)) = 2000000

Any pointers how I can achieve this?

564

asked Feb 27 '18 18:02

Hardik Gupta

2 Answers

There exist an algorithm for generating such random numbers.

Originally created for MATLAB, there is an R implementation of it:

Surrogate::RandVec

Citation from MATLAB script comment:

%   This generates an n by m array x, each of whose m columns
% contains n random values lying in the interval [a,b], but
% subject to the condition that their sum be equal to s.  The
% scalar value s must accordingly satisfy n*a <= s <= n*b.  The
% distribution of values is uniform in the sense that it has the
% conditional probability distribution of a uniform distribution
% over the whole n-cube, given that the sum of the x's is s.
%
%   The scalar v, if requested, returns with the total
% n-1 dimensional volume (content) of the subset satisfying
% this condition.  Consequently if v, considered as a function
% of s and divided by sqrt(n), is integrated with respect to s
% from s = a to s = b, the result would necessarily be the
% n-dimensional volume of the whole cube, namely (b-a)^n.
%
%   This algorithm does no "rejecting" on the sets of x's it
% obtains.  It is designed to generate only those that satisfy all
% the above conditions and to do so with a uniform distribution.
% It accomplishes this by decomposing the space of all possible x
% sets (columns) into n-1 dimensional simplexes.  (Line segments,
% triangles, and tetrahedra, are one-, two-, and three-dimensional
% examples of simplexes, respectively.)  It makes use of three
% different sets of 'rand' variables, one to locate values
% uniformly within each type of simplex, another to randomly
% select representatives of each different type of simplex in
% proportion to their volume, and a third to perform random
% permutations to provide an even distribution of simplex choices
% among like types.  For example, with n equal to 3 and s set at,
% say, 40% of the way from a towards b, there will be 2 different
% types of simplex, in this case triangles, each with its own
% area, and 6 different versions of each from permutations, for
% a total of 12 triangles, and these all fit together to form a
% particular planar non-regular hexagon in 3 dimensions, with v
% returned set equal to the hexagon's area.
%
% Roger Stafford - Jan. 19, 2006

Example:

test <- Surrogate::RandVec(a=1000, b=100000, s=2000000, n=140, m=1, Seed=sample(1:1000, size = 1))
sum(test$RandVecOutput)
# 2000000
hist(test$RandVecOutput)

answered Oct 19 '22 17:10

m0nhawk

Here is a hit and miss approach. The basic idea is that finding 140 numbers which sum to 2000000 is the same as breaking 1:2000000 into 140 pieces, which requires 139 cutpoints. Also, note that the minimum of 1000 is somewhat annoying. Just subtract it from all the problem data and add it back in at the end:

rand.nums <- function(a,b,n,k){
  #finds n random integers in range a:b which sum to k
  while(TRUE){
    x <- sample(1:(k - n*a),n-1, replace = TRUE) #cutpoints
    x <- sort(x)
    x <- c(x,k-n*a) - c(0,x)
    if(max(x) <= b-a) return(a+x)
  }
}

Then rand.nums(1000,100000,140,2000000) evaluates to 140 integers in the given range which sum to 2000000. For these choices of parameters, the function returns almost instantly. For other choices of the parameters, a solution might be either impossible or so tightly constrained that finding one by chance is effectively impossible. Thus, caution needs to be taken in using the function. It could be modified by adding a maxtrials parameter and returning NA if maxtrials is exceeded without finding a solution.

answered Oct 19 '22 16:10

John Coleman

Related questions
                            
                                Shiny selectInput very slow on larger data (~15,000 entries) in browser
                            
                                Regular expression to match ALL currency symbols?
                            
                                Use stat_summary to annotate plot with number of observations
                            
                                Reuse ggplot layers in multiple plots
                            
                                Trying to merge multiple csv files in R
                            
                                Return df with a columns values that occur more than once [duplicate]
                            
                                R unable to start device PNG - capabilities() has TRUE for PNG?
                            
                                Extract characters that differ between two strings
                            
                                Compute the mean of two columns in a dataframe
                            
                                Reading Stata 14 file in R
                            
                                Read csv with dates and numbers
                            
                                Changing the outlier rule in a boxplot
                            
                                R is there a way to find Inf/-Inf values?
                            
                                Error using corrplot
                            
                                Exchange two elements of a vector in one call
                            
                                R sorts a vector on its own accord
                            
                                How to replace NA's in a raster object
                            
                                How to produce stacked bars within grouped barchart in R [duplicate]
                            
                                Collapsing rows where some are all NA, others are disjoint with some NAs
                            
                                Find fastest way to get all intervals between identical elements in a vector

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With