Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Random sampling to give an exact sum

Tags:

random

r

sampling

I want to sample 140 numbers between 1000 to 100000 such that the sum of these 140 numbers is around 2 million (2000000):

sample(1000:100000,140)

such that:

sum(sample(1000:100000,140)) = 2000000

Any pointers how I can achieve this?

like image 564
Hardik Gupta Avatar asked Feb 27 '18 18:02

Hardik Gupta


People also ask

What are the 4 types of random sampling?

There are four primary, random (probability) sampling methods – simple random sampling, systematic sampling, stratified sampling, and cluster sampling.

How do you randomly sample out of the population to get accurate results?

One way of obtaining a random sample is to give each individual in a population a number, and then use a table of random numbers to decide which individuals to include.

What sampling method is lottery?

Method of lottery Using the lottery method is one of the oldest ways and is a mechanical example of random sampling. In this method, the researcher gives each member of the population a number. Researchers draw numbers from the box randomly to choose samples.

What are some examples of random sampling?

An example of a simple random sample would be the names of 25 employees being chosen out of a hat from a company of 250 employees. In this case, the population is all 250 employees, and the sample is random because each employee has an equal chance of being chosen.


2 Answers

There exist an algorithm for generating such random numbers.

Originally created for MATLAB, there is an R implementation of it:

Surrogate::RandVec

Citation from MATLAB script comment:

%   This generates an n by m array x, each of whose m columns
% contains n random values lying in the interval [a,b], but
% subject to the condition that their sum be equal to s.  The
% scalar value s must accordingly satisfy n*a <= s <= n*b.  The
% distribution of values is uniform in the sense that it has the
% conditional probability distribution of a uniform distribution
% over the whole n-cube, given that the sum of the x's is s.
%
%   The scalar v, if requested, returns with the total
% n-1 dimensional volume (content) of the subset satisfying
% this condition.  Consequently if v, considered as a function
% of s and divided by sqrt(n), is integrated with respect to s
% from s = a to s = b, the result would necessarily be the
% n-dimensional volume of the whole cube, namely (b-a)^n.
%
%   This algorithm does no "rejecting" on the sets of x's it
% obtains.  It is designed to generate only those that satisfy all
% the above conditions and to do so with a uniform distribution.
% It accomplishes this by decomposing the space of all possible x
% sets (columns) into n-1 dimensional simplexes.  (Line segments,
% triangles, and tetrahedra, are one-, two-, and three-dimensional
% examples of simplexes, respectively.)  It makes use of three
% different sets of 'rand' variables, one to locate values
% uniformly within each type of simplex, another to randomly
% select representatives of each different type of simplex in
% proportion to their volume, and a third to perform random
% permutations to provide an even distribution of simplex choices
% among like types.  For example, with n equal to 3 and s set at,
% say, 40% of the way from a towards b, there will be 2 different
% types of simplex, in this case triangles, each with its own
% area, and 6 different versions of each from permutations, for
% a total of 12 triangles, and these all fit together to form a
% particular planar non-regular hexagon in 3 dimensions, with v
% returned set equal to the hexagon's area.
%
% Roger Stafford - Jan. 19, 2006

Example:

test <- Surrogate::RandVec(a=1000, b=100000, s=2000000, n=140, m=1, Seed=sample(1:1000, size = 1))
sum(test$RandVecOutput)
# 2000000
hist(test$RandVecOutput)

like image 61
m0nhawk Avatar answered Oct 19 '22 17:10

m0nhawk


Here is a hit and miss approach. The basic idea is that finding 140 numbers which sum to 2000000 is the same as breaking 1:2000000 into 140 pieces, which requires 139 cutpoints. Also, note that the minimum of 1000 is somewhat annoying. Just subtract it from all the problem data and add it back in at the end:

rand.nums <- function(a,b,n,k){
  #finds n random integers in range a:b which sum to k
  while(TRUE){
    x <- sample(1:(k - n*a),n-1, replace = TRUE) #cutpoints
    x <- sort(x)
    x <- c(x,k-n*a) - c(0,x)
    if(max(x) <= b-a) return(a+x)
  }
}

Then rand.nums(1000,100000,140,2000000) evaluates to 140 integers in the given range which sum to 2000000. For these choices of parameters, the function returns almost instantly. For other choices of the parameters, a solution might be either impossible or so tightly constrained that finding one by chance is effectively impossible. Thus, caution needs to be taken in using the function. It could be modified by adding a maxtrials parameter and returning NA if maxtrials is exceeded without finding a solution.

like image 28
John Coleman Avatar answered Oct 19 '22 16:10

John Coleman