This is a follow on from a previous question I asked, but adds an extra layer of complexity, hence a new question.
I have two types of group (39 of one type and 380 of the other in the example below). What I need to do is assign 889 people to the 39 groups, each consisting of 2 to 7 people, and the 380 groups, each consisting of 2 to 6 people.
However, there is a constraint on the total number of people that can belong to certain sets of groups. In the example below, the maximum value allowed for each row is in column X6.
Using the example below, if row 2 had 6 groups in column X2 and 120 groups in column X4, then the total number of people would be 6*3 + 120*2 = 18 + 240 = 258, which would be fine as it is under 324.
So what I am after for each row is a value of X1*X2 + X3*X4 (to make column X5) that is less than or equal to X6, with the sum of X2 being 39, the sum of X4 being 380 and the total sum of X5 being 889. Ideally any solution would be as random as possible (so that repeated runs give different solutions where possible) and would also work when the values differ from 889, 39 and 380.
Thanks!
DF <- data.frame(matrix(0, nrow = 7, ncol = 6))
DF[,1] <- c(2:7,"Sum")
DF[7,2] <- 39
DF[2:6,3] <- 2:6
DF[7,4] <- 380
DF[7,5] <- 889
DF[1:6,6] <- c(359, 324, 134, 31, 5, 2)
DF[1,3:4] <- NA
DF[7,3] <- NA
DF[7,6] <- NA
EDIT
The phrasing of my problem may not be the clearest. Here is an example of the code I am currently using and how it does not meet the criteria I set above.
homeType=rep(c("a", "b"), times=c(39, 380))
H <- vector(mode="list", length(homeType))
for(i in seq(H)){
  H[[i]]$type <- homeType[i]
  H[[i]]$n <- 0
}
# Place people in houses up to max number of people
npeople <- 889
for(i in seq(npeople)){
  placed_in_house <- FALSE
  while(!placed_in_house){
    house_num <- sample(length(H), 1)
    if(H[[house_num]]$type == "a"){
      if(H[[house_num]]$n < 7){
        H[[house_num]]$n <- H[[house_num]]$n + 1
        placed_in_house <- TRUE
      }
    }
    if(H[[house_num]]$type == "b"){
      if(H[[house_num]]$n < 6){
        H[[house_num]]$n <- H[[house_num]]$n + 1
        placed_in_house <- TRUE
      }
    }
  }
}
# move people around to get up to min number of people
for(i in seq(H)){
  while(H[[i]]$n < 2){
    knock_on_door <- sample(length(H), 1)
    if( H[[knock_on_door]]$n > 2){
      H[[i]]$n <- H[[i]]$n + 1 # house i takes 1 person
      H[[knock_on_door]]$n <- H[[knock_on_door]]$n - 1 # house knock_on_door loses 1 person
    }
  }
}
Ha <- H[which(lapply(H, function(x){x$type}) == "a")]
Hb <- H[which(lapply(H, function(x){x$type}) == "b")]
Ha_T <- data.frame(t(table(data.frame(matrix(unlist(Ha), nrow=length(Ha), byrow=T)))))
Hb_T <- data.frame(t(table(data.frame(matrix(unlist(Hb), nrow=length(Hb), byrow=T)))))
DF_1 <- data.frame(matrix(0, nrow = 7, ncol = 6))
DF_1[,1] <- c(2:7,"Sum")
DF_1[7,2] <- 39
DF_1[2:6,3] <- 2:6
DF_1[7,4] <- 380
DF_1[7,5] <- 889
DF_1[1:6,6] <- c(359, 324, 134, 31, 5, 2)
for(i in 1:nrow(Ha_T)){DF_1[as.numeric(as.character(Ha_T[i,1]))-1,2] <- Ha_T[i,3]}
for(i in 1:nrow(Hb_T)){DF_1[as.numeric(as.character(Hb_T[i,1])),4] <- Hb_T[i,3]}
DF_1$X5[1:6] <- (as.numeric(as.character(DF_1$X1[1:6]))*DF_1$X2[1:6])+(as.numeric(as.character(DF_1$X3[1:6]))*DF_1$X4[1:6])
DF_1$X7 <- DF_1$X2+DF_1$X4
DF_1[1,3:4] <- NA
DF_1[7,3] <- NA
DF_1[7,6] <- NA
Using this example, the problem is row 2 in DF_1. The value in column X7 (X2+X4) is greater than the permitted number shown in column X6. What I need is a solution where the values in X7 are less than or equal to the values in X6, but the sums of columns X2, X4 and X5 (X1*X2+X3*X4) equal 39, 380 and 889 respectively (although these numbers change depending on the data used).
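The problem as originally described in the question is impossible to satisfy. Every row must have X5 less than or equal to X6, so the sum of X5 can be at most the sum of X6, which is 359 + 324 + 134 + 31 + 5 + 2 = 855. That is less than the required total of 889, so no values can satisfy all of these constraints at once: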
"So what I am after for each row is a value of X1*X2 + X3*X4 (to make column X5) that is less or equal to X6 with the sum of X2 being 39, the sum of X4 being 380 and the total sum of X5 being 889. "
However, following a restatement of the problem in the comments, the revised description of the problem can be solved as follows.
Update: Solution based on clarification of the problem in comments
According to a clarification in the comments:
"I am not actually filling the number of houses completely. I am just assigning the number of children into houses. This is why 'a' is 2 to 7 and 'b' is 2 to 6, as 'a' households will also include 1 adult and 'b' households 2. For a given area I know how many 2 to 8 person households there are (419), and how many 2,3,4,5,6,7 or 8 person households exist (359,324,134,31,5,2). I also know the total number of households with either 1 (39) or 2 (380) adults, and how many children there are (889 in my example)."
Based on this updated information, we can loop over the following steps: 1) calculate how many more houses of each type can still be allocated under the constraints, 2) randomly select one of the house types that can still be allocated without breaching a rule, and 3) repeat until all 889 children are in houses. Note that I use more descriptive column names here to make the logic easier to follow:
library(data.table)

DT <- data.table(HS1 = 2:7, # type 1 house size
                 NH1 = 0,   # number of type 1 houses with children
                 HS2 = 1:6, # type 2 house size
                 NH2 = 0,   # number of type 2 houses with children
                 C = 0,     # number of children in houses
                 MaxNH = c(359, 324, 134, 31, 5, 2)) # maximum number of type 1 + type 2 houses
NR = DT[, .N]
set.seed(1234)
repeat {
  # start every attempt from an empty allocation, so that a failed attempt can be retried
  DT[, `:=`(NH1 = 0, NH2 = 0, C = 0)]
  while (DT[, sum(C) < 889]) {
    # remaining capacity in each row, shared by both house types
    DT[, MaxH1 := (MaxNH - NH1 - NH2)]
    DT[, MaxH2 := (MaxNH - NH1 - NH2)]
    DT[1, MaxH2 := 0] # no type 2 houses in the first row (NA in the question's table)
    # never allow more houses than are left of each type
    DT[MaxH1 > 39 - sum(NH1), MaxH1 := 39 - sum(NH1)]
    DT[MaxH2 > 380 - sum(NH2), MaxH2 := 380 - sum(NH2)]
    if (DT[, sum(NH1)] >= 39) DT[, MaxH1 := 0]
    if (DT[, sum(NH2)] >= 380) DT[, MaxH2 := 0]
    if (DT[, all(MaxH1 == 0) & all(MaxH2 == 0)]) { # check if it is not possible to assign anyone else to a group
      print("No solution found. Check constraints or try again")
      break
    }
    # If you wish to preferentially fill a particular type of house, then change the probability weights in the next line accordingly
    newgroup = sample(2*NR, 1, prob = DT[, c(MaxH1, MaxH2)])
    # indices 1..NR pick a type 1 row, indices NR+1..2*NR pick a type 2 row
    if (newgroup > NR) DT[rep(1:NR, 2)[newgroup], NH2 := NH2+1] else DT[rep(1:NR, 2)[newgroup], NH1 := NH1+1]
    DT[, C := HS1*NH1 + HS2*NH2]
  }
  if (DT[, sum(C) == 889]) break # stop only when exactly 889 children are placed
}
DT[,1:6, with=F]
# HS1 NH1 HS2 NH2 C MaxNH
#1: 2 7 1 0 14 359
#2: 3 7 2 218 457 324
#3: 4 14 3 76 284 134
#4: 5 9 4 14 101 31
#5: 6 2 5 3 27 5
#6: 7 0 6 1 6 2
colSums(DT[, .(NH1, NH2, C)])
# NH1 NH2 C
# 39 312 889
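The result can be checked against the rules as restated in the comments. The sketch below is not part of the original answer. `check_allocation` is a hypothetical helper, and it assumes that 39 and 380 are upper limits on the number of houses of each type, which is how I read the clarification and the output above. The allocation is re-entered by hand as a plain data frame so that the snippet runs in base R.

```r
# Hypothetical helper (an assumption, not from the original answer):
# tests an allocation against the rules as restated in the comments.
check_allocation <- function(alloc, max_h1 = 39, max_h2 = 380, n_children = 889) {
  # children per row = house size * number of houses, summed over both types
  children <- alloc$HS1 * alloc$NH1 + alloc$HS2 * alloc$NH2
  c(
    row_caps_ok = all(alloc$NH1 + alloc$NH2 <= alloc$MaxNH), # houses per row within MaxNH
    type1_ok    = sum(alloc$NH1) <= max_h1,                  # assumed upper limit
    type2_ok    = sum(alloc$NH2) <= max_h2,                  # assumed upper limit
    children_ok = sum(children) == n_children                # every child placed
  )
}

# The allocation printed above, typed in by hand
result <- data.frame(
  HS1 = 2:7, NH1 = c(7, 7, 14, 9, 2, 0),
  HS2 = 1:6, NH2 = c(0, 218, 76, 14, 3, 1),
  MaxNH = c(359, 324, 134, 31, 5, 2)
)
check_allocation(result)
```

Each element of the returned logical vector corresponds to one rule, so a failed run shows which constraint was breached. If the house counts have to be matched exactly rather than treated as limits, the two `<=` comparisons on the totals would need to become `==`.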