
Randomly assign values into data frame/matrix of different size groups that meet multiple criteria

Tags: r

This is a follow-up to a previous question I asked, but it adds an extra layer of complexity, hence a new question.

I have two types of groups (39 of one type and 380 of the other in the example below). I need to assign 889 people to the 39 groups, each holding between 2 and 7 people, and to the 380 groups, each holding between 2 and 6 people.

However, there is a constraint on the total number of people that can belong to certain sets of groups. In the example below, the maximum value allowed for each row is in column X6.

Using the example below: if row 2 had 6 groups in column X2 and 120 groups in column X4, the total number of people would be 6*3 + 120*2 = 18 + 240 = 258, which would be fine because it is under 324.
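
The arithmetic in this example can be reproduced with a short sketch. The helper name `row_total` and its argument names are hypothetical, introduced only for illustration; the numbers come from row 2 of the example.

```r
# Hypothetical helper: people in one row, given group sizes and group counts.
row_total <- function(size_a, n_a, size_b, n_b) {
  size_a * n_a + size_b * n_b
}

# Row 2 of the example: groups of 3 (X1) and groups of 2 (X3), limit X6 = 324.
total <- row_total(size_a = 3, n_a = 6, size_b = 2, n_b = 120)
total          # 258
total <= 324   # TRUE, so this row is within its limit
```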

So for each row I am after a value of X1*X2 + X3*X4 (which makes column X5) that is less than or equal to X6, with the sum of X2 being 39, the sum of X4 being 380, and the total sum of X5 being 889. Ideally any solution would be as random as possible (so repeating it would give a different solution where possible) and would also work when the values differ from 889, 39 and 380.

Thanks!

DF <- data.frame(matrix(0, nrow = 7, ncol = 6))
DF[,1] <- c(2:7,"Sum")
DF[7,2] <- 39
DF[2:6,3] <- 2:6
DF[7,4] <- 380
DF[7,5] <- 889
DF[1:6,6] <- c(359, 324, 134, 31, 5, 2)
DF[1,3:4] <- NA
DF[7,3] <- NA
DF[7,6] <- NA

EDIT

The phrasing of my problem may not be the clearest. Here is the code I am currently using, which shows how it fails to meet the criteria set out above:

homeType=rep(c("a", "b"), times=c(39, 380))
H <- vector(mode="list", length(homeType))
for(i in seq(H)){
  H[[i]]$type <- homeType[i]
  H[[i]]$n <- 0
}

# Place people in houses up to max number of people
npeople <- 889
for(i in seq(npeople)){
  placed_in_house <- FALSE
  while(!placed_in_house){
    house_num <- sample(length(H), 1)
    if(H[[house_num]]$type == "a"){
      if(H[[house_num]]$n < 7){
        H[[house_num]]$n <- H[[house_num]]$n + 1
        placed_in_house <- TRUE
      }
    }
    if(H[[house_num]]$type == "b"){
      if(H[[house_num]]$n < 6){
        H[[house_num]]$n <- H[[house_num]]$n + 1
        placed_in_house <- TRUE
      }
    }
  }
}

# move people around to get up to min number of people
for(i in seq(H)){
  while(H[[i]]$n < 2){
    knock_on_door <- sample(length(H), 1)
    if( H[[knock_on_door]]$n > 2){
      H[[i]]$n <- H[[i]]$n + 1 # house i takes 1 person
      H[[knock_on_door]]$n <- H[[knock_on_door]]$n - 1 # house knock_on_door loses 1 person
    }
  }
}

Ha <- H[which(lapply(H, function(x){x$type}) == "a")]
Hb <- H[which(lapply(H, function(x){x$type}) == "b")]

Ha_T <- data.frame(t(table(data.frame(matrix(unlist(Ha), nrow=length(Ha), byrow=T)))))
Hb_T <- data.frame(t(table(data.frame(matrix(unlist(Hb), nrow=length(Hb), byrow=T)))))

DF_1 <- data.frame(matrix(0, nrow = 7, ncol = 6))
DF_1[,1] <- c(2:7,"Sum")
DF_1[7,2] <- 39
DF_1[2:6,3] <- 2:6
DF_1[7,4] <- 380
DF_1[7,5] <- 889
DF_1[1:6,6] <- c(359, 324, 134, 31, 5, 2)
for(i in 1:nrow(Ha_T)){DF_1[as.numeric(as.character(Ha_T[i,1]))-1,2] <- Ha_T[i,3]}
for(i in 1:nrow(Hb_T)){DF_1[as.numeric(as.character(Hb_T[i,1])),4] <- Hb_T[i,3]}
DF_1$X5[1:6] <- (as.numeric(as.character(DF_1$X1[1:6]))*DF_1$X2[1:6])+(as.numeric(as.character(DF_1$X3[1:6]))*DF_1$X4[1:6])
DF_1$X7 <- DF_1$X2+DF_1$X4
DF_1[1,3:4] <- NA
DF_1[7,3] <- NA
DF_1[7,6] <- NA

In this example the problem is row 2 of DF_1: the value in column X7 (X2+X4) is greater than the permitted number shown in column X6. What I need is a solution where the values in X7 are less than or equal to the values in X6, while the sums of columns X2, X4 and X5 (X1*X2+X3*X4) equal 39, 380 and 889 respectively (although these numbers change depending on the data used).

Chris asked Aug 16 '16 13:08




1 Answer

The problem as originally described in the question cannot be solved: no set of values satisfies all of these constraints at once.

"So what I am after for each row is a value of X1*X2 + X3*X4 (to make column X5) that is less or equal to X6 with the sum of X2 being 39, the sum of X4 being 380 and the total sum of X5 being 889. "
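
One way to see this, under the literal reading of the quoted requirement (this is a sketch of one possible argument, not necessarily the check the answer relied on): if every row's X5 must be at most its X6, then the total of X5 can never exceed the sum of X6. The variable names below are hypothetical; the limits are the X6 values from the question.

```r
# Row limits from column X6 of the question.
x6 <- c(359, 324, 134, 31, 5, 2)

# Each row's X5 is at most its X6, so sum(X5) is bounded by sum(X6).
max_total_x5 <- sum(x6)
max_total_x5          # 855
max_total_x5 >= 889   # FALSE: the required total of 889 cannot be reached
```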

However, following a restatement of the problem in the comments, the revised description of the problem can be solved as follows.

Update: Solution based on clarification of the problem in comments

According to a clarification in the comments:

"I am not actually filling the number of houses completely. I am just assigning the number of children into houses. This is why 'a' is 2 to 7 and 'b' is 2 to 6, as 'a' households will also include 1 adult and 'b' households 2. For a given area I know how many 2 to 8 person households there are (419), and how many 2,3,4,5,6,7 or 8 person households exist (359,324,134,31,5,2). I also know the total number of households with either 1 (39) or 2 (380) adults, and how many children there are (889 in my example)."

Based on this updated information, we can loop over the following steps: 1) calculate how many more houses of each type can still be allocated under the criteria, 2) randomly select one of the house types that can still be allocated without breaching one of the rules, and 3) repeat until all 889 children are in houses. Note that I use more descriptive column names here to make the logic easier to follow:

library(data.table)

DT <- data.table(HS1 = 2:7, # type 1 house size
                 NH1 = 0,   # number of type 1 houses with children
                 HS2 = 1:6, # type 2 house size (row 1 is never allocated, see MaxH2 below)
                 NH2 = 0,   # number of type 2 houses with children
                 C = 0,     # number of children in houses
                 MaxNH = c(359, 324, 134, 31, 5, 2)) # maximum number of type 1 + type 2 houses
NR = DT[,.N]
set.seed(1234)
repeat {
  # start every attempt from an empty allocation, so that a failed attempt can be retried
  DT[, c("NH1", "NH2", "C") := list(0, 0, 0)]
  while (DT[, sum(C) < 889]) {
    # remaining capacity in each row, shared by both house types
    DT[, MaxH1 := (MaxNH - NH1 - NH2)]
    DT[, MaxH2 := (MaxNH - NH1 - NH2)]
    DT[1,MaxH2 := 0 ] # type 2 houses have no size in row 1
    DT[MaxH1 > 39 - sum(NH1), MaxH1 := 39 - sum(NH1)]
    DT[MaxH2 > 380- sum(NH2), MaxH2 := 380- sum(NH2)]
    # stop allocating a house type once its total has been reached
    if (DT[, sum(NH1)] >= 39)  DT[, MaxH1 := 0]
    if (DT[, sum(NH2)] >= 380) DT[, MaxH2 := 0]

    if (DT[, all(MaxH1==0) & all(MaxH2==0)]) { # check if it is not possible to assign anyone else to a group
      print("No solution found. Check constraints or try again")
      break
    }
    # If you wish to preferentially fill a particular type of house, then change the probability weights in the next line accordingly
    newgroup = sample(2*NR, 1, prob = DT[, c(MaxH1, MaxH2)])
    # indices 1..NR select a type 1 row, indices NR+1..2*NR select a type 2 row
    if (newgroup > NR) DT[rep(1:NR, 2)[newgroup], NH2 := NH2+1] else DT[rep(1:NR, 2)[newgroup], NH1 := NH1+1]

    DT[, C := HS1*NH1 + HS2*NH2]
  }
  # accept the attempt only if exactly 889 children were placed; otherwise try again
  if (DT[, sum(C)==889]) break
}

DT[, 1:6, with = FALSE]
#   HS1 NH1 HS2 NH2   C MaxNH 
#1:   2   7   1   0  14   359 
#2:   3   7   2 218 457   324 
#3:   4  14   3  76 284   134  
#4:   5   9   4  14 101    31  
#5:   6   2   5   3  27     5 
#6:   7   0   6   1   6     2 

colSums(DT[, .(NH1, NH2, C)])
# NH1 NH2   C 
#  39 312 889  
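
The printed result can be checked against the revised constraints with a few lines of base R. This is only a sketch: the values are copied by hand from the output above, and the names `res` and `checks` are hypothetical.

```r
# Values copied from the printed result above.
res <- data.frame(
  HS1   = 2:7,
  NH1   = c(7, 7, 14, 9, 2, 0),
  HS2   = 1:6,
  NH2   = c(0, 218, 76, 14, 3, 1),
  MaxNH = c(359, 324, 134, 31, 5, 2)
)
res$C <- res$HS1 * res$NH1 + res$HS2 * res$NH2

checks <- c(
  children_total = sum(res$C) == 889,                   # every child is placed
  type1_cap      = sum(res$NH1) <= 39,                  # at most 39 type 1 houses
  type2_cap      = sum(res$NH2) <= 380,                 # at most 380 type 2 houses
  row_caps       = all(res$NH1 + res$NH2 <= res$MaxNH)  # per-row house limits
)
checks
all(checks)   # TRUE
```

Note that only 312 of the 380 type 2 houses receive children in this run, which is consistent with the clarification that not every house has to be filled.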

dww answered Sep 22 '22 23:09