Randomly assign values into data frame/matrix of different size groups that meet multiple criteria

Tags:

r

This is a follow-on from a previous question I asked, but it adds an extra layer of complexity, hence a new question.

I have two types of groups (39 of one and 380 of the other in the example below). What I need to do is assign 889 people to the 39 groups, each consisting of between 2 and 7 people, and the 380 groups, each consisting of between 2 and 6 people.

However, there is a constraint on the total number of people that can belong to certain sets of groups. In the example below, the maximum value allowed for each row is in column X6.

Using the example below: if in row 2 there were 6 groups assigned in column X2 and 120 groups assigned in column X4, then the total number of people would be (6*3) + (120*2) = 18 + 240 = 258, which would be fine as it is under 324.

So what I am after for each row is a value of X1*X2 + X3*X4 (to make column X5) that is less than or equal to X6, with the sum of X2 being 39, the sum of X4 being 380 and the total sum of X5 being 889. Ideally, any solution would be as random as possible (so that repeating it gives a different solution where possible) and would also work when the values differ from 889, 39 and 380.

Thanks!

DF <- data.frame(matrix(0, nrow = 7, ncol = 6))
DF[,1] <- c(2:7,"Sum")
DF[7,2] <- 39
DF[2:6,3] <- 2:6
DF[7,4] <- 380
DF[7,5] <- 889
DF[1:6,6] <- c(359, 324, 134, 31, 5, 2)
DF[1,3:4] <- NA
DF[7,3] <- NA
DF[7,6] <- NA
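
The criteria described above can be written as a small check, which may help when testing candidate solutions. This is only a sketch: `check_allocation` and its argument names are hypothetical, and it assumes that row 1 has no groups of the second type (X3 is NA there), which it encodes as a size of 0.

```r
# Hypothetical helper (not part of the original code): tests a candidate
# allocation against the four criteria described above.
# x2, x4: numbers of groups per row (columns X2 and X4, rows 1 to 6).
check_allocation <- function(x2, x4,
                             x1 = 2:7,        # X1: sizes of the first group type
                             x3 = c(0, 2:6),  # X3: sizes of the second type (assumed 0 where X3 is NA)
                             x6 = c(359, 324, 134, 31, 5, 2),  # X6: row limits
                             n_a = 39, n_b = 380, n_people = 889) {
  x5 <- x1 * x2 + x3 * x4  # X5: people per row
  c(rows_within_limit = all(x5 <= x6),
    sum_x2_ok = sum(x2) == n_a,
    sum_x4_ok = sum(x4) == n_b,
    sum_x5_ok = sum(x5) == n_people)
}

# The row 2 example from the text: 6 groups of 3 and 120 groups of 2
res <- check_allocation(x2 = c(0, 6, 0, 0, 0, 0),
                        x4 = c(0, 120, 0, 0, 0, 0))
print(res)
```

For this partial allocation only the row-limit check passes, since the column sums are still well short of 39, 380 and 889.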

EDIT

The phrasing of my problem may not be the clearest. Here is an example of the code I am currently using, and how it fails to meet the criteria I set out above:

homeType=rep(c("a", "b"), times=c(39, 380))
H <- vector(mode="list", length(homeType))
for(i in seq(H)){
  H[[i]]$type <- homeType[i]
  H[[i]]$n <- 0
}

# Place people in houses up to max number of people
npeople <- 889
for(i in seq(npeople)){
  placed_in_house <- FALSE
  while(!placed_in_house){
    house_num <- sample(length(H), 1)
    if(H[[house_num]]$type == "a"){
      if(H[[house_num]]$n < 7){
        H[[house_num]]$n <- H[[house_num]]$n + 1
        placed_in_house <- TRUE
      }
    }
    if(H[[house_num]]$type == "b"){
      if(H[[house_num]]$n < 6){
        H[[house_num]]$n <- H[[house_num]]$n + 1
        placed_in_house <- TRUE
      }
    }
  }
}

# move people around to get up to min number of people
for(i in seq(H)){
  while(H[[i]]$n < 2){
    knock_on_door <- sample(length(H), 1)
    if( H[[knock_on_door]]$n > 2){
      H[[i]]$n <- H[[i]]$n + 1 # house i takes 1 person
      H[[knock_on_door]]$n <- H[[knock_on_door]]$n - 1 # house knock_on_door loses 1 person
    }
  }
}

Ha <- H[which(lapply(H, function(x){x$type}) == "a")]
Hb <- H[which(lapply(H, function(x){x$type}) == "b")]

Ha_T <- data.frame(t(table(data.frame(matrix(unlist(Ha), nrow=length(Ha), byrow=T)))))
Hb_T <- data.frame(t(table(data.frame(matrix(unlist(Hb), nrow=length(Hb), byrow=T)))))

DF_1 <- data.frame(matrix(0, nrow = 7, ncol = 6))
DF_1[,1] <- c(2:7,"Sum")
DF_1[7,2] <- 39
DF_1[2:6,3] <- 2:6
DF_1[7,4] <- 380
DF_1[7,5] <- 889
DF_1[1:6,6] <- c(359, 324, 134, 31, 5, 2)
for(i in 1:nrow(Ha_T)){DF_1[as.numeric(as.character(Ha_T[i,1]))-1,2] <- Ha_T[i,3]}
for(i in 1:nrow(Hb_T)){DF_1[as.numeric(as.character(Hb_T[i,1])),4] <- Hb_T[i,3]}
DF_1$X5[1:6] <- (as.numeric(as.character(DF_1$X1[1:6]))*DF_1$X2[1:6])+(as.numeric(as.character(DF_1$X3[1:6]))*DF_1$X4[1:6])
DF_1$X7 <- DF_1$X2+DF_1$X4
DF_1[1,3:4] <- NA
DF_1[7,3] <- NA
DF_1[7,6] <- NA

Using this example, the problem is row 2 in DF_1: the value in column X7 (X2+X4) is greater than the permitted number shown in column X6. What I need is a solution where the values in X7 are less than or equal to the values in X6, while the sums of columns X2, X4 and X5 (X1*X2+X3*X4) equal 39, 380 and 889 respectively (although these numbers change depending on the data used).

asked Aug 16 '16 13:08

Chris

1 Answer

The problem as originally described in the question cannot be solved: no set of values satisfies all of these constraints at once.
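
A quick check makes this concrete. As a sketch using only the sizes in column X3 and the row limits in column X6 from the question: even if no groups of the first type were placed in rows 2 to 6, each row could hold at most floor(X6 / X3) groups of the second type, and those maxima add up to far fewer than the required 380.

```r
x3 <- 2:6                     # X3: sizes of the second group type (rows 2 to 6)
x6 <- c(324, 134, 31, 5, 2)   # X6: row limits for the same rows

# Upper bound on X4 per row, assuming X2 = 0 so the whole limit goes to X4
max_x4 <- floor(x6 / x3)
print(max_x4)       # 162 44 7 1 0
print(sum(max_x4))  # 214, well short of 380
```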

"So what I am after for each row is a value of X1*X2 + X3*X4 (to make column X5) that is less or equal to X6 with the sum of X2 being 39, the sum of X4 being 380 and the total sum of X5 being 889. "

However, following a restatement of the problem in the comments, the revised description of the problem can be solved as follows.

Update: Solution based on clarification of the problem in comments

According to a clarification in the comments:

"I am not actually filling the number of houses completely. I am just assigning the number of children into houses. This is why 'a' is 2 to 7 and 'b' is 2 to 6, as 'a' households will also include 1 adult and 'b' households 2. For a given area I know how many 2 to 8 person households there are (419), and how many 2,3,4,5,6,7 or 8 person households exist (359,324,134,31,5,2). I also know the total number of households with either 1 (39) or 2 (380) adults, and how many children there are (889 in my example)."

Based on this updated information, we can loop over the following steps: 1) calculate how many more houses of each type can still be allocated under the criteria; 2) randomly select one of the house types that can still be allocated without breaching any of the rules; and 3) repeat until all 889 children are in houses. Note that I use more descriptive column names here to make the logic easier to follow:

library(data.table)

DT <- data.table(HS1 = 2:7, # type 1 house size
                 NH1 = 0,   # number of type 1 houses with children
                 HS2 = 1:6, # type 2 house size
                 NH2 = 0,   # number of type 2 houses with children
                 C = 0,     # number of children in houses
                 MaxNH = c(359, 324, 134, 31, 5, 2)) # maximum number of type 1 + type 2 houses
NR <- DT[, .N]
set.seed(1234)
repeat {
  # Start every attempt from an empty allocation, so that a failed attempt
  # (overshoot or dead end) is retried instead of looping forever
  DT[, `:=`(NH1 = 0, NH2 = 0, C = 0)]
  while (DT[, sum(C) < 889]) {
    # Houses of each type that can still be added to each row
    DT[, MaxH1 := (MaxNH - NH1 - NH2)]
    DT[, MaxH2 := (MaxNH - NH1 - NH2)]
    DT[1, MaxH2 := 0] # no type 2 houses in the first row
    # Cap by the number of houses of each type that are still unallocated
    DT[MaxH1 > 39 - sum(NH1), MaxH1 := 39 - sum(NH1)]
    DT[MaxH2 > 380 - sum(NH2), MaxH2 := 380 - sum(NH2)]
    if (DT[, sum(NH1)] >= 39)  DT[, MaxH1 := 0]
    if (DT[, sum(NH2)] >= 380) DT[, MaxH2 := 0]

    if (DT[, all(MaxH1 == 0) & all(MaxH2 == 0)]) { # check if it is not possible to assign anyone else to a group
      print("No solution found. Check constraints or try again")
      break
    }
    # If you wish to preferentially fill a particular type of house, then change the probability weights in the next line accordingly
    newgroup <- sample(2 * NR, 1, prob = DT[, c(MaxH1, MaxH2)])
    if (newgroup > NR) DT[rep(1:NR, 2)[newgroup], NH2 := NH2 + 1] else DT[rep(1:NR, 2)[newgroup], NH1 := NH1 + 1]

    DT[, C := HS1 * NH1 + HS2 * NH2]
  }
  if (DT[, sum(C) == 889]) break
}

DT[, 1:6, with = FALSE]
#   HS1 NH1 HS2 NH2   C MaxNH 
#1:   2   7   1   0  14   359 
#2:   3   7   2 218 457   324 
#3:   4  14   3  76 284   134  
#4:   5   9   4  14 101    31  
#5:   6   2   5   3  27     5 
#6:   7   0   6   1   6     2 

colSums(DT[, .(NH1, NH2, C)])
# NH1 NH2   C 
#  39 312 889
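
As a sanity check, the output above can be verified against the rules in base R. This is only a sketch: the vectors below are copied by hand from the printed table, and `children` is a name chosen here for column C.

```r
# Values copied from the printed result above
HS1   <- 2:7
HS2   <- 1:6
NH1   <- c(7, 7, 14, 9, 2, 0)
NH2   <- c(0, 218, 76, 14, 3, 1)
MaxNH <- c(359, 324, 134, 31, 5, 2)

children <- HS1 * NH1 + HS2 * NH2  # column C, recomputed

print(all(NH1 + NH2 <= MaxNH))  # no row exceeds its household limit
print(sum(NH1) <= 39)           # type 1 houses within the total available
print(sum(NH2) <= 380)          # type 2 houses within the total available
print(sum(children) == 889)     # every child has been placed
```

Note that only 312 of the 380 type 2 houses receive children in this run, because the loop stops as soon as all 889 children have been placed.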

answered Sep 22 '22 23:09

dww
