Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Increasing weighted mean by random sample

I have written code in order to randomly add points to a numeric variable to increase the weighted mean score by 10% storing the new scores in variable S1.

This is done by calculating the total number of points that need to be added to increase the mean by 10%. Next step is to randomly select until the weighted sum of responses is equal to the target - but not adding points where the score is already 10 so as not to pass the maximum value on the scale. The final stage is to select whether the sum that is just above or just below the target is closest and select this sample to add points to.

The code works ok but doesn't look efficient. I am an R novice and have read that loops should be avoided as much as possible, but cannot work out an alternative. Is it possible to do what I am attempting, but more efficiently?

#Create random data    
library(stats)
    set.seed(21821)
    ncust <- 1000
    cust.df <- data.frame(cust.id=as.factor(c(1:ncust)))
    wtvar <- rnorm(ncust, mean=1, sd=0.2)
    V1 <- floor(rnorm(ncust, mean=7.5, sd=3))
    V1[V1 > 10] <- 10
    V1[V1 < 1] <- 1
    cust.df$V1 <- V1
    cust.df$wtvar <- wtvar

#Function to determine sample required   
    random.sample <- function(x) {
    (pctadd <- (sum(cust.df$V1*cust.df$wtvar)*0.1)/sum(cust.df$V1[!cust.df$V1==10]*cust.df$wtvar[!cust.df$V1==10])) #percentage of resps (weighted) who need points added to make 10% increase
    (numadd <- (sum(cust.df$V1*cust.df$wtvar)*0.1)) #sum of weights needed to make 10% increase

    wgttot <- vector(mode="numeric", length=0)
    idtot <- vector(mode="numeric", length=0)
    id.ref <- cust.df$cust.id[!cust.df$V1==10]

      repeat {
        preidtot <- idtot
        prewgttot <- wgttot
        (t.id <- as.numeric(sample(id.ref, 1)))
        (t.wgt <- cust.df$wtvar[cust.df$cust.id==t.id])
        id.ref <- id.ref[!id.ref==t.id]
        wgttot <- c(wgttot,t.wgt)
        idtot <- c(idtot, t.id)
        if (sum(wgttot) > numadd) break
      }
      prediff <- numadd - sum(prewgttot)
      postdiff <- sum(wgttot) - numadd
      if (prediff < postdiff) {
        x <- preidtot
      } else {
        x <- idtot
        }
      return(x)
    }

tempids <- random.sample()

#Apply sample rule
cust.df$S1 = ifelse(cust.df$cust.id %in% tempids, cust.df$V1 + 1, cust.df$V1)

#Check ~10% increase achieved
weighted.mean(cust.df$V1,cust.df$wtvar)
weighted.mean(cust.df$S1,cust.df$wtvar)
like image 993
Jorvik77 Avatar asked Mar 10 '23 05:03

Jorvik77


1 Answers

The random.sample is your first version, random.sample1 is the without-loop version, random.sample1 do similar thing as random.sample, but their results are different. You can check the code to see how the result of random.sample1 is used. And due to fact that from your definition, the samples required are not unique, so the results of weighted sum are also different, but all approximately increase by 10%.

#Create random data    
library(stats)
set.seed(21821)
ncust <- 1000
cust.df <- data.frame(cust.id=as.factor(c(1:ncust)))
wtvar <- rnorm(ncust, mean=1, sd=0.2)
V1 <- floor(rnorm(ncust, mean=7.5, sd=3))
V1[V1 > 10] <- 10
V1[V1 < 1] <- 1
cust.df$V1 <- V1
cust.df$wtvar <- wtvar

#Function to determine sample required   
random.sample <- function() {
    (pctadd <- (sum(cust.df$V1*cust.df$wtvar)*0.1)/sum(cust.df$V1[!cust.df$V1==10]*cust.df$wtvar[!cust.df$V1==10])) #percentage of resps (weighted) who need points added to make 10% increase
    (numadd <- (sum(cust.df$V1*cust.df$wtvar)*0.1)) #sum of weights needed to make 10% increase

    wgttot <- vector(mode="numeric", length=0)
    idtot <- vector(mode="numeric", length=0)
    id.ref <- cust.df$cust.id[!cust.df$V1==10]

    repeat {
        preidtot <- idtot
        prewgttot <- wgttot
        (t.id <- as.numeric(sample(id.ref, 1)))
        (t.wgt <- cust.df$wtvar[cust.df$cust.id==t.id])
        id.ref <- id.ref[!id.ref==t.id]
        wgttot <- c(wgttot,t.wgt)
        idtot <- c(idtot, t.id)
        if (sum(wgttot) > numadd) break
    }
    prediff <- numadd - sum(prewgttot)
    postdiff <- sum(wgttot) - numadd
    if (prediff < postdiff) {
        x <- preidtot
    } else {
        x <- idtot
    }
    return(x)
}

random.sample1 <- function() {
    numadd <- sum(cust.df$V1 * cust.df$wtvar) * 0.1 #sum of weights needed to make 10% increase
    id.ref <- which(cust.df$V1 != 10)
    pos <- sample(id.ref, length(id.ref))
    t.wgt <- cust.df$wtvar[pos]
    sumwgttot <- cumsum(t.wgt)
    return(pos[1:which.min(abs(sumwgttot - numadd))])
}

system.time(tempids <- random.sample())
## On my computer, it uses about 0.200s to finish the calculation.
system.time(tempids1 <- random.sample1())
## On my computer, the without loop version uses about 0.000s.

#Apply sample rule
cust.df$S1 = ifelse(cust.df$cust.id %in% tempids, cust.df$V1 + 1, cust.df$V1)
## Note that the usage of tempids1 is different, this usage is more 
## effective than the original one.
cust.df$S2 = cust.df$V1
cust.df$S2[tempids1] = cust.df$V1[tempids1] + 1

#Check ~10% increase achieved
weighted.mean(cust.df$V1,cust.df$wtvar)
weighted.mean(cust.df$S1,cust.df$wtvar)
weighted.mean(cust.df$S2,cust.df$wtvar)
like image 82
Consistency Avatar answered Mar 20 '23 00:03

Consistency