The Problem:
I am attempting to use R to generate a random study design where half of the participants are randomly assigned to "Treatement 1" and the other half are assigned to "Treatment 2". However, because half of the subjects are male and half are female and I also want to ensure that an equal number of males and females are exposed to each treatment, half of the males and females should be assigned to "Treatment 1" and the remaining half should be assigned to "Treatment 2".
There are two complications to this design: (1) This is a yearlong study and the assignment of participants to treatment must occur on a daily basis; and (2) Each participant must be exposed to "Treatment 1" a minimum 10 times in a 28 day period.
Is this even possible to automate this in the R interface? I assume so, but I think my beginner status as an R programmer prohibits me from finding the solution on my own. I have been struggling for days to figure out how to actualize this, and have looked through many similar-sounding posts on this site that were not able to be successfully applied here. I am hoping someone out there knows some tricks that could help me get unstuck in solving this problem, any advice would be greatly appreciated!
What I Have Tried:
Specific Information
# There are 16 participants
p <- c("P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08", "P09", "P10", "P11", "P12", "P13", "P14", "P15", "P16")
# Half are male and half are female
g <- c(rep("M", 8), rep("F", 8))
# I make a dataframe but this may not be necessary
df <- cbind.data.frame(p,g)
# There are 365 days in one year
d <- seq(1,365,1)
...unfortunately, I am not sure how to proceed from here.
Ideal Outcome:
I am envisioning something approximate to this table as the outcome:
Basically there is a column for each participant and a row for each day. Associated with each day is an assignment to either Treatment 1 (T1) or Treatment 2 (T2), with 4 of the 8 males and 4 of the 8 females being assigned to T1 and the remainder to T2. These treatments are reassigned every day for 1 year. Not depicted in this chart is the need for each participant to be exposed to T1 at least 10 times in a 28-day period. The table does not have to look like that if something else makes more sense!
How do you randomly assign participants to groups? To implement random assignment, assign a unique number to every member of your study's sample. Then, you can use a random number generator or a lottery method to randomly assign each number to a control or experimental group.
Random assignment helps ensure that members of each group in the experiment are the same, which means that the groups are also likely more representative of what is present in the larger population.
Randomization as a method of experimental control has been extensively used in human clinical trials and other biological experiments. It prevents the selection bias and insures against the accidental bias. It produces the comparable groups and eliminates the source of bias in treatment assignments.
Random assignment of participants helps to ensure that any differences between and within the groups are not systematic at the outset of the experiment. Thus, any differences between groups recorded at the end of the experiment can be more confidently attributed to the experimental procedures or treatment.
Consider splitting data frame by day and gender with by
, then run enough samples with replicate
at 100 times to pick one of several where treatments are balanced:
Data
df <- merge(data.frame(participant = p, gender = g),
data.frame(days = seq(1,365)),
by=NULL)
Solution
df_list <- by(df, list(df$gender, df$days), function(sub){
t <- replicate(100, { # RUN 100 REPETITIONS OF EXPRESSION
s <- sample(c("T1", "T2"), size=nrow(sub), replace=TRUE) # SAMPLE "T1" AND "T2" BY SIZE OF SUBSET
s[ sum(s == "T1") == sum(s == "T2") ] # FILTER TO EQUAL TREATMENTS
})
t <- Filter(length, t)[[1]] # SELECT FIRST OF SEVERAL NON-EMPTY RETURNS
transform(sub, treatment = t) # ASSIGN RESULT TO NEW COLUMN
})
# BIND DATA FRAMES AND RESET ROW.NAMES
final_df <- data.frame(do.call(rbind.data.frame, df_list), row.names=NULL)
Output
Day 1
head(final_df, 16)
# participant gender days treatment
# 1 P09 F 1 T1
# 2 P10 F 1 T2
# 3 P11 F 1 T2
# 4 P12 F 1 T1
# 5 P13 F 1 T2
# 6 P14 F 1 T2
# 7 P15 F 1 T1
# 8 P16 F 1 T1
# 9 P01 M 1 T1
# 10 P02 M 1 T1
# 11 P03 M 1 T2
# 12 P04 M 1 T2
# 13 P05 M 1 T2
# 14 P06 M 1 T1
# 15 P07 M 1 T1
# 16 P08 M 1 T2
Day 365
tail(final_df, 16)
# participant gender days treatment
# 5825 P09 F 365 T2
# 5826 P10 F 365 T2
# 5827 P11 F 365 T1
# 5828 P12 F 365 T2
# 5829 P13 F 365 T1
# 5830 P14 F 365 T2
# 5831 P15 F 365 T1
# 5832 P16 F 365 T1
# 5833 P01 M 365 T1
# 5834 P02 M 365 T2
# 5835 P03 M 365 T1
# 5836 P04 M 365 T2
# 5837 P05 M 365 T2
# 5838 P06 M 365 T2
# 5839 P07 M 365 T1
# 5840 P08 M 365 T1
Ideally, for analytical purposes you should keep data in long format (i.e., tidy data). But if needing wide format consider reshape
with helper and cleanup processing:
# HELPER OBJECTS
final_df$participant_gender <- with(final_df, paste0(participant, gender))
new_names <- paste0(p, g)
# RESHAPE WIDE
wide_df <- reshape(final_df, v.names = "treatment", timevar = "participant_gender",
idvar="days", drop = c("gender", "participant"),
new.row.names = 1:365, direction = "wide")
# RENAME AND RE-ORDER COLUMNS
names(wide_df) <- gsub("treatment.", "", names(wide_df))
wide_df <- wide_df[c("days", new_names)]
head(wide_df)
# days P01M P02M P03M P04M P05M P06M P07M P08M P09F P10F P11F P12F P13F P14F P15F P16F
# 1 1 T1 T1 T2 T2 T2 T1 T1 T2 T1 T2 T2 T1 T2 T2 T1 T1
# 2 2 T1 T1 T2 T1 T2 T1 T2 T2 T1 T2 T2 T1 T2 T2 T1 T1
# 3 3 T1 T1 T2 T1 T1 T2 T2 T2 T1 T2 T2 T2 T1 T2 T1 T1
# 4 4 T1 T1 T1 T2 T2 T2 T1 T2 T2 T1 T1 T2 T2 T1 T1 T2
# 5 5 T1 T1 T2 T1 T2 T2 T1 T2 T1 T1 T2 T1 T2 T2 T1 T2
# 6 6 T2 T1 T1 T1 T2 T2 T1 T2 T2 T2 T2 T1 T2 T1 T1 T1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With