
Looping a loop to do everything at once

I am trying to simulate the following "game":

  • There is a population of 100 units
  • You randomly sample 10 of these units, record the IDs of the units you saw, and then put them back into the population
  • You then take a second sample, record the IDs of the units you saw in this second sample along with those from the first sample, and then put the second sample back into the population
  • Repeat this many times

I wrote the following code in R that performs the above procedure:

library(dplyr)

var_1 = rnorm(100,10,10)
var_2 = rnorm(100,1,10)
var_3 = rnorm(100,5,10)
response = rnorm(100,1,1)

my_data = data.frame(var_1, var_2, var_3, response)
my_data$id = 1:100


results <- list()
results2<- list()

for (i in 1:100)
    
{
    
    iteration_i = i
    
    sample_i = my_data[sample(nrow(my_data), 10), ]
    
    
    results_tmp = data.frame(iteration_i, sample_i)
    
    results[[i]] <- results_tmp
    
}

results_df <- do.call(rbind.data.frame, results)

test_1 <- data.frame(results_df %>% 
    group_by(id) %>% 
    filter(iteration_i == min(iteration_i)) %>% 
    distinct)


summary_file = data.frame(test_1 %>% group_by(iteration_i) %>% summarise(Count = n()))

cumulative = cumsum(summary_file$Count)

summary_file$Cumulative = cumulative

summary_file$unobserved = 100 - cumulative

The result looks something like this:

> summary_file
   iteration_i Count Cumulative unobserved
1            1    10         10         90
2            2     8         18         82
3            3     9         27         73
4            4     8         35         65
5            5     6         41         59
6            6     5         46         54
7            7     7         53         47
8            8     7         60         40
9            9     4         64         36
10          10     3         67         33
11          11     4         71         29
12          12     4         75         25
13          13     1         76         24
14          14     4         80         20
15          15     1         81         19
16          16     2         83         17
17          17     2         85         15
18          18     1         86         14
19          20     1         87         13
20          22     1         88         12
21          23     2         90         10
22          24     1         91          9
23          25     1         92          8
24          27     2         94          6
25          28     1         95          5
26          30     1         96          4
27          35     1         97          3
28          37     1         98          2
29          44     1         99          1
30          46     1        100          0

I would now like to repeat this "game" many times.

  • I would like to keep the "summary_file" for each "game" (e.g. summary_file_1, summary_file_2, summary_file_3, etc.)

  • I would then like to create a "total" summary file that shows the number of iterations that were required in each game to observe all units.

This total_summary_file would look something like this:

 game_id iterations_required
1  game_1                  47
2  game_2                  45
3  game_3                  44
4  game_4                  42
5  game_5                  42

Currently, I am just copy/pasting my earlier code several times and storing the results, then I append everything at the end and calculate the summary statistics - but I am trying to find a way to "loop the loop" and do everything at once. I do not know if it is possible to introduce a command like "results_df_i <- do.call(rbind.data.frame, results_i)" into the loop and efficiently create everything at the same time instead of manually copy/pasting the earlier loop.

stats_noob asked Oct 13 '25 11:10


1 Answer

You're making this a lot less efficient than it could be. To get, say, 100 repeated samples of 10 from the set 1:100 (with replacement), we can do replicate(100, sample(100, 10, TRUE)).
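
To make the shape of that result concrete, here is a minimal sketch; the name `draws` is only illustrative and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# 100 draws of 10 ids each; replicate() simplifies the result to a matrix
draws <- replicate(100, sample(100, 10, TRUE))

dim(draws)        # 10 rows (ids within a draw) by 100 columns (one per draw)
length(c(draws))  # c() flattens column by column, giving 1000 ids in draw order
```

The third argument to `sample()` is `replace`, so here a single draw of 10 can contain the same id more than once. The question's `sample(nrow(my_data), 10)` instead picks 10 distinct rows per draw.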

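We can then coerce this into a vector and count the number of unique values seen so far after every 10 entries along the vector. The number of those running counts that are still below 100 is the number of iterations that ended with some units unobserved, so every unit has been seen by the iteration after that.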
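
The counting step can be sketched on its own. This version uses `duplicated()` and `cumsum()` instead of calling `unique()` on a growing prefix; that is a swapped-in technique rather than the code used in this answer, and the names `n_iter`, `seen_after`, and `first_complete` are illustrative assumptions.

```r
set.seed(1)  # arbitrary seed for reproducibility

n_units <- 100  # population size
n_draw  <- 10   # ids per draw
n_iter  <- 200  # assumed cap on the number of draws in one game

# All draws of one game, flattened into a single vector in draw order
samp <- c(replicate(n_iter, sample(n_units, n_draw, TRUE)))

# Running count of distinct ids, read off at the end of each draw
new_id <- !duplicated(samp)
seen_after <- cumsum(new_id)[seq(n_draw, length(samp), by = n_draw)]

# First draw at which every unit has been seen (NA if the cap was too low);
# when the game completes, this equals sum(seen_after < n_units) + 1
first_complete <- which(seen_after == n_units)[1]
```

Because `duplicated()` scans the vector once, this avoids recomputing `unique()` for every prefix, which may matter when simulating many games.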

If we put this inside an sapply, we don't even need an explicit loop, which means we can create the results data frame in a single call:

set.seed(1)

n_games <- 10

results <- data.frame(
  game_id = paste("game", seq(n_games), sep = "_"),
  iterations_required = sapply(seq(n_games), function(x) {
    # 100 draws of 10 ids, flattened into one vector of 1000 ids
    samp <- c(replicate(100, sample(100, 10, TRUE)))
    # count the draws after which fewer than 100 distinct ids have been seen
    sum(sapply(1:100 * 10, function(n) length(unique(samp[1:n]))) < 100)
  })
)

results
#>    game_id iterations_required
#> 1   game_1                  59
#> 2   game_2                  44
#> 3   game_3                  54
#> 4   game_4                  59
#> 5   game_5                  57
#> 6   game_6                  58
#> 7   game_7                  96
#> 8   game_8                  60
#> 9   game_9                  71
#> 10 game_10                  33

Created on 2022-06-11 by the reprex package (v2.0.1)
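
The question also asks to keep a `summary_file` for each game. One possible way to do that is sketched below, under stated assumptions: `play_game()` is a hypothetical helper, each draw takes 10 distinct units as in the question's `sample(nrow(my_data), 10)`, and a game is capped at `max_iter` draws.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Hypothetical helper: plays one game and returns its summary table.
# Assumes no replacement within a draw and at most max_iter draws per game.
play_game <- function(n_units = 100, n_draw = 10, max_iter = 500) {
  ids <- c(replicate(max_iter, sample(n_units, n_draw)))
  iteration <- rep(seq_len(max_iter), each = n_draw)
  first_seen <- iteration[!duplicated(ids)]  # draw in which each id first appeared
  counts <- table(first_seen)                # new ids per draw, in draw order
  out <- data.frame(
    iteration_i = as.integer(names(counts)),
    Count = as.integer(counts)
  )
  out$Cumulative <- cumsum(out$Count)
  out$unobserved <- n_units - out$Cumulative
  out
}

n_games <- 5

# One summary table per game, kept in a named list
summary_files <- lapply(seq_len(n_games), function(g) play_game())
names(summary_files) <- paste("game", seq_len(n_games), sep = "_")

# The last row of each table is the draw in which the final unseen unit appeared
total_summary_file <- data.frame(
  game_id = names(summary_files),
  iterations_required = vapply(summary_files, function(s) max(s$iteration_i),
                               integer(1), USE.NAMES = FALSE)
)
```

Individual tables are then available as `summary_files$game_1`, `summary_files$game_2`, and so on. If a game could plausibly hit the cap, it would be worth checking that the last `unobserved` value is 0 before trusting `iterations_required`.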

Allan Cameron answered Oct 15 '25 00:10