
Sampling column values in a matrix, without replacement

Tags: r

i have some experience with R, but always struggle to write new code. i've found several very helpful posts here while working on my current project, but can't seem to find the next step. here's what i've done so far:

  • imported a 20x20 .csv of rankings; each column contains one instance of each integer from 1 to 20, so all colSums are 210. rowSums vary.

  • used a post on here to randomly sample 4 rows from the original matrix and put them into a new 4x20 matrix.

now, i need to sample 5 columns from each row, without replacement of columns. that is, i need to use each column only once and have five values in each row. (i don't have a preference on whether this gives me a matrix with 20 values in the right places and 60 zeros, or if i get 4 vectors of 5 values. i guess i sort of want the matrix?)

if context helps, i'm trying to create groups based on topic rankings in a classroom. rows are topics and columns are voters (students). ultimately i want to create these random assignments in a for loop, and run the program many times to hopefully optimize the choices (by some measurement; obviously there are different ways to optimize) automatically rather than by staring at the original matrix, which is what i've done in the past.

this is my 4x20 matrix:

    J  E  I  S  A  N  H  T  M  B  D  K  O  G  P  L  Q  R  F  C
2   5  4  1  1  5 13  3  4 13 11 14 14 20  9 15  9 11 17  9 15
13 20 19 17 19 19  7  4 19  7  1  5  1 17 15 10  6  7 14  6  3
14 18  2 12 14 11 19 18 15 19  4  8 19  2  2 13  7  9  1 12 10
18  4  7 18  5 12 18  2 20  6  7 16 15  5 18  1 13  2 18 14 16

this is (one version of) what i want:

    J  E  I  S  A  N  H  T  M  B  D  K  O  G  P  L  Q  R  F  C
2   0  4  1  1  0  0  3  4  0  0  0  0  0  0  0  0  0  0  0  0
13  0  0  0  0  0  7  0  0  0  1  5  1  0  0  0  0  0  0  0  3
14  0  0  0  0 11  0  0  0  0  0  0  0  0  2  0  7  0  1 12  0
18  4  0  0  0  0  0  0  0  6  0  0  0  5  0  1  0  2  0  0  0

tbkent asked Dec 26 '22 10:12


1 Answer

You can use apply. The following command randomly samples five values from each row of mat and returns the results as a matrix:

apply(mat, 1, sample, 5)

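apply returns one column per row of mat (a 5 x 4 matrix here), so you might wish to transpose the result with t to match the layout of the original matrix. Note that each row is sampled independently, so the same column can be drawn in more than one row.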
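
To make the shape of the result concrete, here is a minimal sketch. The 20 x 20 matrix full is a randomly generated stand-in for the rankings file (an assumption, not the asker's data), and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Stand-in for the imported rankings (assumption): 20 x 20,
# each column is a permutation of 1..20
full <- replicate(20, sample(20))

# Keep 4 random rows, as described in the question
mat <- full[sample(nrow(full), 4), ]

# Sample 5 values from each row; apply() stores each row's sample as a column
res <- apply(mat, 1, sample, 5)
dim(res)     # 5 4
dim(t(res))  # 4 5, one row per sampled row of mat
```

Because every row is sampled on its own, this version does not stop two rows from drawing a value out of the same column.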


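If you want to use every column only once, draw all the column indices in a single call to sample (which samples without replacement by default) and pair them with the row indices. This requires ncol(mat) to be at least 5 * nrow(mat):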

mat[cbind(seq(nrow(mat)), sample(ncol(mat), 5 * nrow(mat)))]

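It will return a vector of the selected values. Because cbind recycles the row indices, the values come out in the order row 1, 2, 3, 4, 1, 2, and so on, rather than grouped by row.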
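
The command relies on indexing a matrix with a two-column matrix of (row, column) pairs. A small sketch of that mechanism, using a toy 2 x 4 matrix m and a fixed column order in place of sample so the result is predictable (both are illustrative assumptions):

```r
# Toy 2 x 4 matrix, purely illustrative (not the question's data)
m <- matrix(1:8, nrow = 2)
#      [,1] [,2] [,3] [,4]
# [1,]    1    3    5    7
# [2,]    2    4    6    8

# Fixed column order instead of sample(); the row indices 1, 2 are
# recycled by cbind() to 1, 2, 1, 2
idx <- cbind(seq(nrow(m)), c(3, 1, 4, 2))

# Picks elements (1,3), (2,1), (1,4), (2,2)
m[idx]  # 5 2 7 4
```

Each row of idx selects exactly one cell, so a permutation of the column indices guarantees that no column is used twice.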

To match the desired output format (matrix including zeros and randomly chosen values), you can use the following strategy:

# create an index of the values to be kept
idx <- cbind(seq(nrow(mat)), sample(ncol(mat), 5 * nrow(mat)))

# create a new matrix of zeroes
mat2 <- matrix(0, ncol = ncol(mat), nrow = nrow(mat))

# copy the values from the original matrix to the new one
mat2[idx] <- mat[idx]
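
As a check, the strategy can be run on the 4x20 matrix from the question. This is a sketch: the helper name sample_columns_once and its argument k are hypothetical, introduced here only for illustration.

```r
# The 4 x 20 matrix from the question, entered row by row
mat <- matrix(c(
   5,  4,  1,  1,  5, 13,  3,  4, 13, 11, 14, 14, 20,  9, 15,  9, 11, 17,  9, 15,
  20, 19, 17, 19, 19,  7,  4, 19,  7,  1,  5,  1, 17, 15, 10,  6,  7, 14,  6,  3,
  18,  2, 12, 14, 11, 19, 18, 15, 19,  4,  8, 19,  2,  2, 13,  7,  9,  1, 12, 10,
   4,  7, 18,  5, 12, 18,  2, 20,  6,  7, 16, 15,  5, 18,  1, 13,  2, 18, 14, 16),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("2", "13", "14", "18"),
                  c("J", "E", "I", "S", "A", "N", "H", "T", "M", "B",
                    "D", "K", "O", "G", "P", "L", "Q", "R", "F", "C")))

# Hypothetical helper: keep k values per row, using each column at most once
sample_columns_once <- function(mat, k) {
  stopifnot(ncol(mat) >= k * nrow(mat))  # enough columns to go around
  # (row, column) pairs; row indices recycle, columns are drawn without replacement
  idx <- cbind(seq(nrow(mat)), sample(ncol(mat), k * nrow(mat)))
  out <- matrix(0, nrow = nrow(mat), ncol = ncol(mat), dimnames = dimnames(mat))
  out[idx] <- mat[idx]
  out
}

mat2 <- sample_columns_once(mat, 5)

# All rankings are at least 1, so a non-zero cell marks a selected value
rowSums(mat2 != 0)  # 5 for every row
colSums(mat2 != 0)  # 1 for every column
```

Wrapping the steps in a function makes it straightforward to call it repeatedly, for example inside a for loop, and compare the resulting assignments by whatever measure is chosen.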

Sven Hohenstein answered Jan 06 '23 18:01