
loop R multiple samples from single dataset

I am trying to write a simple loop in R: I have a large dataset and want to draw multiple smaller samples from it and export each one to Excel.

I thought it would work like this, but it doesn't:

 idorg <- c(1,2,3,4,5)
 x <- c(14,20,21,16,17)
 y <- c(31,21,20,50,13)
 dataset <- cbind (idorg,x,y)


 for (i in 1:4)
 {
 attempt[i] <- dataset[sample(1:nrow(dataset), 3, replace=FALSE),]
 write.table(attempt[i], "C:/Users/me/Desktop/WWD/Excel/dataset[i].xls", sep='\t')
 }

In Stata you would need to preserve and restore your data when doing a loop like this, but is this also necessary in R?

Jack Nielson asked Oct 16 '12 07:10


2 Answers

You have the following problems:

  1. attempt is not defined before the loop, so attempt[i] cannot be assigned to. Either create a container before the loop and fill it inside the loop (if you want to keep the samples; a list works well, since each sample is itself a matrix), or use attempt as a plain temporary variable.
  2. The file name is taken literally, so every iteration writes to a file called dataset[i].xls. You need to use paste() or sprintf() to include the value of the variable i in the file name.

Here is a working version of the code:

idorg <- c(1,2,3,4,5)
x <- c(14,20,21,16,17)
y <- c(31,21,20,50,13)
dataset <- cbind (idorg,x,y)

for (i in 1:4)  {
  attempt <- dataset[sample(1:nrow(dataset), 3, replace=FALSE),]
  write.table(attempt, sprintf( "C:/Users/me/Desktop/WWD/Excel/dataset[%d].xls", i ), sep='\t')
}
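
If you would rather keep every sample in memory (the first option in point 1), a minimal sketch could look like the following. The names attempts and n_samples are illustrative and not from the question; a list is used because each sample is a whole matrix.

```r
idorg <- c(1, 2, 3, 4, 5)
x <- c(14, 20, 21, 16, 17)
y <- c(31, 21, 20, 50, 13)
dataset <- cbind(idorg, x, y)

n_samples <- 4  # illustrative name, not from the question

# Pre-allocate a list with one slot per sample
attempts <- vector("list", n_samples)

for (i in seq_len(n_samples)) {
  # Draw 3 distinct row indices and store the resulting matrix in slot i
  attempts[[i]] <- dataset[sample(nrow(dataset), 3), ]
}

length(attempts)    # number of samples kept
dim(attempts[[1]])  # rows and columns of the first sample
```

Note the double brackets: attempts[[i]] refers to the element in slot i, whereas attempts[i] is a sub-list of length one.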

Will Excel be able to read such a tab-separated table? I'm not sure; I would write a comma-separated table and save it as .csv.
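
A sketch of that CSV variant, under a few assumptions: out_dir is a hypothetical variable that points at tempdir() here so the example runs anywhere (replace it with your own folder), and the file names dataset_1.csv, dataset_2.csv and so on are my choice, not the asker's.

```r
idorg <- c(1, 2, 3, 4, 5)
x <- c(14, 20, 21, 16, 17)
y <- c(31, 21, 20, 50, 13)
dataset <- cbind(idorg, x, y)

out_dir <- tempdir()  # assumption: swap in your own output folder

for (i in 1:4) {
  attempt <- dataset[sample(nrow(dataset), 3), ]
  # Build a path such as <out_dir>/dataset_1.csv
  out_file <- file.path(out_dir, sprintf("dataset_%d.csv", i))
  # row.names = FALSE leaves out the row labels, which carry no information here
  write.csv(attempt, out_file, row.names = FALSE)
}

list.files(out_dir, pattern = "^dataset_[0-9]+\\.csv$")
```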

January answered Nov 10 '22 20:11


Unlike Stata, you don't need to preserve and restore your data for this kind of operation in R.
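
The reason is that subsetting returns a new object and leaves dataset itself untouched. A small check, where original is a copy I introduce only for the comparison:

```r
idorg <- c(1, 2, 3, 4, 5)
x <- c(14, 20, 21, 16, 17)
y <- c(31, 21, 20, 50, 13)
dataset <- cbind(idorg, x, y)

original <- dataset  # copy kept only to compare against afterwards

for (i in 1:4) {
  # Each sample is a new matrix; dataset is only read, never modified
  attempt <- dataset[sample(nrow(dataset), 3), ]
}

identical(dataset, original)  # TRUE
```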

I think January's solution solves your problem, but I wanted to share another alternative: using lapply() to get a list of all the samples of the dataset:

set.seed(1) # So you can reproduce these results
temp <- setNames(lapply(1:4,
                        function(x) { 
                          x <- dataset[sample(1:nrow(dataset),
                                              3, replace = FALSE), ]; x }),
                 paste0("attempt.", 1:4))

This has created a list named "temp" that comprises four matrices (dataset was built with cbind(), so each sample is a matrix rather than a data.frame).

temp
# $attempt.1
#      idorg  x  y
# [1,]     2 20 21
# [2,]     5 17 13
# [3,]     4 16 50
# 
# $attempt.2
#      idorg  x  y
# [1,]     5 17 13
# [2,]     1 14 31
# [3,]     3 21 20
# 
# $attempt.3
#      idorg  x  y
# [1,]     5 17 13
# [2,]     3 21 20
# [3,]     2 20 21
# 
# $attempt.4
#      idorg  x  y
# [1,]     1 14 31
# [2,]     5 17 13 
# [3,]     4 16 50

Lists are very convenient in R. You can now use lapply() to do other things with the samples: for example, lapply(temp, rowSums) returns the row sums of each sample. Or, if you want to output separate CSV files (readable by Excel), you can do something like this:

lapply(names(temp), function(x) write.csv(temp[[x]],
                             file = paste0(x, ".csv")))
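
For completeness, here is a self-contained sketch that combines both ideas. It rebuilds temp so it can be run on its own; out_dir, row_totals and the use of tempdir() are my assumptions. Wrapping the call in invisible() only stops the list of NULLs that write.csv() returns from being printed at the console.

```r
idorg <- c(1, 2, 3, 4, 5)
x <- c(14, 20, 21, 16, 17)
y <- c(31, 21, 20, 50, 13)
dataset <- cbind(idorg, x, y)

# Four samples of three rows each, named attempt.1 to attempt.4
temp <- setNames(lapply(1:4, function(i) dataset[sample(nrow(dataset), 3), ]),
                 paste0("attempt.", 1:4))

# Row sums of every sample, returned as a list of numeric vectors
row_totals <- lapply(temp, rowSums)

out_dir <- tempdir()  # assumption: swap in your own output folder

# One CSV file per sample, named after the list element
invisible(lapply(names(temp), function(nm) {
  write.csv(temp[[nm]],
            file = file.path(out_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}))
```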

A5C1D2H2I1M1N2O1R2T1 answered Nov 10 '22 20:11