I set myself a little challenge on my way to learning R. The question was, given a sample of 500 numbers in normal distribution with mean 20, how many numbers under 20 would I get for standard deviations from 6 to 10. Just to have to learn more I decided to get 4 samples for each sd. So by the end I should have:
sd6samp1:...
sd6samp2:...
....
sd10samp4:...
My first approach, which worked was:
ddss<-c(6:10) # sd's
sam<-c(1:4) # 4 samples for each
k=0 # counter in 0
for (i in ddss) { # for each sd
for (j in sam) { # for each sample
nam <- paste("sam",i,".",j, sep="") # building a name
n <- assign(nam,rnorm(500, 20, i)) # the great assign function
k <- k+sum(n<=0)
}
print(assign(paste("ds",i,sep=""), k)) # ohh assign you're great
k=0 # reset counter
}
While looking for how to create variable names with the looping 'i', founded that 'assign' does the work but it also said:
Note though that if you are planning some simulations, many guRus would say that you should use a list.
So I thoght it would be good to learn lists...
In the meanwhile I also discover a great other option... ddss <- c(6:10)
for (i in ddss) {
print(paste('prob. x<=0), with sd=',i))
print(pnorm(0,mean=20,sd=i)*500)
}
This worked to answer the question, but the lists were still to be done... and a lot of R has yet to be learned. The main idea wasn't to know the very prob or number of negatives... but to learn R and specifically some looping.
So, I've been trying to go with the mentioned lists
My closest approach has been:
ddss<-c(6:10) # sd's to be calculated.
sam<-c(1:4) # 4 samples for each sd
liss<-list() # initializing the list
for (i in ddss) { # for each sd
liss[[i]] <- list()
for (j in sam) { # for each sample
liss[[i]][[j]] <- rnorm(500, 20, i)
print(paste('ds',i,'samp',j,'=',sum(liss[[i]][[j]]<0)))
}
}
With this one I get the information but I'm wondering about two issues (1 & 2) and some other questions (3 & 4):
I get a list of 10 elements, 6 empty ones and then 4 with sublists. I can't seem to find out how to work with elements 1:4 of the list (sd's) with the 6:9 names (the very sd's).
Even though I tried, I couldn't get to name the lists elements through the 'for' loops. Any insight on these issues would be great.
Since in this context of simulations. What do you think is better: nested lists (lists with sublists) or simple (longer) lists?
I wondered whether the 'apply' functions would be of any help here, I tried to do something, like:
vbv<-matrix(c(6,6,6,6,7,7,7,7,8,8,8,8,9,9,9,9))
lsl<-apply(vbv, 2, function(x) rnorm(500,20,x))
But it looks I'm not getting even close....
The problem is in your indexes: you are running over indexer i from ddss, which runs from 6 to 10. So in the first tour of duty in your outer loop, your first statement really says: liss[[6]]<-list(), implying that the first 5 ones are NULL.
So if you insist on working with loops, this is what you should do (check ?seq_along):
ddss<-c(6:10) # sd's to be calculated.
sam<-c(1:4) # 4 samples for each sd
liss<-list() # initializing the list
for (i in seq_along(ddss)) { # now, i runs from 1 to 5
liss[[i]] <- list()
for (j in sam) { # for each sample
liss[[i]][[j]] <- rnorm(500, 20, i)
print(paste('ds',ddss[i],'samp',j,'=',sum(liss[[i]][[j]]<0)))
}
names(liss[[i]])<-as.character(sam)#this should solve your naming issue (1/2)
}
names(liss)<-as.character(ddss)#this should solve your naming issue (2/2)
Note that, as always, it is a good idea to name your variables something more useful than i or j: if you'd named it curds, maybe you wouldn't have used it immediately as an indexer in a list?
Now, if you are really aiming for improvement (but want to stick to lists), you indeed want to go with the apply style functions:
liss<-lapply(ddss, function(curds){ #apply the inline function to each ds and store results in a list
return(lapply(sam, function(cursam){ #apply inline function to each sam and store results in a list
rv<-rnorm(500, 20, curds)
cat('ds',curds,'samp',cursam,'=',sum(rv<0), "\n") #maybe better for your purposes.
return(rv)
}))
})
Finally, for your case, there is not a lot of reason to actually use lists (nor do you even need to keep the sampled data for each ds/sam): you can store everything as a threedimensional array, but since you specify it as a learning exercise (hey, maybe the array thing can be your next exercise :-)), I'll leave it at that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With