I'm trying to normalize some data which I have in a data frame. I want to take each value and run it through the pnorm function along with the mean and standard deviation of the column the value lives in. Using loops, here's how I would write out what I want to do:
#example data
hist_data <- data.frame( matrix( rnorm( 200,mean=5,sd=.5 ),nrow=20 ) )
n <- dim( hist_data )[2] #columns=10
k <- dim( hist_data )[1] #rows =20
#set up the data frame which we will populate with a loop
normalized <- data.frame( matrix( nrow = nrow( hist_data ), ncol = ncol( hist_data ) ) )
#hot loop in loop action
for ( i in 1:n ){
  for ( j in 1:k ){
    normalized[j,i] <- pnorm( hist_data[j,i],
                              mean = mean( hist_data[,i] ),
                              sd = sd( hist_data[,i] ) )
  }
}
normalized
It seems that in R there should be a handy dandy vector way of doing this. I thought I was smart so tried using the apply function:
#trouble ahead
hist_data <- data.frame( matrix( rnorm( 200, mean = 5,sd = .5 ), nrow=10 ) )
normalized <- apply( hist_data, 2, pnorm, mean = mean( hist_data ), sd = sd( hist_data ) )
normalized
Much to my chagrin, that does NOT produce what I expected. The upper left and bottom right elements of the output are correct, but that's it. So how can I de-loopify my life?
Bonus points if you can tell me what my second code block is actually doing. Kind of a mystery to me still. :)
You want:
normalized <- apply(hist_data, 2, function(x) pnorm(x, mean=mean(x), sd=sd(x)))
The problem is that you're passing each individual column into pnorm, but the entire hist_data into both mean and sd. Wrapping the call in an anonymous function makes mean and sd see the same column x that pnorm does.
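If you want to convince yourself that the apply version really reproduces the double loop, here is a minimal sketch that runs both on the same data and compares them. The seed and the names looped and vectorized are just picked for illustration; they are not from the question.

```r
set.seed(42)  # arbitrary seed so the comparison is reproducible
hist_data <- data.frame(matrix(rnorm(200, mean = 5, sd = 0.5), nrow = 20))

# Loop version, as in the question: one pnorm call per cell
looped <- matrix(NA_real_, nrow = nrow(hist_data), ncol = ncol(hist_data))
for (i in seq_len(ncol(hist_data))) {
  for (j in seq_len(nrow(hist_data))) {
    looped[j, i] <- pnorm(hist_data[j, i],
                          mean = mean(hist_data[, i]),
                          sd = sd(hist_data[, i]))
  }
}

# Vectorized version: each column x supplies its own mean and sd
vectorized <- apply(hist_data, 2, function(x) pnorm(x, mean = mean(x), sd = sd(x)))

# Compare values only; apply() keeps the column names, the loop matrix has none
same <- isTRUE(all.equal(looped, vectorized, check.attributes = FALSE))
print(same)
```

Since a data frame is a list of columns, sapply(hist_data, function(x) pnorm(x, mean = mean(x), sd = sd(x))) should give the same matrix without the conversion that apply does internally.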
As I mentioned on twitter, I'm no stats guy so I can't answer anything about what you're actually trying to do :)
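On the bonus question, my best guess is argument recycling. As far as I know, older versions of R returned one value per column from mean() and sd() on a data frame (current versions warn or error instead), so pnorm received a whole vector of means and a vector of sds and recycled them down the rows of each column. The sketch below imitates that with explicit per-column vectors; col_means, col_sds, recycled and correct are names made up for this illustration, and it uses the 20-row example data from the first code block.

```r
set.seed(1)  # arbitrary seed
hist_data <- data.frame(matrix(rnorm(200, mean = 5, sd = 0.5), nrow = 20))

# Assumed stand-ins for what mean(hist_data) and sd(hist_data) used to return
col_means <- sapply(hist_data, mean)  # length 10, one per column
col_sds   <- sapply(hist_data, sd)

# The question's call: row j of every column is paired with
# col_means[((j - 1) %% 10) + 1], regardless of which column it is in
recycled <- apply(hist_data, 2, pnorm, mean = col_means, sd = col_sds)

# The intended result: each column uses its own mean and sd
correct <- apply(hist_data, 2, function(x) pnorm(x, mean = mean(x), sd = sd(x)))

# Cells where the recycled statistics happen to belong to the right column
matches <- abs(recycled - correct) < 1e-12
print(which(matches, arr.ind = TRUE))
```

Under that assumption, a cell comes out right only when the recycled index lines up with its own column, which includes the upper left cell [1, 1] and the bottom right cell [20, 10]. That would be consistent with what you saw.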