My guess is that this is easy using ddply, but I'm still a newbie at R and can't get my head around it.
I have a data.frame looking like this:
txt <- "label var1 var2 var3 var4 var5 var6 var7
lab1 401 80 57 125 118 182 83
lab2 72 192 80 224 182 187 178
lab3 7 152 134 104 105 80 130
lab4 3 58 210 30 78 33 87
lab5 1 2 3 1 1 2 6"
mydata <- read.table(textConnection(txt), sep = " ", header = TRUE)
Doing this, I can transform one variable at a time into a percentage:
mydata$var1 <- round(prop.table(mydata$var1),3)*100
But how do I do it with all variables (var1:var7) in a data.frame in one stroke?
NOTE: It is going into a function, in which the length and number of variables differ from time to time, and hence the code should be sensitive to this.
Thank you in advance
Just coerce to a matrix and use the margin argument to prop.table, like so:
round( prop.table(as.matrix(df),2) * 100 , 3 )
For example
set.seed(123)
df <- data.frame( matrix( sample(4 , 12 , replace=TRUE ) , 3 ) )
df
# X1 X2 X3 X4
#1 2 4 3 2
#2 4 4 4 4
#3 2 1 3 2
round( prop.table(as.matrix(df),2) * 100 , 3 )
# X1 X2 X3 X4
#[1,] 25 44.444 30 25
#[2,] 50 44.444 40 50
#[3,] 25 11.111 30 25
In your example, it looks like what I thought were rownames is actually a column of character values. To use prop.table on all columns except this first one, you can do prop.table( df[,-1] , margin = 2 ).
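Applied to the data from the question, the same idea might look like the sketch below. It assumes the label is always the first column, coerces the remaining columns to a matrix as suggested above, and rounds to one decimal place (an arbitrary choice). The names pct and result are just illustrative.

```r
# Rebuild the question's data; read.table(text = ...) avoids textConnection().
txt <- "label var1 var2 var3 var4 var5 var6 var7
lab1 401 80 57 125 118 182 83
lab2 72 192 80 224 182 187 178
lab3 7 152 134 104 105 80 130
lab4 3 58 210 30 78 33 87
lab5 1 2 3 1 1 2 6"
mydata <- read.table(text = txt, header = TRUE)

# Column-wise percentages for everything except the first (label) column.
# margin = 2 means "each column sums to 1" before scaling to 100.
pct <- round(prop.table(as.matrix(mydata[, -1]), margin = 2) * 100, 1)

# Reattach the labels; data.frame() keeps the matrix column names.
result <- data.frame(label = mydata$label, pct)
result
```

Because the column selection is by position (-1), this keeps working when the number of var columns changes.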
No need for fancy packages. This will work as long as you want to do it to all but the first column. You could adapt the conditions for what columns are included if 2:ncol isn't appropriate.
t(round(t(mydata[, 2:ncol(mydata)]) / colSums(mydata[, 2:ncol(mydata)]) * 100, 3))
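The double transpose is there because R recycles a vector down the columns of a matrix, so dividing the transposed data by colSums lines each column total up with the right variable. As a small check on a toy matrix (the matrix m is made up for illustration), the same result can be obtained with sweep(), which some may find easier to read:

```r
# Toy matrix: column a holds 1 and 3, column b holds 2 and 8.
m <- matrix(c(1, 3, 2, 8), nrow = 2, dimnames = list(NULL, c("a", "b")))

# Transpose, divide (recycling pairs each row of t(m) with its column total),
# then transpose back.
via_transpose <- t(t(m) / colSums(m) * 100)

# sweep() divides each column (MARGIN = 2) by its own total directly.
via_sweep <- sweep(m, 2, colSums(m), "/") * 100

# Both approaches give the same column percentages.
stopifnot(isTRUE(all.equal(via_transpose, via_sweep)))
via_sweep
```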
And, since you asked about ddply (from plyr), and dplyr is the improved version of plyr, here's how you'd do it with that:
require(dplyr)
require(reshape2)
mydata %>% melt(id.vars = "label") %>%
group_by(variable) %>%
mutate(prop = round(value / sum(value) * 100, 3)) %>%
dplyr::select(-value) %>%
dcast(label ~ variable, fun.aggregate = sum, value.var = "prop")
Convert your data to long format, calculate the proportions, and switch it back to wide. A lot of typing for what Simon O'Hanlon shows to be a quick one-liner, but the dplyr method generalizes nicely to whatever sorts of calculations you might want to do.
Maybe something like this can help you:
cbind(label=mydata[,1],as.data.frame(apply(mydata[,-1], 2, function(col) round(prop.table(col),3)*100 )))
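Since the question says this is going into a function where the number of variables changes, one possible way to wrap the idea is sketched below. It picks the columns by type rather than by position, which is an assumption on my part (it treats every numeric column as something to convert). The function name to_percent and its digits argument are hypothetical, not from any package.

```r
# Hypothetical helper: convert every numeric column of d to column percentages,
# leaving non-numeric columns (such as a label column) untouched.
to_percent <- function(d, digits = 1) {
  # TRUE for each column that is numeric.
  num <- vapply(d, is.numeric, logical(1))
  # Replace each numeric column by its share of the column total, in percent.
  d[num] <- lapply(d[num], function(col) round(col / sum(col) * 100, digits))
  d
}

# Small made-up example with one label column and two numeric columns.
example <- data.frame(label = c("a", "b"), x = c(1, 3), y = c(2, 8))
to_percent(example)
```

Selecting by type means the label column does not have to come first, but a numeric ID column would also be converted, so select by position instead if that is a concern.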