I am trying to summarise the mean, sd, etc. for a number of different columns (variables) in my dataset. I have written my own summary function to return exactly what I need and am using sapply to apply it to all the variables at once. It works fine; however, the dataframe that is returned has no column names, and I cannot even rename the columns using a column number reference, so they seem impossible to use in any way.
My code is below. As I am just finding summary statistics, I would like to keep the same column (variable) names, with 4 rows (mean, sd, min, max). Is there any way at all to do this (even a slow way where I manually change the names of the columns)?
#GENERATING DESCRIPTIVE STATISTICS
sfsum= function(x){
mean=mean(x)
sd=sd(x)
min=min(x)
max=max(x)
return(c(mean,sd,min,max))
}
#
c= list(sfbalanced$age_child, sfbalanced$earnings_child,
sfbalanced$logchildinc ,sfbalanced$p_inc84, sfbalanced$login84,
sfbalanced$p_inc85, sfbalanced$login85, sfbalanced$p_inc86,
sfbalanced$login86, sfbalanced$p_inc87, sfbalanced$login87,
sfbalanced$p_inc88, sfbalanced$login88)
summ=sapply(c,sfsum)
names(summ)
NULL
If you name the elements of the vector returned by the function, those names become the row names of the result. If you also name the elements of the list you pass in, sapply uses those names as the column names (USE.NAMES = TRUE is the default, so it does not have to be set explicitly).
An example on the mtcars data gives the following output.
Code
sfsum= function(x){
  mean=mean(x)
  sd=sd(x)
  min=min(x)
  max=max(x)
  # Naming the elements here gives the result its row names
  return(c("mean"=mean,"sd"=sd,"min" = min,"max" =max))
}
# Naming the list elements here gives the result its column names
x= list("mpg" = mtcars$mpg, "disp" = mtcars$disp, "drat" = mtcars$drat)
# USE.NAMES = TRUE is the default; it is written out only for clarity
summ=sapply(x,sfsum, USE.NAMES = TRUE)
Output:
> summ
mpg disp drat
mean 20.090625 230.7219 3.5965625
sd 6.026948 123.9387 0.5346787
min 10.400000 71.1000 2.7600000
max 33.900000 472.0000 4.9300000
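In the question, names(summ) returns NULL because sapply returns a matrix there, not a data frame, and a matrix keeps its labels in its dimnames rather than in names. A minimal sketch of the "slow way" of labelling such a result by hand, using mtcars as stand-in data (the object name cols is only illustrative):

```r
# Summary function as in the question: returns an unnamed vector
sfsum <- function(x) {
  c(mean(x), sd(x), min(x), max(x))
}

# An unnamed list, as in the question, gives a matrix without dimnames
cols <- list(mtcars$mpg, mtcars$disp, mtcars$drat)
summ <- sapply(cols, sfsum)

is.matrix(summ)  # the result is a matrix, so names() is not the right accessor
names(summ)

# Label the rows and columns after the fact
rownames(summ) <- c("mean", "sd", "min", "max")
colnames(summ) <- c("mpg", "disp", "drat")
summ
```

Wrapping the labelled matrix in as.data.frame() turns it into a data frame if one is needed downstream.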
If the column names are needed as well, apply the function over the dataset itself (assuming the function is applied to all the columns). A data frame is a list of named columns, so sapply keeps the names.
set.seed(24)
df2 <- as.data.frame(matrix(rnorm(4*4), 4, 4))
out <- sapply(df2, sfsum)
row.names(out) <- c('mean', 'sd', 'min', 'max')
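When only some of the columns are wanted, subsetting the data frame by column name keeps the names without building a list by hand. A sketch on mtcars, assuming a version of sfsum that returns a named vector. vapply is swapped in for sapply here only because it checks that every result has the expected type and length, and the variable name vars is illustrative:

```r
# Summary function returning a named vector (names become row names)
sfsum <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

# Columns to summarise, selected by name
vars <- c("mpg", "disp", "drat")

# Subsetting a data frame keeps the column names, which carry over
out <- vapply(mtcars[vars], sfsum, FUN.VALUE = numeric(4))
out

# Optional: one row per variable, as a data frame
as.data.frame(t(out))
```

For the question's data, the same pattern would presumably be applied to sfbalanced with its own column names in place of vars.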