
sapply - retain column names

I am trying to summarise the mean, sd, etc. for a number of columns (variables) in my dataset. I have written my own summary function that returns exactly what I need, and I am using sapply to apply it to all the variables at once. It works, but the result that is returned has no column names, and I cannot rename the columns even by column number, so I cannot refer to them in any way.

My code is below. As I am just computing summary statistics, I would like to keep the same column (variable) names, with 4 rows (mean, sd, min, max). Is there any way to do this, even a slow way where I rename the columns manually?

 #GENERATING DESCRIPTIVE STATISTICS
sfsum= function(x){
  mean=mean(x)
  sd=sd(x)
  min=min(x)
  max=max(x)

  return(c(mean,sd,min,max))
}

#
c= list(sfbalanced$age_child, sfbalanced$earnings_child, 
sfbalanced$logchildinc ,sfbalanced$p_inc84, sfbalanced$login84, 
sfbalanced$p_inc85, sfbalanced$login85, sfbalanced$p_inc86, 
sfbalanced$login86, sfbalanced$p_inc87, sfbalanced$login87, 
sfbalanced$p_inc88, sfbalanced$login88)

summ=sapply(c,sfsum)

names(summ)
 NULL
Morag McDonald asked May 22 '18 15:05



2 Answers

If the function returns a named vector, sapply uses those names as the row names of the result. If you also name the elements of the list you pass in, sapply carries those names over as the column names (USE.NAMES = TRUE is already the default).

An example using the mtcars data:

Code

sfsum= function(x){
    mean=mean(x)
    sd=sd(x)
    min=min(x)
    max=max(x)

    return(c("mean"=mean,"sd"=sd,"min" = min,"max" =max)) #For rownames
}

#
x= list("mpg" = mtcars$mpg, "disp" = mtcars$disp, "drat" = mtcars$drat)
#For column names

summ=sapply(x,sfsum, USE.NAMES = TRUE) #USE.NAMES = TRUE to get names on top

Output:

> summ
           mpg     disp      drat
mean 20.090625 230.7219 3.5965625
sd    6.026948 123.9387 0.5346787
min  10.400000  71.1000 2.7600000
max  33.900000 472.0000 4.9300000
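
As a side note on why names(summ) is NULL in the question: sapply simplifies the result to a matrix here, and a matrix keeps its labels in dimnames rather than names. A minimal sketch on mtcars, assuming an unnamed list like the one in the question (the object names cols and summ_raw are only illustrative), of labelling the result after the fact:

```r
# Unnamed list of columns, as in the question (mtcars used for illustration)
cols <- list(mtcars$mpg, mtcars$disp, mtcars$drat)

# Same statistics as sfsum, returned without names
summ_raw <- sapply(cols, function(x) c(mean(x), sd(x), min(x), max(x)))

is.matrix(summ_raw)   # TRUE: the result is a matrix, not a data frame
names(summ_raw)       # NULL: a matrix uses dimnames, not names

# Label the matrix manually
colnames(summ_raw) <- c("mpg", "disp", "drat")
rownames(summ_raw) <- c("mean", "sd", "min", "max")
summ_raw
```

This is the "slow way" the question asks about; naming the list and the return vector up front avoids the manual step.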
PKumar answered Nov 03 '22 06:11



If we need the column names as well, just apply the function over the dataset itself (assuming the function should be applied to all the columns):

out <- sapply(df2, sfsum)
row.names(out) <- c('mean', 'sd', 'min', 'max')

data

set.seed(24)
df2 <- as.data.frame(matrix(rnorm(4*4), 4, 4))
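
If only some columns are needed, the data frame can be subset by column name first; because a data frame is a named list, sapply picks up the column names. A minimal sketch on mtcars, assuming a version of sfsum that returns a named vector (the selection vars and the object out2 are hypothetical names for illustration):

```r
# Summary function returning a named vector, so the row names come for free
sfsum <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

# Hypothetical selection of columns to summarise
vars <- c("mpg", "disp", "drat")

# mtcars[vars] is still a data frame, so its column names carry over
out2 <- sapply(mtcars[vars], sfsum)
out2
```

For the data in the question, the same pattern would presumably be sapply(sfbalanced[vars], sfsum) with vars holding the column names listed there.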
akrun answered Nov 03 '22 07:11
