Calculate summary statistics (e.g. mean) on all numeric columns using data.table

Tags: r, data.table

3 Answers

Searching SO for .SDcols, I landed on this answer, which I think explains quite nicely how to use it.

cols = sapply(mydt, is.numeric)
cols = names(cols)[cols]
mydt[, lapply(.SD, mean), .SDcols = cols]
#        vnum1 vint1
# 1: -0.046491   4.5
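For anyone who wants to run this end to end, here is a minimal reproducible sketch. The table below is a made-up stand-in for mydt (the column names vchar1 and vfac1 and all values are assumptions, not the data from the question), so the numbers will not match the output above.

```r
library(data.table)

# Hypothetical stand-in for mydt: two numeric columns plus two non-numeric ones
set.seed(1)
mydt <- data.table(
  vnum1  = rnorm(10),
  vint1  = 0:9,
  vchar1 = letters[1:10],
  vfac1  = factor(rep(c("a", "b"), 5))
)

# Logical vector: TRUE for every numeric (double or integer) column
is_num <- sapply(mydt, is.numeric)
# Keep only the names of the numeric columns
cols <- names(is_num)[is_num]

# .SD is restricted to the columns named in .SDcols
res <- mydt[, lapply(.SD, mean), .SDcols = cols]
print(res)
```

The character and factor columns never reach mean(), so no warnings are raised for them.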

Doing mydt[, sapply(mydt, is.numeric), with = FALSE] (note: the "modern" way to do that is mydt[, .SD, .SDcols = is.numeric]) is not that efficient, because it subsets your data.table to those columns, and that makes a (deep) copy, using more memory than necessary.

Using colMeans coerces the data.table to a matrix, which again is not very memory efficient.
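As a rough check, the alternatives mentioned in this answer should return the same means as the .SDcols version; they differ in the intermediate objects they build (a column subset and a matrix), not in the result. The sketch below uses a small made-up table, not the data from the question, and does not measure memory use.

```r
library(data.table)

# Hypothetical data; not the table from the question
mydt <- data.table(vnum1 = c(1.5, 2.5, 4), vint1 = 1:3, vchar1 = c("x", "y", "z"))
cols <- names(mydt)[sapply(mydt, is.numeric)]

# Approach from the answer: select the columns through .SDcols
via_sd <- unlist(mydt[, lapply(.SD, mean), .SDcols = cols])

# Alternative: subset first (a new data.table), then colMeans() converts it to a matrix
via_colmeans <- colMeans(mydt[, cols, with = FALSE])

# Both are named numeric vectors holding the same values
same <- all.equal(via_sd, via_colmeans)
print(same)
```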

179

answered Oct 11 '22 20:10

Arun

You can do this in a single line, without having to use sapply:

mydt[, lapply(.SD, mean), .SDcols = is.numeric]

Also, if you're working with real data, there's a good chance it contains NA values. Here's how to handle them:

mydt[, lapply(.SD, function(i) mean(i, na.rm = TRUE)), .SDcols = is.numeric]
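To see the difference na.rm makes, here is a small sketch on made-up data (the table and its values are assumptions). It relies on a data.table version that accepts a function in .SDcols; on older versions, pass a character vector of column names instead.

```r
library(data.table)

# Hypothetical data with a missing value in each numeric column
mydt <- data.table(
  vnum1  = c(1, NA, 3),
  vint1  = c(2L, 4L, NA),
  vchar1 = c("a", "b", "c")
)

# Without na.rm, a single NA makes the column mean NA
plain <- mydt[, lapply(.SD, mean), .SDcols = is.numeric]

# Extra arguments to lapply() are passed on to mean(), so this is
# equivalent to the anonymous-function form in the answer
clean <- mydt[, lapply(.SD, mean, na.rm = TRUE), .SDcols = is.numeric]

print(plain)
print(clean)
```

Writing TRUE rather than T is slightly safer, since T is an ordinary variable that can be reassigned.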

answered Oct 11 '22 21:10

Moein

I had the same problem; the code below may also help.

library(stringr)
library(tidyr)  # provides spread()

data("mtcars")
mtcars$X1 <- factor(mtcars$gear, levels = c(4, 3, 5))  # create a non-numeric column X1
str(mtcars)
# Return c(mean, median) for numeric columns and NULL otherwise
my.mean <- function(x) if (is.numeric(x)) c(mean(x), median(x))
# unlist() drops the NULLs and names the entries mpg1, mpg2, cyl1, cyl2, ...
my.df <- setNames(as.data.frame(unlist(lapply(mtcars, FUN = my.mean))), "values")
my.df$names <- rep(c("mean", "median"), times = nrow(my.df) / 2)
my.df$variables <- str_remove(rownames(my.df), "[12]$")  # strip the trailing 1/2 suffix

data_wide <- spread(my.df, names, values)  # one row per variable
data_wide

> data_wide
   variables       mean  median
1         am   0.406250   0.000
2       carb   2.812500   2.000
3        cyl   6.187500   6.000
4       disp 230.721875 196.300
5       drat   3.596563   3.695
6       gear   3.687500   4.000
7         hp 146.687500 123.000
8        mpg  20.090625  19.200
9       qsec  17.848750  17.710
10        vs   0.437500   0.000
11        wt   3.217250   3.325
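Since the question is about data.table, the same mean/median table can also be built without stringr or tidyr. This is only a sketch of one possible approach, not part of the answer above; the names dt, stats and long are made up for illustration.

```r
library(data.table)

dt <- as.data.table(mtcars)
dt[, X1 := factor(gear, levels = c(4, 3, 5))]  # same non-numeric column as above

# Names of the numeric columns only (X1 is excluded)
cols <- names(dt)[sapply(dt, is.numeric)]

# Each column yields a length-2 vector, so the result has two rows:
# row 1 holds the means, row 2 the medians
stats <- dt[, lapply(.SD, function(x) c(mean(x), median(x))), .SDcols = cols]
stats[, stat := c("mean", "median")]

# Reshape to one row per variable with mean and median as columns
long <- melt(stats, id.vars = "stat", variable.name = "variables",
             variable.factor = FALSE)
data_wide <- dcast(long, variables ~ stat, value.var = "value")
print(data_wide)
```

Labelling the rows with an explicit stat column avoids having to strip numeric suffixes from the variable names afterwards.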

answered Oct 11 '22 21:10

Seyma Kalay
