Using the summary function inside data.table

Tags: r, data.table

I am learning data.table through examples and I am stuck on a scenario of my own.

I am using the cars dataset, converted to a data.table, to try out my commands.

library(data.table)
> cars.dt=data.table(cars)
> cars.dt[1:5]
   speed dist
1:     4    2
2:     4   10
3:     7    4
4:     7   22
5:     8   16
.
.

I want to calculate the summary statistics of dist for each group of speed and store each statistic in its own column, but the values end up in multiple rows instead.

For example:

 > cars.dt[, summary(dist), by="speed"]
      speed V1
   1:     4  2
   2:     4  4
   3:     4  6
   4:     4  6
   5:     4  8
  ---         
 110:    25 85
 111:    25 85
 112:    25 85
 113:    25 85
 114:    25 85

I was expecting the output below, but I am unable to produce it.

    speed   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 1:     4      2       4       6       6       8      10 
 2:     7    4.0     8.5    13.0    13.0    17.5    22.0 
 3:     8     16      16      16      16      16      16 
 4:     9     10      10      10      10      10      10 
 5:    10     18      22      26      26      30      34 
 6:    11  17.00   19.75   22.50   22.50   25.25   28.00 
 7:    12   14.0    18.5    22.0    21.5    25.0    28.0 
 8:    13     26      32      34      35      37      46 
 9:    14   26.0    33.5    48.0    50.5    65.0    80.0 
10:    15  20.00   23.00   26.00   33.33   40.00   54.00 
11:    16     32      34      36      36      38      40 
12:    17  32.00   36.00   40.00   40.67   45.00   50.00 
13:    18   42.0    52.5    66.0    64.5    78.0    84.0 
14:    19     36      41      46      50      57      68 
15:    20   32.0    48.0    52.0    50.4    56.0    64.0 
16:    22     66      66      66      66      66      66 
17:    23     54      54      54      54      54      54 
18:    24  70.00   86.50   92.50   93.75   99.75  120.00 
19:    25     85      85      85      85      85      85

I tried the command below, but it only prints each summary and does not return the values in a data.table:

> cars.dt[, print(summary(dist)), by="speed"] 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      2       4       6       6       8      10 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    4.0     8.5    13.0    13.0    17.5    22.0 
...
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  70.00   86.50   92.50   93.75   99.75  120.00 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     85      85      85      85      85      85 
 Empty data.table (0 rows) of 1 col: speed

I cannot work out how to use functions that return multiple values together with the by clause.

If anyone has an idea how to write this, it would be much appreciated.

Please also let me know whether this is possible in data.table at all.


asked Sep 19 '14 07:09

Manuel

1 Answer

Try:

 dt1 <- cars.dt[, as.list(summary(dist)), by="speed"]
 head(dt1)
 #    speed Min. 1st Qu. Median Mean 3rd Qu. Max.
 #1:     4    2    4.00    6.0  6.0    8.00   10
 #2:     7    4    8.50   13.0 13.0   17.50   22
 #3:     8   16   16.00   16.0 16.0   16.00   16
 #4:     9   10   10.00   10.0 10.0   10.00   10
 #5:    10   18   22.00   26.0 26.0   30.00   34
 #6:    11   17   19.75   22.5 22.5   25.25   28
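The same as.list() wrapper should work for other functions that return a named vector. As a minimal sketch (the choice of quantile() and of the three probabilities is only an illustration, not part of the original question):

```r
library(data.table)

cars.dt <- data.table(cars)

# quantile() returns a named numeric vector, one value per probability.
# Wrapping it in as.list() turns each named value into its own list element,
# so data.table builds one column per quantile within each speed group.
q <- cars.dt[, as.list(quantile(dist, probs = c(0.25, 0.5, 0.75))), by = "speed"]

names(q)  # "speed" followed by the quantile labels
nrow(q)   # one row per distinct speed
```

The column names come from the names of the vector, so a function that returns an unnamed vector would give the default V1, V2, ... columns instead.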

You could also consider summaryBy from the doBy package, which gives some control over which summary functions appear in the output.

 library(doBy)
 dt2 <- summaryBy(.~speed, cars.dt, FUN=c(min, median, mean, max))
 head(dt2,2)
 #   speed dist.min dist.median dist.mean dist.max
 #1:     4        2           6         6       10
 #2:     7        4          13        13       22
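If an extra package is not wanted, a similar table can probably be built in data.table alone by listing the statistics explicitly in j. This is a sketch; the column names below simply mirror the doBy output and are otherwise arbitrary:

```r
library(data.table)

cars.dt <- data.table(cars)

# Each named element of .() becomes one column; by = speed gives one row per group.
dt3 <- cars.dt[, .(dist.min    = min(dist),
                   dist.median = median(dist),
                   dist.mean   = mean(dist),
                   dist.max    = max(dist)),
               by = speed]

head(dt3, 2)
```

This makes it easy to drop a statistic or add another (for example sd(dist)) without changing anything else.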

I guess the difference between list() and as.list() is easiest to see without the grouping variable:

 list(summary(cars.dt$speed))  # a list with a single element holding the whole summary
 #[[1]]
 #   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 #    4.0    12.0    15.0    15.4    19.0    25.0

 as.list(summary(cars.dt$speed))  # a list with one element per statistic
 #$Min.
 #[1] 4
 #
 #$`1st Qu.`
 #[1] 12
 #
 #$Median
 #[1] 15
 #
 #$Mean
 #[1] 15.4
 #
 #$`3rd Qu.`
 #[1] 19
 #
 #$Max.
 #[1] 25

The same distinction holds for list(1:5) and as.list(1:5). Because data.table turns each element of the list returned in j into a column, as.list() gives one column per statistic.
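A quick way to check this is to compare lengths and result shapes. The sketch below reuses the commands from the question and the answer; only the variable names long and wide are made up for illustration:

```r
library(data.table)

cars.dt <- data.table(cars)

length(list(1:5))     # 1: the whole vector is a single list element
length(as.list(1:5))  # 5: every value is its own list element

# The question's command: each group's six statistics are stacked in one column,
# which matches the 114 rows shown in the question (19 speeds x 6 statistics).
long <- cars.dt[, summary(dist), by = "speed"]

# The answer's command: six list elements per group become six columns,
# so there is one row per speed.
wide <- cars.dt[, as.list(summary(dist)), by = "speed"]

dim(long)
dim(wide)
```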


answered Oct 15 '22 08:10

akrun
