I am learning data.table from examples and I am stuck on a scenario of my own.
I am using the cars dataset, converted to a data.table, to try out my commands.
> library(data.table)
> cars.dt=data.table(cars)
> cars.dt[1:5]
speed dist
1: 4 2
2: 4 10
3: 7 4
4: 7 22
5: 8 16
.
.
I wanted to calculate the summary statistics of dist for each speed group and store each statistic in its own column, but the values end up stacked in multiple rows, e.g.:
> cars.dt[, summary(dist), by="speed"]
speed V1
1: 4 2
2: 4 4
3: 4 6
4: 4 6
5: 4 8
---
110: 25 85
111: 25 85
112: 25 85
113: 25 85
114: 25 85
I was expecting the output below, but I am unable to achieve it:
speed Min. 1st Qu. Median Mean 3rd Qu. Max.
1: 4 2 4 6 6 8 10
2: 7 4.0 8.5 13.0 13.0 17.5 22.0
3: 8 16 16 16 16 16 16
4: 9 10 10 10 10 10 10
5: 10 18 22 26 26 30 34
6: 11 17.00 19.75 22.50 22.50 25.25 28.00
7: 12 14.0 18.5 22.0 21.5 25.0 28.0
8: 13 26 32 34 35 37 46
9: 14 26.0 33.5 48.0 50.5 65.0 80.0
10: 15 20.00 23.00 26.00 33.33 40.00 54.00
11: 16 32 34 36 36 38 40
12: 17 32.00 36.00 40.00 40.67 45.00 50.00
13: 18 42.0 52.5 66.0 64.5 78.0 84.0
14: 19 36 41 46 50 57 68
15: 20 32.0 48.0 52.0 50.4 56.0 64.0
16: 22 66 66 66 66 66 66
17: 23 54 54 54 54 54 54
18: 24 70.00 86.50 92.50 93.75 99.75 120.00
19: 25 85 85 85 85 85 85
I tried the command below, but the output was only printed and not returned as a data.table:
> cars.dt[, print(summary(dist)), by="speed"]
Min. 1st Qu. Median Mean 3rd Qu. Max.
2 4 6 6 8 10
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.0 8.5 13.0 13.0 17.5 22.0
...
Min. 1st Qu. Median Mean 3rd Qu. Max.
70.00 86.50 92.50 93.75 99.75 120.00
Min. 1st Qu. Median Mean 3rd Qu. Max.
85 85 85 85 85 85
Empty data.table (0 rows) of 1 col: speed
It seems I cannot use functions that return multiple values together with the by clause.
If anyone has an idea how to write this, it would be much appreciated.
Please also let me know whether this is possible in data.table at all.
Try:
dt1 <- cars.dt[, as.list(summary(dist)), by="speed"]
head(dt1)
# speed Min. 1st Qu. Median Mean 3rd Qu. Max.
#1: 4 2 4.00 6.0 6.0 8.00 10
#2: 7 4 8.50 13.0 13.0 17.50 22
#3: 8 16 16.00 16.0 16.0 16.00 16
#4: 9 10 10.00 10.0 10.0 10.00 10
#5: 10 18 22.00 26.0 26.0 30.00 34
#6: 11 17 19.75 22.5 22.5 25.25 28
You could also consider summaryBy from doBy to have some control over which summary functions are output.
library(doBy)
dt2 <- summaryBy(.~speed, cars.dt, FUN=c(min, median, mean, max))
head(dt2,2)
# speed dist.min dist.median dist.mean dist.max
#1: 4 2 6 6 10
#2: 7 4 13 13 22
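If you would rather not add a dependency, a similar table can be built with data.table alone by naming each statistic in j. This is only a minimal sketch: the result name dt3 and the column names (chosen to mirror the doBy output above) are my own assumptions, not anything data.table requires.

```r
library(data.table)

cars.dt <- data.table(cars)  # built-in cars dataset: columns speed and dist

# Each named element of the list returned in j becomes one output column,
# so every speed group yields exactly one row.
# dt3 and the dist.* names are illustrative choices.
dt3 <- cars.dt[, .(dist.min    = min(dist),
                   dist.median = median(dist),
                   dist.mean   = mean(dist),
                   dist.max    = max(dist)),
               by = speed]
head(dt3, 2)
```

Listing the statistics by hand is more typing than as.list(summary(dist)), but it lets you pick exactly the functions and column names you want.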
I guess the difference between as.list and list is as follows.
Without the grouping variable:
list(summary(cars.dt$speed)) # this gives a list with a single element (the whole summary)
#[[1]]
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 4.0 12.0 15.0 15.4 19.0 25.0
as.list(summary(cars.dt$speed)) # whereas this gives a list with one element per statistic
# $Min.
#[1] 4
#$`1st Qu.`
#[1] 12
#$Median
#[1] 15
#$Mean
#[1] 15.4
#$`3rd Qu.`
#[1] 19
#$Max.
#[1] 25
The same difference shows up between list(1:5) and as.list(1:5).
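A quick way to see this is to compare lengths, and then check the shape of the grouped result. This is just a sketch to illustrate the point; the name wide is my own, and it assumes the cars.dt table from the question.

```r
library(data.table)

cars.dt <- data.table(cars)

length(list(1:5))     # list() wraps the whole vector in a single element
length(as.list(1:5))  # as.list() splits it into one element per value

# In j, every list element becomes a column, so as.list() spreads the
# six summary statistics across six columns, one row per speed group.
# 'wide' is an illustrative name.
wide <- cars.dt[, as.list(summary(dist)), by = "speed"]
dim(wide)
```

Since data.table turns each element of the list returned by j into a column, as.list is what gives one column per statistic.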