I have a data table with 10 columns.
town
tc
one
two
three
four
five
six
seven
total
Need to generate mean for columns "one" to "total" for which I am using,
DTmean <- DT[,(lapply(.SD,mean)),by = .(town,tc),.SDcols=3:10]
This generates the mean, but then I want the column names to be suffixed with "_mean". How can we do this? Want the first two columns to remain the same as "town" and "tc". I tried the below but then it renames all "one" to "total" to just "_mean"
for (i in 3:10) {
setnames(DTmean,i,paste0(names(i),"_mean"))
}
If you want to do it the data.table
way, you should use setnames
as follows:
setnames(DTmean, 3:10, paste0(names(DT)[3:10], '_mean'))
or:
cols <- names(DT)[3:10]
setnames(DTmean, cols, paste0(cols, '_mean'))
Furthermore, you don't need the .SDcols
statement as you are aggregating all the other columns. Using DT[, lapply(.SD,mean), by = .(town,tc)]
should thus give you the same result as using DT[, (lapply(.SD,mean)), by = .(town,tc), .SDcols=3:10]
.
On the following example dataset:
set.seed(71)
DT <- data.table(town = rep(c('A','B'), each=10),
tc = rep(c('C','D'), 10),
one = rnorm(20,1,1),
two = rnorm(20,2,1),
three = rnorm(20,3,1),
four = rnorm(20,4,1),
five = rnorm(20,5,2),
six = rnorm(20,6,2),
seven = rnorm(20,7,2),
total = rnorm(20,28,3))
using:
DTmean <- DT[, lapply(.SD,mean), by = .(town,tc)]
setnames(DTmean, 3:10, paste0(names(DT)[3:10], '_mean'))
gives:
> DTmean
town tc one_mean two_mean three_mean four_mean five_mean six_mean seven_mean total_mean
1: A C 1.7368898 1.883586 3.358440 4.849896 4.742609 5.089877 6.792513 29.20286
2: A D 0.8906842 1.826135 3.267684 3.760931 6.210145 7.320693 5.571687 26.56142
3: B C 1.4037955 2.474836 2.587920 3.719658 3.446612 6.510183 8.309784 27.80012
4: B D 0.8103511 1.153000 3.360940 3.945082 5.555999 6.198380 8.652779 28.95180
In reply to your comment: If you want to calculate both the mean and the sd
simultanuously, you could do (adapted from my answer here):
DT[, as.list(unlist(lapply(.SD, function(x) list(mean = mean(x), sd = sd(x))))), by = .(town,tc)]
which gives:
town tc one.mean one.sd two.mean two.sd three.mean three.sd four.mean four.sd five.mean five.sd six.mean six.sd seven.mean seven.sd total.mean total.sd
1: A C 0.2981842 0.3556520 1.578174 0.7788545 2.232366 0.9047046 4.896201 1.238877 4.625866 0.7436584 7.607439 1.7262628 7.949366 1.772771 28.94287 3.902602
2: A D 1.2099018 1.0205252 1.686068 1.5497989 2.671027 0.8323733 4.811279 1.404794 7.235969 0.7883873 6.765797 2.7719942 6.657298 1.107843 27.42563 3.380785
3: B C 0.9238309 0.6679821 2.525485 0.8054734 3.138298 1.0111270 3.876207 0.573342 3.843140 2.1991052 4.942155 0.7784024 6.783383 2.595116 28.95243 1.078307
4: B D 0.8843948 0.9384975 1.988908 1.0543981 3.673393 1.3505701 3.957534 1.097837 2.788119 1.9089660 6.463784 0.7642144 6.416487 2.041441 27.88205 3.807119
However, it is highly probable better to store this in long format. To get this you could use data.table
's melt
function as follows:
cols <- names(DT)[3:10]
DT2 <- melt(DT[, as.list(unlist(lapply(.SD, function(x) list(mn = mean(x), sdev = sd(x))))), by = .(town,tc)],
id.vars = c('town','tc'),
measure.vars = patterns('.mn','.sdev'),
value.name = c('mn','sdev'))[, variable := cols[variable]]
or in a much simpler operation:
DT2 <- melt(DT, id.vars = c('town','tc'))[, .(mn = mean(value), sdev = sd(value)), by = .(town,tc,variable)]
which results in:
> DT2
town tc variable mn sdev
1: A C one 0.2981842 0.3556520
2: A D one 1.2099018 1.0205252
3: B C one 0.9238309 0.6679821
4: B D one 0.8843948 0.9384975
5: A C two 1.5781743 0.7788545
6: A D two 1.6860675 1.5497989
7: B C two 2.5254855 0.8054734
8: B D two 1.9889082 1.0543981
9: A C three 2.2323655 0.9047046
10: A D three 2.6710267 0.8323733
11: B C three 3.1382982 1.0111270
12: B D three 3.6733929 1.3505701
.....
In response to your last comments, you can detect outliers as follows:
DT3 <- melt(DT, id.vars = c('town','tc'))
DT3[, `:=` (mn = mean(value), sdev = sd(value)), by = .(town,tc,variable)
][, outlier := +(value < mn - sdev | value > mn + sdev)]
which gives:
town tc variable value mn sdev outlier
1: A C one 0.5681578 0.2981842 0.355652 0
2: A D one 0.5528128 1.2099018 1.020525 0
3: A C one 0.5214274 0.2981842 0.355652 0
4: A D one 1.4171454 1.2099018 1.020525 0
5: A C one 0.5820994 0.2981842 0.355652 0
---
156: B D total 23.4462542 27.8820524 3.807119 1
157: B C total 30.5934956 28.9524305 1.078307 1
158: B D total 30.5618759 27.8820524 3.807119 0
159: B C total 27.5940307 28.9524305 1.078307 1
160: B D total 24.8378437 27.8820524 3.807119 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With