data.table and stratified means

Tags: r, data.table

I have some code that generates stratified weighted means, and I'm certain it worked a few months ago, but I can't work out what the problem is now. (Apologies, this must be very basic stuff.) Here is a reproducible example:

dp=
structure(list(seqn = c(1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 3L, 4L, 9L, 10L, 11L, 14L, 8L, 11L, 12L, 10L, 
5L, 13L, 2L, 14L, 3L, 9L, 6L, 7L), sex = c(2L, 1L, 2L, 2L, 1L, 
2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), bmi = c(22.8935608711259, 
27.0944623781918, 40.4637162938634, 23.7649712675423, 15.3193372705538, 
31.1280302540991, 21.4866354393239, 20.3200254374398, 32.331092513536, 
25.3679771839413, 33.9400508162971, 14.7048592172926, 25.5243757788688, 
23.4331882363495, 27.6428134168995, 29.3923629426172, 24.9547209666314, 
17.0522203606383, 15.51, 22, 30.62, 30.94, 29.1, 25.57, 24.9, 
27.33, 17.63, 18.48, 22.56, 29.39), tc = c(273L, 181L, 150L, 
201L, 142L, 165L, 235L, 219L, 298L, 222L, 143L, 134L, 268L, 160L, 
236L, 225L, 260L, 140L, 162L, 132L, 156L, 140L, 279L, 314L, 215L, 
174L, 129L, 148L, 153L, 245L), swt = c(1645, 3318, 2280, 1574, 
4062, 1627, 14604, 24675, 975, 975, 2697, 1559, 1737.58, 1730.23, 
19521.36, 28080.57, 1248.43, 13745.77, 5251.76464426326, 6497.194885522, 
15915.7023420765, 3740.96809540218, 16574.177622509, 307.32513798849, 
4720.89748295751, 3247.78896499604, 7698.70949077031, 1262.6450411464, 
6609.43340735515, 4254.23723479882)), .Names = c("seqn", "sex", 
"bmi", "tc", "swt"), row.names = c(20560L, 20561L, 20562L, 20563L, 
20565L, 20566L, 20567L, 20568L, 20569L, 20570L, 20571L, 20572L, 
61335L, 61336L, 61338L, 61339L, 61340L, 61341L, 95465L, 96890L, 
104613L, 105988L, 107581L, 112267L, 113403L, 114292L, 119979L, 
120271L, 125939L, 135699L), class = "data.frame")

library(data.table)
dt=data.table(dp, key='sex')

sapply(dp, function(x) weighted.mean(x, dp$swt))  # this works: overall weighted means
dt[, lapply(.SD, mean, na.rm=TRUE), .SDcols=c('bmi','tc','swt')]  # this also works: overall unweighted means

dt[,lapply(.SD, function(x)weighted.mean(x,swt, na.rm=TRUE)), by=key(dt), .SDcols=c('bmi','tc','swt')]

But the stratified, weighted version fails with this error:

Error in weighted.mean.default(x, swt, na.rm = TRUE) : object 'swt' not found
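For comparison, here is a base R sketch of the same stratified weighted means. It does not touch data.table's optimization at all, so it gives a reference answer to check a data.table result against. It uses a small hypothetical `toy` data frame with the same column names as `dp` so that it runs on its own, plus a hypothetical helper `strat_wmean()`. With the real data, `strat_wmean(dp, c('bmi','tc','swt'), 'swt', 'sex')` should give the same numbers as the data.table query, since both call `weighted.mean` once per stratum and column.

```r
# Hypothetical toy data using the same column names as dp
toy <- data.frame(
  sex = c(1L, 1L, 2L, 2L),
  bmi = c(20, 30, 25, 35),
  tc  = c(100, 200, 150, 250),
  swt = c(1, 3, 2, 2)
)

# Hypothetical helper: split the rows by the stratum column `by`, then take
# the weighted mean of each column in `cols`, weighting by column `wcol`.
# Returns a matrix with one row per stratum and one column per variable.
strat_wmean <- function(d, cols, wcol, by) {
  per_group <- sapply(split(d, d[[by]]), function(g) {
    sapply(g[cols], weighted.mean, w = g[[wcol]], na.rm = TRUE)
  })
  t(per_group)
}

res <- strat_wmean(toy, c("bmi", "tc", "swt"), "swt", "sex")
# res["1", "bmi"] is (1*20 + 3*30) / (1 + 3) = 27.5
print(res)
```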

sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.8.6

loaded via a namespace (and not attached):
[1] tools_2.15.2

asked Nov 18 '12 16:11

David F

1 Answer

UPDATE (from Arun): This is now fixed in v1.8.11. From NEWS:

o DT[, lapply(.SD, function(), by=] did not see columns of DT when optimisation is "on". This is now fixed, #2381. Tests added and tested successfully. Thanks to David F for reporting on SO: data.table and stratified means

This is indeed a bug introduced somewhere between 1.8.2 and 1.8.6.

dt[,lapply(.SD, function(x) weighted.mean(x,swt, na.rm=TRUE)), by=key(dt),
    .SDcols=c('bmi','tc','swt')] 
Error in weighted.mean.default(x, swt, na.rm = TRUE) : 
    object 'swt' not found

To work around this in the meantime, either turn off optimization:

options(datatable.optimize=FALSE)
dt[,lapply(.SD, function(x)weighted.mean(x,swt, na.rm=TRUE)), by=key(dt),    
    .SDcols=c('bmi','tc','swt')]
   sex      bmi       tc      swt
1:   1 25.64376 206.0115 17171.20
2:   2 23.73566 193.8727 11467.47
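Because `datatable.optimize` is a global option, setting it to FALSE affects every later query in the session. Below is a minimal sketch of a hypothetical helper, `with_optimize_off()`, that sets the option only while one expression is evaluated and then restores whatever value it had before, using base R's `options()` and `on.exit()`. Its default mirrors the answer's `FALSE` setting from data.table 1.8.x. Newer data.table versions document numeric optimization levels (`0L` disables optimization), so the value to pass is an assumption to check against your installed version.

```r
# Hypothetical helper: evaluate `expr` with data.table's optimization option
# set to `value`, then restore the previous setting even if `expr` errors.
with_optimize_off <- function(expr, value = FALSE) {
  old <- options(datatable.optimize = value)  # options() returns the old value
  on.exit(options(old), add = TRUE)           # restore it when the function exits
  expr  # lazy evaluation: `expr` is only evaluated here, while the option is set
}
```

For example, `with_optimize_off(dt[, lapply(.SD, function(x) weighted.mean(x, swt, na.rm=TRUE)), by=key(dt), .SDcols=c('bmi','tc','swt')])` would run the failing query with optimization off while leaving later queries optimized.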

Or pass weighted.mean to lapply directly, without wrapping it in function():

options(datatable.optimize=TRUE)
dt[,lapply(.SD, weighted.mean, swt, na.rm=TRUE), by=key(dt),    
    .SDcols=c('bmi','tc','swt')] 
   sex      bmi       tc      swt
1:   1 25.64376 206.0115 17171.20
2:   2 23.73566 193.8727 11467.47
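The second workaround relies on a base R feature: any arguments given to `lapply()` after `FUN` are forwarded to every call of `FUN`. So `lapply(.SD, weighted.mean, swt, na.rm=TRUE)` calls `weighted.mean(col, swt, na.rm=TRUE)` for each column. Presumably this sidesteps the bug because `swt` is then evaluated as an argument of the `lapply` call in j itself, rather than looked up from inside an anonymous function. The sketch shows the forwarding with plain vectors (hypothetical `cols` and `w` standing in for `.SD` and `swt`), and shows that outside data.table it matches the anonymous-function form.

```r
# Hypothetical columns and weights standing in for .SD and swt
cols <- list(bmi = c(20, 30), tc = c(100, 200))
w <- c(1, 3)

# Extra arguments after FUN are passed on: weighted.mean(x, w, na.rm = TRUE)
forwarded <- lapply(cols, weighted.mean, w, na.rm = TRUE)

# Anonymous-function form; here `w` is found in the enclosing environment
anon <- lapply(cols, function(x) weighted.mean(x, w, na.rm = TRUE))

stopifnot(identical(forwarded, anon))
# forwarded$bmi is (1*20 + 3*30) / 4 = 27.5; forwarded$tc is (1*100 + 3*200) / 4 = 175
```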

We are making more use of optimization now, but this case slipped through the test suite: tests 825.1, 825.2 and 825.3 didn't cover an anonymous function() whose body passes another column as an argument to the function it calls. The bug matters most when no ready-made function exists. In this case function() can simply be omitted, because weighted.mean already exists and can be passed to lapply as-is.
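The missing coverage can be sketched as a small regression check: run the anonymous-function form and the direct form on the same toy data and require the same answers. This is a hypothetical illustration, not one of data.table's actual tests. It assumes a data.table version that includes the #2381 fix, and it skips itself if data.table isn't installed. On an affected version such as 1.8.6, the `anon` line would presumably fail with the same "object 'swt' not found" error.

```r
# Hypothetical regression check (not data.table's own test suite): the
# anonymous-function form must agree with the direct form when the
# function's second argument is another column of the table.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  toy <- data.table(
    sex = c(1L, 1L, 2L, 2L),
    bmi = c(20, 30, 25, 35),
    swt = c(1, 3, 2, 2)
  )
  direct <- toy[, lapply(.SD, weighted.mean, swt),
                by = sex, .SDcols = c("bmi", "swt")]
  anon <- toy[, lapply(.SD, function(x) weighted.mean(x, swt)),
              by = sex, .SDcols = c("bmi", "swt")]
  stopifnot(
    isTRUE(all.equal(direct$bmi, anon$bmi)),
    isTRUE(all.equal(direct$swt, anon$swt)),
    # (20*1 + 30*3) / 4 = 27.5 and (25*2 + 35*2) / 4 = 30
    isTRUE(all.equal(direct$bmi, c(27.5, 30)))
  )
}
```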

You can see how optimization modifies j by setting verbose=TRUE, either per query or globally with options(datatable.verbose=TRUE). In this case the verbose output would not have revealed anything wrong; I mention it only as an aside.

This is now filed as #2381: Optimization of lapply(.SD, function() ...) no longer sees columns inside .... I'll fix it and add tests so this can't regress again.

Thanks!

answered Oct 13 '22 00:10

Matt Dowle
