While analysing some data, I came across the warning message below. I suspect it is a bug, since it comes from a pretty straightforward command that I have used many times.
Warning message:
In rbindlist(allargs) : NAs introduced by coercion
I was able to reproduce the warning. Here's some code that should reproduce it for you as well.
library(data.table)
# unique random names for column V1
set.seed(45)
n <- sapply(1:500, function(x) {
  paste(sample(letters, 10), collapse = "")
})
# generate some values for V2 and V3
dt <- data.table(V1 = sample(n, 30*500, replace = TRUE),
                 V2 = sample(1:10, 30*500, replace = TRUE),
                 V3 = sample(50:100, 30*500, replace = TRUE))
setkey(dt, "V1")
# No warning when providing column names (and right results)
dt[, list(s = sum(V2), m = mean(V3)),by=V1]
# V1 s m
# 1: acgmqyuwpe 238 74.97778
# 2: adcltygwsq 204 79.94118
# 3: adftozibnh 165 75.51515
# 4: aeuowtlskr 164 75.70968
# 5: ahfoqclkpg 192 73.20000
# ---
# 496: zuqegoxkpi 93 77.95000
# 497: zwpserimgf 178 72.62963
# 498: zxkpdrlcsf 154 78.04167
# 499: zxvoaeflhq 121 75.34615
# 500: zyiwcsanlm 180 76.61290
# Warning message and results with NA
dt[, list(sum(V2), mean(V3)),by=V1]
# V1 V1 V2
# 1: acgmqyuwpe 238 74.97778
# 2: adcltygwsq 204 79.94118
# 3: adftozibnh 165 75.51515
# 4: aeuowtlskr 164 75.70968
# 5: ahfoqclkpg 192 73.20000
# ---
# 496: zuqegoxkpi NA 77.95000
# 497: zwpserimgf NA 72.62963
# 498: zxkpdrlcsf NA 78.04167
# 499: zxvoaeflhq NA 75.34615
# 500: zyiwcsanlm NA 76.61290
Warning message:
In rbindlist(allargs) : NAs introduced by coercion
1) It seems that this happens only if you don't provide the column names.
2) Even then, it seems to happen only when V1 (or whichever column you use in by=) has a lot of unique entries (500 here). That is, this DOES NOT happen when the by= column V1 has fewer unique entries. For example, try changing the code for n from sapply(1:500, ... to sapply(1:50, ... and you'll get no warning.
What's going on here? This is R version 2.15.0 on a MacBook Pro with OS X 10.8.2 (I also tested it on another MacBook Pro with R 2.15.2). Here's the sessionInfo():
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.8.6 reshape2_1.2.2
loaded via a namespace (and not attached):
[1] plyr_1.8 stringr_0.6.2 tools_2.15.0
Just reproduced with 2.15.2:
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.8.6
Approach 2: Using the suppressWarnings() function to silence the warning message. You may not always want to fix the non-numeric values before converting. In that scenario, just wrap suppressWarnings() around the as.numeric() call to disregard the warning message "NAs introduced by coercion".
In general, as.numeric() returns this warning, and some output values are NA (i.e. missing or not available), when some of the character strings are not properly formatted numbers and hence cannot be converted to the numeric class.
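For illustration, here is a minimal base R sketch of that general case. The vector x is made up for the example and is not taken from the question.

```r
# Hypothetical input: "abc" is not a properly formatted number
x <- c("1", "2.5", "abc", "4")

# Capture the warning text instead of letting it print
msg <- tryCatch(as.numeric(x), warning = function(w) conditionMessage(w))

# Same conversion with the warning silenced; "abc" still becomes NA
quiet <- suppressWarnings(as.numeric(x))
```

Note that suppressWarnings() only hides the message. The NA values are still there, so it is worth checking is.na(quiet) afterwards.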
UPDATE : Now fixed in v1.8.9 by Ricardo
o rbind'ing data.tables containing duplicate, "" or NA column names now works, #2726 & #2384. Thanks to Garrett See and Arun Srinivasan for reporting. This also affected the printing of data.tables with duplicate column names since the head and tail are rbind-ed together internally.
Yes, bug. Seems to be in the print method of data.tables with duplicated names.
ans = dt[, list(sum(V2), mean(V3)),by=V1]
head(ans)
V1 V1 V2 # notice the duplicated V1
1: acgmqyuwpe 140 78.07692
2: adcltygwsq 191 76.93333
3: adftozibnh 153 73.82143
4: aeuowtlskr 122 73.04348
5: ahfoqclkpg 143 75.83333
6: ahtczyuipw 135 73.54167
tail(ans)
V1 V1 V2
1: zugrnehpmq 189 72.63889
2: zuqegoxkpi 150 76.03333
3: zwpserimgf 180 74.81818
4: zxkpdrlcsf 115 72.57895
5: zxvoaeflhq 157 76.53571
6: zyiwcsanlm 145 72.79167
print(ans)
Error in rbindlist(allargs) :
(converted from warning) NAs introduced by coercion
rbind(head(ans),tail(ans))
Error in rbindlist(allargs) :
(converted from warning) NAs introduced by coercion
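This would also explain why the number of groups matters. As far as I can tell, print only goes down the head/tail route when the table has more rows than the datatable.print.nrows option (100 by default), so a 50-row result is printed in full and never hits rbind. A rough sketch to see the two printing modes, using made-up tables small and large rather than the data from the question:

```r
library(data.table)

# Hypothetical tables: one below and one above the print threshold
small <- data.table(id = 1:50,  x = rnorm(50))
large <- data.table(id = 1:500, x = rnorm(500))

# Row threshold above which print() shows only the top and bottom rows
getOption("datatable.print.nrows")

# The large table prints far fewer lines because it is truncated
out_small <- capture.output(print(small))
out_large <- capture.output(print(large))
length(out_small)
length(out_large)
```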
As a workaround, don't create a data.table with column names V1, V2, etc.
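If renaming the input columns isn't convenient, another option is to check the result for duplicated names and fix them before printing. This is only a sketch on a small made-up table, not something from the bug report:

```r
library(data.table)

# Hypothetical small table that reuses the V1/V2/V3 naming from the question
dt <- data.table(V1 = c("a", "a", "b"), V2 = 1:3, V3 = c(50, 60, 70))

# Unnamed j expressions are auto-named, which can clash with the by= column
ans <- dt[, list(sum(V2), mean(V3)), by = V1]

# Make the names unique by reference if any are duplicated;
# c("V1", "V1", "V2") would become c("V1", "V1.1", "V2")
if (anyDuplicated(names(ans)) > 0) {
  setnames(ans, make.unique(names(ans)))
}
names(ans)
```

Naming the columns inside list(), as in the first example of the question, avoids the problem altogether and is probably the cleaner choice.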
It's arising due to this known bug :
#2384 rbind of tables containing duplicate column names doesn't bind correctly
and I've added a link there back to this question.
Thanks!