R 3.1.0 is out and one of the new features is the following:
type.convert() (and hence by default read.table()) returns a character vector or factor when representing a numeric input as a double would lose accuracy. Similarly for complex inputs.
To give an example:
df <- read.table(text = "num1 num2
1.1 1.1234567890123456
2.2 2.2
3.3 3.3", header = TRUE)
sapply(df, class)
# num1 num2
# "numeric" "factor"
while with previous versions, read.table would have returned two numeric columns.
For those who, like me, are concerned about that change: what can be done to preserve the old behavior?
Note: I'd like a general solution that does not make assumptions about the input data, i.e. do not suggest I use colClasses = "numeric" in the example above. Thanks.
In version 3.1.1, there is this change listed in the NEWS file:
type.convert(), read.table() and similar read.*() functions get a new numerals argument, specifying how numeric input is converted when its conversion to double precision loses accuracy. The default numerals = "allow.loss" allows accuracy loss, as in R versions before 3.1.0.
Much of the post-release discussion about the original change, including the decision to revert the default behavior and offer a warning instead, can be found in a thread on the R developers' mailing list.
Under version 3.1.0 itself, code will have to be modified to get the old behavior; switching to 3.1.1 is another option.
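To make that concrete, here is a minimal sketch on the question's data (assuming R >= 3.1.1, where the numerals argument exists): the default already restores the old behavior, and the strict 3.1.0-style handling can still be requested explicitly.
df <- read.table(text = "num1 num2
1.1 1.1234567890123456
2.2 2.2
3.3 3.3", header = TRUE)   # numerals = "allow.loss" is the default
sapply(df, class)
#      num1      num2
# "numeric" "numeric"
df2 <- read.table(text = "num1 num2
1.1 1.1234567890123456
2.2 2.2
3.3 3.3", header = TRUE, numerals = "no.loss")   # opt back into the 3.1.0 behavior
sapply(df2, class)
#      num1     num2
# "numeric"  "factor"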
The mention of this change for version 3.1.0 (from the same NEWS file) says:
type.convert() (and hence by default read.table()) returns a character vector or factor when representing a numeric input as a double would lose accuracy. Similarly for complex inputs. If a file contains numeric data with unrepresentable numbers of decimal places that are intended to be read as numeric, specify colClasses in read.table() to be "numeric".
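For completeness, this is what that NEWS advice looks like on the question's data under 3.1.0 (a sketch only, since the question explicitly rules this out as a general solution; colClasses = "numeric" is recycled across both columns here, which only works because every column really is numeric):
df <- read.table(text = "num1 num2
1.1 1.1234567890123456
2.2 2.2
3.3 3.3", header = TRUE, colClasses = "numeric")
sapply(df, class)
#      num1      num2
# "numeric" "numeric"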
Note: the original answer was written when the applicable version with the fix was 3.1.0 patched; it has been updated now that 3.1.1 has been released.
Try using data.table's fread:
# create test data set "a.dat"
Lines <- "num1 num2\n1.1 1.1234567890123456\n2.2 2.2\n3.3 3.3\n"
cat(Lines, file = "a.dat")
#####
library(data.table)
DT <- fread("a.dat")
str(DT)
## Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
## $ num1: num 1.1 2.2 3.3
## $ num2: num 1.12 2.2 3.3
## - attr(*, ".internal.selfref")=<externalptr>
class(DT)
## [1] "data.table" "data.frame"
DF <- as.data.frame(DT)
class(DF)
## [1] "data.frame"
ADDED LATER: Since this answer was posted, the latest patched version of R 3.1.0 has come out; by default it reverts to the old behavior, with a new numerals argument to specify it differently. See ?type.convert and ?read.table.
Since I don't have the rep to comment on Brian Diggs's answer: for future reference, the new argument is now called "numerals" (not "exact"). From http://cran.r-project.org/bin/windows/base/NEWS.R-3.1.0patched.html:
type.convert(), read.table() and similar read.*() functions get a new numerals argument, specifying how numeric input is converted when its conversion to double precision loses accuracy. The default numerals = "allow.loss" allows accuracy loss, as in R versions before 3.1.0.
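A quick sketch of the three settings, using the question's problematic value (assuming R 3.1.0 patched or later):
x <- "1.1234567890123456"                 # more significant digits than a double holds exactly
type.convert(x, numerals = "allow.loss")  # numeric, silently rounded (the default)
type.convert(x, numerals = "warn.loss")   # numeric as above, but with a warning
type.convert(x, numerals = "no.loss")     # factor (or character with as.is = TRUE), as in 3.1.0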