
preserve old (pre 3.1.0) type.convert behavior

Tags: r, read.table

R 3.1.0 is out and one of the new features is the following:

type.convert() (and hence by default read.table()) returns a character vector or factor when representing a numeric input as a double would lose accuracy. Similarly for complex inputs.

To give an example:

df <- read.table(text = "num1 num2
1.1 1.1234567890123456
2.2 2.2
3.3 3.3", header = TRUE)

sapply(df, class)
#      num1      num2 
# "numeric"  "factor"

while with previous versions, read.table would have returned two numeric columns.

For those who, like me, are concerned about that change, what can be done to preserve the old behavior?

Note: I'd like a general solution that does not make assumptions on the input data, i.e. do not suggest I use colClasses = "numeric" in the example above. Thanks.

flodel asked Apr 15 '14 01:04


3 Answers

In version 3.1.1, there is this change listed in the News file:

type.convert(), read.table() and similar read.*() functions get a new numerals argument, specifying how numeric input is converted when its conversion to double precision loses accuracy. The default numerals = "allow.loss" allows accuracy loss, as in R versions before 3.1.0.
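As a rough illustration of the NEWS entry above, the sketch below passes numerals to read.table() on the question's sample data. It assumes R 3.1.1 or later; the variable names txt, df_loss and df_strict are made up for this example. The "no.loss" value is the one ?type.convert describes as keeping such columns as factor or character rather than numeric.

```r
# Sample data from the question (names txt, df_loss, df_strict are illustrative).
txt <- "num1 num2
1.1 1.1234567890123456
2.2 2.2
3.3 3.3"

# Default from R 3.1.1 onwards: convert to double even if accuracy is lost.
df_loss <- read.table(text = txt, header = TRUE, numerals = "allow.loss")
sapply(df_loss, class)

# Opt in to the strict behavior: columns that would lose accuracy
# are not converted to numeric.
df_strict <- read.table(text = txt, header = TRUE, numerals = "no.loss")
sapply(df_strict, class)
```

Since "allow.loss" is the default in 3.1.1 and later, the first call should behave the same as a plain read.table(text = txt, header = TRUE).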

Much of the post-release discussion about the original change, including the decision to revert the default behavior with an additional warning, can be found in a thread on the developers' email list.

For version 3.1.0, code will have to be modified to get the old behavior. Switching to 3.1.1 is another strategy.
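One possible way to modify code on 3.1.0 without naming columns in advance is to post-process the result: any factor or character column whose entries all parse as numbers is converted back to numeric. This is only a sketch, not something taken from the NEWS file or the mailing list thread; the helper name restore_numeric is hypothetical, and as.numeric() may accept or reject a few strings differently from type.convert().

```r
# Hypothetical helper: convert factor/character columns back to numeric
# when every non-missing entry can be parsed as a number.
restore_numeric <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col) || is.character(col)) {
      txt <- as.character(col)
      num <- suppressWarnings(as.numeric(txt))
      # Accept the conversion only if no entry turned into a new NA.
      if (all(is.na(txt) | !is.na(num))) {
        df[[nm]] <- num
      }
    }
  }
  df
}

# Usage on the question's data:
df <- read.table(text = "num1 num2
1.1 1.1234567890123456
2.2 2.2
3.3 3.3", header = TRUE)
df <- restore_numeric(df)
sapply(df, class)
```

Columns that contain genuine text are left untouched, because at least one of their entries fails to parse.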

The mention of this change for version 3.1.0 (from the same News file) says

type.convert() (and hence by default read.table()) returns a character vector or factor when representing a numeric input as a double would lose accuracy. Similarly for complex inputs.

If a file contains numeric data with unrepresentable numbers of decimal places that are intended to be read as numeric, specify colClasses in read.table() to be "numeric".
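For completeness, the NEWS suggestion looks like this on the question's sample data. Note that it assumes every column is numeric, which is exactly the kind of assumption the question asks to avoid; the variable names txt and df_num are illustrative.

```r
txt <- "num1 num2
1.1 1.1234567890123456
2.2 2.2
3.3 3.3"

# A single colClasses value is recycled across all columns.
df_num <- read.table(text = txt, header = TRUE, colClasses = "numeric")
sapply(df_num, class)
#      num1      num2 
# "numeric" "numeric"
```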

Note: original answer was written when the applicable version with the fix was 3.1.0 patched. The answer has been updated now that 3.1.1 has been released.

Brian Diggs answered Oct 20 '22 02:10


Try using data.table's fread:

# create test data set "a.dat"
Lines <- "num1 num2\n1.1 1.1234567890123456\n2.2 2.2\n3.3 3.3\n"
cat(Lines, file = "a.dat")

#####

library(data.table)

DT <- fread("a.dat")
str(DT)
## Classes ‘data.table’ and 'data.frame':  3 obs. of  2 variables:
## $ num1: num  1.1 2.2 3.3
## $ num2: num  1.12 2.2 3.3
## - attr(*, ".internal.selfref")=<externalptr> 

class(DT)
## [1] "data.table" "data.frame"

DF <- as.data.frame(DT) 
class(DF)
## [1] "data.frame"

ADDED LATER: Since this answer was posted, the latest patched version of R 3.1.0 has come out. By default it reverts to the old behavior, and a new numerals argument lets you request different behavior. See type.convert and read.table.

G. Grothendieck answered Oct 20 '22 02:10


Since I don't have rep to comment on Brian Diggs's response - for future reference, the new argument is now called "numerals" (not "exact"). From http://cran.r-project.org/bin/windows/base/NEWS.R-3.1.0patched.html:

type.convert(), read.table() and similar read.*() functions get a new numerals argument, specifying how numeric input is converted when its conversion to double precision loses accuracy. The default numerals = "allow.loss" allows accuracy loss, as in R versions before 3.1.0.

tom m answered Oct 20 '22 02:10