
Weird error in R when importing (64-bit) integer with many digits

I am importing a CSV file with a single column of very long integers (for example, 2121020101132507598):

a <- read.csv('temp.csv', as.is = TRUE)

When I import these integers as strings they come through correctly, but when I import them as numbers the last few digits change (string on the left, number on the right). I have no idea what is going on...

1 "4031320121153001444" 4031320121153001472
2 "4113020071082679601" 4113020071082679808
3 "4073020091116779570" 4073020091116779520
4 "2081720101128577687" 2081720101128577792
5 "4041720081087539887" 4041720081087539712
6 "4011120071074301496" 4011120071074301440
7 "4021520051054304372" 4021520051054304256
8 "4082520061068996911" 4082520061068997120
9 "4082620101129165548" 4082620101129165312

Zubin asked Jul 11 '12 20:07


1 Answer

As others have noted, you can't represent integers that large in R's native integer type, which is 32-bit and tops out at 2147483647. But R isn't reading those values into integers; it's reading them into double-precision numerics.

Double precision has a 53-bit significand, so it can only represent integers exactly up to 2^53 (roughly 16 significant digits), which is why you see your numbers rounded after about 16 places. See the gmp, Rmpfr, and int64 packages for potential solutions. I don't see a function to read from a file in any of them, but maybe you could cook something up by looking at their sources.
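To make the rounding concrete, here is a small base-R sketch. It uses one value copied from the question; nothing else is assumed beyond a standard R installation.

```r
# Largest value R's native 32-bit integer type can hold
.Machine$integer.max     # 2147483647

# Number of bits in a double's significand
.Machine$double.digits   # 53

# Above 2^53, consecutive integers can no longer be told apart as doubles
2^53 == 2^53 + 1         # TRUE

# One value from the question, parsed as a double and printed in full.
# The question shows this value coming back as 4031320121153001472.
x <- "4031320121153001444"
sprintf("%.0f", as.numeric(x))

# Between 2^61 and 2^62 adjacent doubles are 2^(61 - 52) apart,
# so every stored value in that range is a multiple of 512
2^(61 - 52)              # 512
```

The last line explains why the imported numbers in the question all land on multiples of 512 rather than on the original digits.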

UPDATE: Here's how you can get your file into an int64 object:

# This assumes your numbers are the only column in the file
# Read them in however you like; just ensure they're read in as character
library(int64)
a <- scan("temp.csv", what = "")
ia <- as.int64(a)
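If you would rather stay with read.csv, as in the question, the same "read as character" step can be sketched as follows. The temporary file and the column name id are hypothetical stand-ins for temp.csv. The optional conversion uses the bit64 package, which is a different package from the ones named above and is an assumption here; the block skips that step if bit64 is not installed.

```r
# Hypothetical stand-in for temp.csv: a header named "id" plus two values
# taken from the question
tmp <- tempfile(fileext = ".csv")
writeLines(c("id", "4031320121153001444", "4113020071082679601"), tmp)

# colClasses = "character" stops read.csv from converting the column to double
a <- read.csv(tmp, colClasses = "character")
a$id

# Optional: convert to 64-bit integers if bit64 happens to be installed
if (requireNamespace("bit64", quietly = TRUE)) {
  ia <- bit64::as.integer64(a$id)
  # Round-trip back to character to confirm no digits were changed
  print(identical(as.character(ia), a$id))
}
```

Keeping the column as character is enough if the values are only identifiers; the conversion matters only when you need arithmetic or numeric comparison on them.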
Joshua Ulrich answered Nov 15 '22 07:11
