
R: How to convert long number to string to save precision

I am having trouble converting a long number to a string in R. How can I easily convert a number to a string without losing precision? I have a simple example below.

a = -8664354335142704128
toString(a)

[1] "-8664354335142704128"

b = -8664354335142703762
toString(b)

[1] "-8664354335142704128"

a == b

[1] TRUE

I expected toString(a) and toString(b) to differ, but I got the same value for both. I suppose toString() converts the number to float or something like that before converting to string.

Thank you for your help.

Edit:

> -8664354335142704128 == -8664354335142703762

[1] TRUE

> along = bit64::as.integer64(-8664354335142704128)
> blong = bit64::as.integer64(-8664354335142703762)
> along == blong

[1] TRUE

> blong

integer64
[1] -8664354335142704128

I also tried:

> as.character(blong)

[1] "-8664354335142704128"

> sprintf("%f", -8664354335142703762)

[1] "-8664354335142704128.000000"

> sprintf("%f", blong)

[1] "-0.000000"

Edit 2:

My question at first was whether I can convert a long number to a string without loss. Then I realized that in R it is impossible to get the real value of a long number passed into a function, because R automatically reads the value with the loss.

For example, I have the function:

> my_function <- function(long_number){
+ string_number <- toString(long_number)
+ print(string_number)
+ }

If someone uses it and passes a long number, I cannot tell exactly which number was passed.

> my_function(-8664354335142703762)
[1] "-8664354335142704128"

If I read the numbers from a file, for example, it is easy. But that is not my case; I just need to use whatever a user passes.

I am not an R expert, so I was just curious why it works in another language and not in R. For example, in Python:

>>> def my_function(long_number):
...     string_number = str(long_number)
...     print(string_number)
... 
>>> my_function(-8664354335142703762)
-8664354335142703762

Now I know the problem is how R reads and stores numbers. Every language can do it differently. I have to change how numbers are passed to the R function, and that solves my problem.
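
A minimal sketch of that change, assuming callers pass the value as a character string. The name my_function_chr and the validation regex are hypothetical and only illustrate the idea:

```r
# Hypothetical variant of my_function: it takes the number as a string,
# so the digits never pass through a double.
my_function_chr <- function(long_number) {
  if (is.numeric(long_number)) {
    # A numeric literal has already been rounded by the parser at this point.
    stop("pass the number as a character string, e.g. \"-8664354335142703762\"")
  }
  # Assumed format: an optional minus sign followed by digits only.
  if (!is.character(long_number) || length(long_number) != 1L ||
      !grepl("^-?[0-9]+$", long_number)) {
    stop("expected a single string of digits")
  }
  long_number
}

print(my_function_chr("-8664354335142703762"))
# [1] "-8664354335142703762"
```

Rejecting numeric input outright is a design choice: once the value arrives as a double, there is no way to tell which digits the caller originally typed.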

So the correct answer to my question is:

"'I suppose toString() converts the number to float', nope, you did it yourself (even if unintentionally)." - Nope, R did it itself; that is how R reads numbers.

So I marked r2evans' answer as the best answer because it helped me find the right solution. Thank you!

Maurever asked Mar 05 '23 12:03


1 Answer

Bottom line up front: you must (in this case) read in your large numbers as strings before converting to 64-bit integers:

bit64::as.integer64("-8664354335142704128") == bit64::as.integer64("-8664354335142703762")
# [1] FALSE
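
To get back to the string the question asks for, the same string-first route can be followed by as.character(). A minimal sketch, assuming the bit64 package is installed; the variable names x and s are only illustrative:

```r
library(bit64)

# Start from a string so the parser never builds a double from the digits.
x <- as.integer64("-8664354335142703762")

# Converting back to character reproduces every digit.
s <- as.character(x)
print(s)
# [1] "-8664354335142703762"
```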

Some points about what you've tried:

  • "I suppose toString() converts the number to float", nope, you did it yourself (even if unintentionally). In R, when creating a number, 5 is a float and 5L is an integer. Even if you had tried to create it as an integer, it would have complained and lost precision anyway:

    class(5)
    # [1] "numeric"
    class(5L)
    # [1] "integer"
    class(-8664354335142703762)
    # [1] "numeric"
    class(-8664354335142703762L)
    # Warning: non-integer value 8664354335142703762L qualified with L; using numeric value
    # [1] "numeric"
    
  • more appropriately, when you type it in as a number and then try to convert it, R processes the inside of the parentheses first. That is, with

    bit64::as.integer64(-8664354335142704128)
    

    R first has to parse and "understand" everything inside the parentheses before it can be passed to the function. (This is typically a compiler/language-parsing thing, not just an R thing.) In this case, it sees that it appears to be a (large) negative float, so it creates a class numeric (float). Only then does it send this numeric to the function, but by this point the precision has already been lost. Ergo the otherwise-illogical

    bit64::as.integer64(-8664354335142704128) == bit64::as.integer64(-8664354335142703762)
    # [1] TRUE
    

    In this case, it just happens that the 64-bit version of that number is equal to what you intended.

    bit64::as.integer64(-8664254335142704128)  # ends in 4128
    # integer64
    # [1] -8664254335142704128                 # ends in 4128, yay! (coincidence?)
    

    If you subtract one, it results in the same effective integer64:

    bit64::as.integer64(-8664354335142704127)  # ends in 4127
    # integer64
    # [1] -8664354335142704128                 # ends in 4128 ?
    

    This continues for quite a while, until it finally shifts to the next rounding point

    bit64::as.integer64(-8664254335142703617)
    # integer64
    # [1] -8664254335142704128
    bit64::as.integer64(-8664254335142703616)
    # integer64
    # [1] -8664254335142703104
    

    It is no coincidence that the difference is 1024, or 2^10. A double has a 53-bit significand, so for magnitudes between 2^62 and 2^63 (where your number sits) adjacent representable values are 2^(62-52) = 2^10 = 1024 apart. Anything in between is rounded to the nearest of them.

  • fortunately, bit64::as.integer64 has several S3 methods, useful for converting different formats/classes to an integer64:

    library(bit64)
    methods(as.integer64)
    # [1] as.integer64.character as.integer64.double    as.integer64.factor   
    # [4] as.integer64.integer   as.integer64.integer64 as.integer64.logical  
    # [7] as.integer64.NULL     
    

    So, bit64::as.integer64.character can be useful, since precision is not lost when you type it or read it in as a string:

    bit64::as.integer64("-8664354335142704128")
    # integer64
    # [1] -8664354335142704128
    bit64::as.integer64("-8664354335142704128") == bit64::as.integer64("-8664354335142703762")
    # [1] FALSE
    
  • FYI, your number is already near the 64-bit boundary:

    -.Machine$integer.max
    # [1] -2147483647
    -(2^31-1)
    # [1] -2147483647
    log(8664354335142704128, 2)
    # [1] 62.9098
    -2^63 # the approximate +/- range of 64-bit integers
    # [1] -9.223372e+18
    -8664354335142704128
    # [1] -8.664354e+18
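
The 1024 step seen in the rounding examples above can be checked in base R without bit64. This is a rough sketch, assuming R's numeric type is a standard IEEE 754 double; the names exponent and spacing are mine:

```r
a <- -8664354335142704128   # parsed as a double
b <- -8664354335142703762   # rounds to the same double as a

# Doubles carry a 53-bit significand, so not every integer above 2^53 exists.
print(2^53 == 2^53 + 1)     # TRUE: 2^53 + 1 is not representable

# The binary exponent of |a| is 62, so adjacent doubles are 2^(62 - 52) apart.
exponent <- floor(log2(abs(a)))
spacing  <- 2^(exponent - 52)
print(spacing)              # 1024

print(a == b)               # TRUE: both literals land on the same double
print(a + spacing == a)     # FALSE: one full step reaches the next double
print(a + spacing / 4 == a) # TRUE: a quarter step rounds back to a
```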
    
r2evans answered Mar 12 '23 08:03