
Handling HUGE numbers in numpy or pandas

I am taking part in a competition where the data I am given is anonymized. Quite a few of the columns have HUGE values. The largest was 40 digits long! I loaded the file with pd.read_csv, but those columns came back with dtype object as a result.

My original plan was to scale the data down, but since the columns are treated as objects I can't do arithmetic on them.

Does anyone have a suggestion on how to handle huge numbers in Pandas or Numpy?

Note that I've tried converting the values to uint64 with no luck. I get the error "long too big to convert".

Terence Chow asked Dec 09 '22 09:12


2 Answers

If you have a mixed-type column -- some integers, some strings -- stored in a dtype=object column, you can still convert to ints and perform arithmetic. Starting from a mixed-type column:

>>> df = pd.DataFrame({"A": [11**44, "11"*22]})
>>> df
                                                A
0  6626407607736641103900260617069258125403649041
1    11111111111111111111111111111111111111111111

[2 rows x 1 columns]
>>> df.dtypes, list(map(type, df.A))
(A    object
dtype: object, [<class 'int'>, <class 'str'>])

We can convert to ints:

>>> df["A"] = df["A"].apply(int)
>>> df.dtypes, list(map(type, df.A))
(A    object
dtype: object, [<class 'int'>, <class 'int'>])
>>> df
                                                A
0  6626407607736641103900260617069258125403649041
1    11111111111111111111111111111111111111111111

[2 rows x 1 columns]
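
The dtype stays object after the conversion because these values do not fit in any fixed-width NumPy integer, which is also why the uint64 attempt in the question fails. A small sketch of that check, assuming Python 3 and a recent NumPy (the variable names are only for illustration):

```python
import numpy as np

big = 11**44  # same value as in the example above (46 digits)

# Largest value a uint64 can hold: 18446744073709551615 (20 digits).
limit = np.iinfo(np.uint64).max

# False here: the value is far beyond what uint64 can represent.
fits = big <= limit

# With no dtype given, NumPy falls back to an object array holding Python ints.
arr = np.array([big, 1])
print(limit, fits, arr.dtype)
```

So anything longer than about 19 or 20 digits has to live in an object column as Python ints, which have arbitrary precision.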

And then perform arithmetic:

>>> df // 11
                                               A
0  602400691612421918536387328824478011400331731
1    1010101010101010101010101010101010101010101

[2 rows x 1 columns]
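
For the scaling the question asks about, here is a rough sketch of two options, assuming Python 3. The divisor 10**30 and the names scaled_exact, as_float and scaled are made up for illustration and are not from the question's data. The first option keeps exact integer arithmetic; the second converts to float64, which can hold the magnitude but only about 15 to 17 significant digits.

```python
import pandas as pd

# Same mixed-type column as in the example above.
df = pd.DataFrame({"A": [11**44, "11" * 22]})
df["A"] = df["A"].apply(int)  # exact, arbitrary-precision Python ints

# Option 1: exact scaling in pure Python, dropping the 30 low-order digits.
scaled_exact = df["A"].apply(lambda v: v // 10**30)

# Option 2: approximate scaling. float64 covers the range (up to about 1.8e308)
# but the low-order digits are rounded away.
as_float = df["A"].astype("float64")
scaled = as_float / as_float.max()

print(scaled_exact.tolist())
print(scaled.tolist())
```

If the exact digits matter (for example when the column is really an identifier), stay with the Python ints; if only relative size matters for a model, the float64 version is usually easier to work with.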
DSM answered Dec 11 '22 10:12


You can use the converters argument of pd.read_csv to call int, or some other custom converter function, on each string as it is imported:

import pandas as pd
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

txt = '''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,9999999999999999999999999999999999999999,"That is an even BIGGER number"
3,1,"Tiny"
4,-9999999999999999999999999999999999999999,"Really negative"
'''

# Parse Big_Num with int so each value becomes an arbitrary-precision Python int
df = pd.read_csv(StringIO(txt), converters={'Big_Num': int})

print(df)

Prints:

   line                                    Big_Num                           text
0     1   1234567890123456789012345678901234567890      That sure is a big number
1     2   9999999999999999999999999999999999999999  That is an even BIGGER number
2     3                                          1                           Tiny
3     4  -9999999999999999999999999999999999999999                Really negative

Now test arithmetic:

n = df["Big_Num"][1]
print("{} {}".format(n, n + 1))

Prints:

9999999999999999999999999999999999999999 10000000000000000000000000000000000000000

If you have any values in the column that might cause int to croak, you can do this:

txt='''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,9999999999999999999999999999999999999999,"That is an even BIGGER number"
3,0.000000000000000001,"Tiny"
4,"a string","Use 0 for strings"
'''

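def conv(s):
    # Try an exact integer first, then a float, and fall back to 0
    # for anything that is not numeric.
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return 0

df = pd.read_csv(StringIO(txt), converters={'Big_Num': conv})
print(df)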

Prints:

   line                                   Big_Num                           text
0     1  1234567890123456789012345678901234567890      That sure is a big number
1     2  9999999999999999999999999999999999999999  That is an even BIGGER number
2     3                                     1e-18                           Tiny
3     4                                         0              Use 0 for strings

Then every value in the column will be either a Python int or a float and will support arithmetic.
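
One thing to watch with the int/float mix is that the float branch is approximate. If exact values are needed for both the huge integers and the tiny fractions, a possible variation (a sketch only, assuming Python 3; to_decimal is a hypothetical helper and the three-row sample is made up) is to convert to decimal.Decimal instead. Note that Decimal arithmetic rounds to the context precision, 28 digits by default, so it has to be raised for 40-digit values.

```python
from decimal import Decimal, InvalidOperation, getcontext
from io import StringIO

import pandas as pd

# Decimal arithmetic rounds to the context precision (default 28 digits),
# so raise it enough to cover 40-digit values plus 18 decimal places.
getcontext().prec = 60

txt = '''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,0.000000000000000001,"Tiny"
3,"a string","Use 0 for strings"
'''

def to_decimal(s):
    # Hypothetical helper: exact Decimal for numeric text, 0 for anything else.
    try:
        return Decimal(s)
    except InvalidOperation:
        return Decimal(0)

df = pd.read_csv(StringIO(txt), converters={'Big_Num': to_decimal})

# Exact sum of a 40-digit integer and a value of 1e-18.
total = df["Big_Num"].iloc[0] + df["Big_Num"].iloc[1]
print(total)
```

The column is still dtype object, so this is slower than native NumPy types, but nothing is rounded unless the result needs more digits than the context precision allows.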

dawg answered Dec 11 '22 08:12