
Why does pandas convert unsigned int greater than 2**63-1 to objects?

When I convert a NumPy array to a pandas DataFrame, pandas changes a uint64 column to object dtype if it contains an integer greater than 2**63 - 1.

import pandas as pd
import numpy as np

x = np.array([('foo', 2 ** 63)], dtype = np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))
y = np.array([('foo', 2 ** 63 - 1)], dtype = np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))

>>> pd.DataFrame(x).dtypes.unsigned
dtype('O')
>>> pd.DataFrame(y).dtypes.unsigned
dtype('uint64')

This is annoying because I can't write the DataFrame to an HDF file in the table format:

pd.DataFrame(x).to_hdf('x.hdf', 'key', format = 'table')

Output:

TypeError: Cannot serialize the column [unsigned] because its data contents are [integer] object dtype

Can someone explain the type conversion?

jamin asked Dec 15 '15 07:12



1 Answer

It's an open bug, but you can force the column back to uint64 using astype():

x = np.array([('foo', 2 ** 63)], dtype = np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))

a = pd.DataFrame(x)
a['unsigned'] = a['unsigned'].astype(np.uint64)
>>> a.dtypes
string      object
unsigned    uint64
dtype: object

Other methods used to convert data types to numeric values raised errors or did not work:

>>> pd.to_numeric(a['unsigned'], errors='coerce')
OverflowError: Python int too large to convert to C long

>>> a.convert_objects(convert_numeric=True).dtypes
string      object
unsigned    object
dtype: object
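
For context on the OverflowError: 2**63 - 1 is the largest value a signed 64-bit integer can hold, so 2**63 only fits once the type is unsigned. The sketch below shows that boundary using the standard-library struct module. It does not reproduce pandas internals, the fits helper is a hypothetical name used only for illustration, and it is my assumption that the conversions above try a signed 64-bit integer first.

```python
import struct

INT64_MAX = 2 ** 63 - 1
UINT64_MAX = 2 ** 64 - 1


def fits(fmt, value):
    """Return True if value can be packed using the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except (struct.error, OverflowError):
        return False


# '<q' is a signed 64-bit integer, '<Q' is an unsigned 64-bit integer.
print(fits('<q', INT64_MAX))       # True
print(fits('<q', INT64_MAX + 1))   # False: 2**63 overflows int64
print(fits('<Q', INT64_MAX + 1))   # True: 2**63 fits in uint64
print(fits('<Q', UINT64_MAX + 1))  # False: 2**64 overflows uint64
```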

ilyas patanam answered Sep 16 '22 23:09