
How to convert pandas dataframe columns to native python data types?

I have a dataframe whose columns data types need to be mapped to python native data types.

I want to be able to get a dictionary from numpy and convert each column to its native type.

For example:

{numpy.object_: object,
 numpy.bool_: bool,
 numpy.string_: str,
 numpy.unicode_: unicode,
 numpy.int64: int,
 numpy.float64: float,
 numpy.complex128: complex}

I tried both astype and pd.to_numeric, but neither downcasts the column sufficiently.

df['source'] = df['source'].astype(int) returns int32, as does pd.to_numeric

Update:

Most of the comments question the wisdom of doing this. networkx reads dataframes and accepts NumPy data types. However, the graph cannot then be written using json.dumps because of this well-documented error: TypeError: Object of type 'int64' is not JSON serializable
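
A minimal reproduction of that error, offered as a sketch rather than the exact networkx code path: the key name `weight` and the value `3` are made up for illustration. It also shows that `.item()` on a NumPy scalar returns the equivalent built-in Python value, which `json.dumps` accepts.

```python
import json

import numpy as np

value = np.int64(3)  # hypothetical value, standing in for a graph attribute

# json only knows built-in Python types, so a NumPy integer is rejected.
error_message = None
try:
    json.dumps({'weight': value})
except TypeError as exc:
    error_message = str(exc)

# .item() converts a NumPy scalar to the matching built-in Python scalar.
native = value.item()
encoded = json.dumps({'weight': native})
```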

Thanks

Itay Livni asked Nov 21 '17 22:11



1 Answer

To pandas (or to numpy), a "native Python type" is simply an object. That's the extent of it: pandas only knows that the value is a Python object and acts accordingly. Beyond that, you cannot have columns whose dtype is a Python string, unicode, integer, etc.

You can have object columns and store whatever you want inside them, though. Pandas will handle most of the conversion for you at this stage.

import pandas as pd

df = pd.DataFrame({'A': [1, 2],
                   'B': [1., 2.],
                   'C': [1 + 2j, 3 + 4j],
                   'D': [True, False],
                   'E': ['a', 'b'],
                   'F': [b'a', b'b']})

df.dtypes
Out[71]: 
A         int64
B       float64
C    complex128
D          bool
E        object
F        object
dtype: object

for col in df:
    print(type(df.loc[0, col]))

<class 'numpy.int64'>
<class 'numpy.float64'>
<class 'numpy.complex128'>
<class 'numpy.bool_'>
<class 'str'>
<class 'bytes'>

df = df.astype('object')

for col in df:
    print(type(df.loc[0, col]))

<class 'int'>
<class 'float'>
<class 'complex'>
<class 'bool'>
<class 'str'>
<class 'bytes'>
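
To connect this back to the JSON error in the question, here is a short sketch under stated assumptions: the column names `source` and `weight` and their values are invented for illustration, and the real networkx export may need more than this. It casts the frame to object and then serializes the rows with the standard `json` module.

```python
import json

import pandas as pd

# Hypothetical edge list; the column names and values are illustrative only.
df = pd.DataFrame({'source': [1, 2], 'weight': [0.5, 1.5]})

# A value read from an integer column is a NumPy scalar, which json rejects.
try:
    json.dumps({'source': df.loc[0, 'source']})
    numpy_scalar_ok = True
except TypeError:
    numpy_scalar_ok = False

# After the cast, each cell holds a built-in Python int or float.
native_df = df.astype('object')
native_type = type(native_df.loc[0, 'source'])

# The rows can now be serialized without a custom encoder.
records = native_df.to_dict(orient='records')
encoded = json.dumps(records)
```

One trade-off to keep in mind: object columns lose NumPy's vectorized speed and compact storage, so it is probably best to do this cast only at the point where the data leaves pandas.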

ayhan answered Oct 18 '22 04:10