I have a dataframe whose column data types need to be mapped to Python native data types.
I want to be able to get a dictionary from numpy and convert each column to its native type.
For example:
{numpy.object_: object,
numpy.bool_: bool,
numpy.string_: str,
numpy.unicode_: unicode,
numpy.int64: int,
numpy.float64: float,
numpy.complex128: complex}
I tried both astype and pd.to_numeric; neither downcasts the column sufficiently.
df['source'] = df['source'].astype(int)
returns int32, as does pd.to_numeric.
Most of the comments question the wisdom of doing this. networkx reads dataframes and accepts np datatypes. However, the graph cannot be written using json.dumps because of this well-documented error: TypeError: Object of type 'int64' is not JSON serializable
Thanks
The best way to convert one or more columns of a DataFrame to numeric values is to use pandas.to_numeric(). This function tries to convert non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
to_numeric() converts the data type of a column to a numeric one. The resulting dtype is int64 or float64, depending on the values in the column.
To change the type of several columns at once, pass pandas.to_numeric, pandas.to_datetime, or pandas.to_timedelta to DataFrame.apply() to convert them to numeric, datetime, or timedelta types respectively.
To convert all columns to strings, use df.astype(str) or df.applymap(str) (applymap is deprecated in favor of df.map since pandas 2.1).
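The conversions described above can be sketched as follows. This is a minimal illustration, not the asker's data: the frame and the column names "ints" and "floats" are made up, and errors="coerce" is an optional choice that turns unparseable values into NaN. Note that the results are still NumPy-backed columns, not native Python types.

```python
import pandas as pd

# Hypothetical frame of strings; dtype=object keeps them as plain Python str
raw = pd.DataFrame({"ints": ["1", "2", "3"],
                    "floats": ["1.5", "2", "x"]}, dtype=object)

# Single column: strings that all parse as integers give an integer dtype
ints = pd.to_numeric(raw["ints"])

# errors="coerce" replaces values that cannot be parsed ("x") with NaN,
# which forces a floating-point dtype
floats = pd.to_numeric(raw["floats"], errors="coerce")

# Several columns at once: apply() forwards keyword arguments to to_numeric
converted = raw.apply(pd.to_numeric, errors="coerce")

# Back to strings for every column
as_text = converted.astype(str)

print(converted.dtypes)
```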
To pandas (or to numpy), a "native Python type" is an object. That's the extent of it. Pandas only knows it's a Python object and acts accordingly. Other than that, you cannot have columns of type string, unicode, integer, etc.
You can have object columns and store whatever you want inside them, though. Pandas will handle most of the conversion for you at this stage.
import pandas as pd

df = pd.DataFrame({'A': [1, 2],
                   'B': [1., 2.],
                   'C': [1 + 2j, 3 + 4j],
                   'D': [True, False],
                   'E': ['a', 'b'],
                   'F': [b'a', b'b']})
df.dtypes
Out[71]:
A         int64
B       float64
C    complex128
D          bool
E        object
F        object
dtype: object
# Element access on typed columns returns NumPy scalars
for col in df:
    print(type(df.loc[0, col]))
<class 'numpy.int64'>
<class 'numpy.float64'>
<class 'numpy.complex128'>
<class 'numpy.bool_'>
<class 'str'>
<class 'bytes'>
df = df.astype('object')
# After casting to object, the same access returns built-in Python types
for col in df:
    print(type(df.loc[0, col]))
<class 'int'>
<class 'float'>
<class 'complex'>
<class 'bool'>
<class 'str'>
<class 'bytes'>
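Applied to the JSON problem from the question, this could look like the sketch below. The edge frame and its column names are invented for illustration, and the helper to_builtin is a hypothetical name, not part of pandas, numpy, or networkx. Two options are shown: casting to object first, as above, or leaving the dtypes alone and unwrapping NumPy scalars at serialization time through the default hook of json.dumps.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical edge list standing in for the asker's dataframe
df = pd.DataFrame({"source": [1, 2], "target": [2, 3], "weight": [0.5, 1.5]})

# Option 1: cast to object so element access yields built-in Python scalars
native = df.astype(object)
edges = [
    {col: native.loc[i, col] for col in native.columns}
    for i in native.index
]
payload = json.dumps(edges)


# Option 2: keep the NumPy dtypes and convert only when serializing
def to_builtin(value):
    """Hypothetical helper: unwrap NumPy scalars for json.dumps."""
    if isinstance(value, np.generic):
        return value.item()  # .item() returns the equivalent Python scalar
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


raw = {"source": df.loc[0, "source"]}  # a numpy.int64 value
payload_raw = json.dumps(raw, default=to_builtin)
```

Option 2 avoids turning every column into object dtype, which keeps vectorized operations fast if the frame is still needed afterwards.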