I have a pandas dataframe with mixed column names:
1,2,3,4,5, 'Class'
When I save this dataframe to an HDF5 file, I get a warning that performance will be affected because of the mixed types. How do I convert the integer column names to strings in pandas?
To convert the values of a column such as "Fee" to strings, use Series.apply(str), for example df["Fee"] = df["Fee"].apply(str). Note that this changes the values in the column, not the column name.
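As a minimal sketch of the apply(str) approach described above (the DataFrame contents here are made up for illustration; only the "Fee" column name comes from the text):

```python
import pandas as pd

# Hypothetical sample data; "Fee" is the column named in the text above.
df = pd.DataFrame({"Course": ["Spark", "Pandas"], "Fee": [20000, 25000]})

# Call str() on every value in the column and assign the result back.
df["Fee"] = df["Fee"].apply(str)

fee_values = df["Fee"].tolist()
print(fee_values)
```

After this, every entry in df["Fee"] is a Python str, while the column name itself is unchanged.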
to_numeric(): To convert a column of a DataFrame to numeric values, use pandas.to_numeric() (for several columns, apply it to each column). This function tries to convert non-numeric objects (such as strings) into integers or floating-point numbers as appropriate.
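A short sketch of pandas.to_numeric() on string data, assuming made-up sample values; errors="coerce" is one of the documented options and turns unparseable entries into NaN instead of raising:

```python
import pandas as pd

# Hypothetical sample values, including one that is not a number.
mixed = pd.Series(["1", "2.5", "abc"])

# Unparseable entries become NaN; the result is a float column.
converted = pd.to_numeric(mixed, errors="coerce")

# Strings that all look like whole numbers become an integer column.
whole = pd.to_numeric(pd.Series(["1", "2", "3"]))

print(converted)
print(whole.dtype)
```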
astype() method: You can pass any Python, NumPy, or pandas data type to convert all columns of a DataFrame to that type, or pass a dictionary with column names as keys and data types as values to convert only selected columns.
The astype() method returns a new DataFrame in which the data types have been changed to the specified type; the original DataFrame is not modified.
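Both forms of astype() described above can be sketched as follows; the column names and values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data with an int, a float, and a string column.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["3", "4"]})

# One type for every column: all values become strings.
all_str = df.astype(str)

# A dict converts only the listed columns; "b" keeps its type.
selected = df.astype({"a": "float64", "c": "int64"})

print(selected.dtypes)
```

Because astype() returns a new object, df itself still has its original types unless you assign the result back.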
You can simply use df.columns = df.columns.astype(str):
In [26]: df = pd.DataFrame(np.random.random((3,6)), columns=[1,2,3,4,5,'Class'])

In [27]: df
Out[27]:
          1         2         3         4         5     Class
0  0.773423  0.865091  0.614956  0.219458  0.837748  0.862177
1  0.544805  0.535341  0.323215  0.929041  0.042705  0.759294
2  0.215638  0.251063  0.648350  0.353999  0.986773  0.483313

In [28]: df.columns.map(type)
Out[28]:
array([<class 'int'>, <class 'int'>, <class 'int'>, <class 'int'>,
       <class 'int'>, <class 'str'>], dtype=object)

In [29]: df.to_hdf("out.h5", "d1")
C:\Anaconda3\lib\site-packages\pandas\io\pytables.py:260: PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed-integer,key->axis0] [items->None]
  f(store)
C:\Anaconda3\lib\site-packages\pandas\io\pytables.py:260: PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed-integer,key->block0_items] [items->None]
  f(store)

In [30]: df.columns = df.columns.astype(str)

In [31]: df.columns.map(type)
Out[31]:
array([<class 'str'>, <class 'str'>, <class 'str'>, <class 'str'>,
       <class 'str'>, <class 'str'>], dtype=object)

In [32]: df.to_hdf("out.h5", "d1")
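As a hedged aside, two other ways to get all-string column names are Index.map(str) and DataFrame.rename(columns=str). The sketch below uses a smaller made-up frame than the session above and skips the HDF5 write so it runs without PyTables:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed int and str column names.
df = pd.DataFrame(np.zeros((2, 3)), columns=[1, 2, "Class"])

# Option 1: map str() over the column index and assign it back.
df.columns = df.columns.map(str)
names = list(df.columns)

# Option 2: rename() applies the callable to each name and returns a new frame.
other = pd.DataFrame(np.zeros((2, 3)), columns=[1, 2, "Class"])
renamed = other.rename(columns=str)

print(names)
```

After the conversion the columns are selected with string keys, for example df["1"] rather than df[1].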