I am new here; ideally I would have commented this on the question where I learned this usage of idxmax. I used the same approach, and below is my code:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["A", "B", "C", "D"], index=[0, 1, 2, 3])

As soon as I use df[df > 6] on this df, the int values change to float:
A B C D
0 NaN NaN NaN NaN
1 NaN NaN NaN 7.0
2 8.0 9.0 10.0 11.0
3 12.0 13.0 14.0 15.0
Why does pandas do that? I also read somewhere that I could use dtype=object on a Series, but are there other ways to avoid this?
The limitation is mostly with NumPy: an ndarray can only hold a single dtype, and NumPy's integer dtypes have no way to represent a missing value. So we end up with a dilemma when we do df[df > 6]. Pandas is going to return a dataframe whose values equal df where df > 6 and are null otherwise. But there is no integer null value, so pandas has a choice to make: either use None or np.nan as the null and make the entire ndarray dtype==object, or use np.nan as the null and make the entire array dtype==float. Pandas chooses to make the arrays float, because keeping the values numeric preserves many of the advantages that come with numeric dtypes and their calculations.
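The upcast is easy to observe directly. A minimal sketch, rebuilding the question's df and inspecting dtypes before and after the mask:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["A", "B", "C", "D"])

# np.nan is itself a Python float, so any column forced to hold it is upcast.
print(type(np.nan))                # <class 'float'>
print(df.dtypes.unique())          # an integer dtype before masking
print(df[df > 6].dtypes.unique())  # float64 after masking
```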
Option 1
Use a fill value and pd.DataFrame.where
df.where(df > 6, -1)
A B C D
0 -1 -1 -1 -1
1 -1 -1 -1 7
2 8 9 10 11
3 12 13 14 15
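A quick check (a sketch reusing the question's df) confirms why this works: the fill value -1 is an integer, so no NaN is ever introduced and the integer dtype survives:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["A", "B", "C", "D"])

# where() keeps values satisfying the condition and substitutes -1 elsewhere;
# since -1 is an int, the columns never need to hold NaN.
filled = df.where(df > 6, -1)
print(filled.dtypes.unique())  # still an integer dtype
```

The trade-off is that -1 must be a value you can safely treat as "missing" downstream.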
Option 2
pd.DataFrame.stack and loc
By converting to a single dimension, we aren't forced to fill missing values in the rectangular grid with nulls.
df.stack().loc[lambda x: x > 6]
1 D 7
2 A 8
B 9
C 10
D 11
3 A 12
B 13
C 14
D 15
dtype: int64
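As a sketch of the same idea end to end: stack() collapses the frame into a Series keyed by a (row, column) MultiIndex, the boolean filter simply drops entries, and the integer dtype is preserved because no rectangular grid ever needs padding:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["A", "B", "C", "D"])

# Filtering a 1-D Series just removes entries; nothing has to become NaN.
s = df.stack().loc[lambda x: x > 6]
print(s.dtype)  # still an integer dtype
```

Note that calling s.unstack() to get back a 2-D frame reintroduces the missing cells, and with them the float upcast.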
If you do want the values to keep displaying as ints:
df.astype(object).mask(df<=6)
Out[114]:
A B C D
0 NaN NaN NaN NaN
1 NaN NaN NaN 7
2 8 9 10 11
3 12 13 14 15
You can find more information here and here.
This trade-off is made largely for memory and performance reasons, and also so that the resulting Series continues to be “numeric”. One possibility is to use dtype=object arrays instead.
More information about astype(object):
df.astype(object).mask(df<=6).applymap(type)
Out[115]:
A B C D
0 <class 'float'> <class 'float'> <class 'float'> <class 'float'>
1 <class 'float'> <class 'float'> <class 'float'> <class 'int'>
2 <class 'int'> <class 'int'> <class 'int'> <class 'int'>
3 <class 'int'> <class 'int'> <class 'int'> <class 'int'>
In previous versions (< 0.24.0) pandas indeed converted any int columns to floats if even a single NaN was present. But not anymore: optional nullable integer support was officially added in pandas 0.24.0. Quoting the pandas 0.24.x release notes: "Pandas has gained the ability to hold integer dtypes with missing values."
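A minimal sketch of the nullable dtype (the capitalized "Int64" extension dtype, available since pandas 0.24): casting first lets the mask keep integers, with missing cells shown as <NA> instead of NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["A", "B", "C", "D"])

# "Int64" (capital I) is pandas' nullable integer extension dtype;
# masked-out cells become pd.NA and the columns stay integer.
masked = df.astype("Int64").where(df > 6)
print(masked.dtypes.unique())  # Int64
print(masked)
```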