
Why does pandas convert int values in a DataFrame to float by itself?

I am new here; ideally I would have commented on the question where I learned this usage of idxmax.

I used the same approach, and my code is below:

df = pd.DataFrame(np.arange(16).reshape(4,4),columns=["A","B","C","D"],index=[0,1,2,3])

As soon as I use df[(df>6)] on this df, the int values change to float:

      A     B     C     D
0   NaN   NaN   NaN   NaN
1   NaN   NaN   NaN   7.0
2   8.0   9.0  10.0  11.0
3  12.0  13.0  14.0  15.0

Why does pandas do that? Also, I read somewhere that I could use dtype=object on a Series, but are there other ways to avoid this?

Avij asked Nov 07 '17 05:11




3 Answers

The limitation is mostly with Numpy.

  • Numpy's ndarray can only be of a single type.
  • There does not exist an integer type null value.

So we end up with a dilemma when we do df[df > 6]. Pandas returns a dataframe with the values of df where df > 6 and null everywhere else. But like I said, there isn't an integer null value, so we have a choice to make.

  1. Use None or np.nan for null values while making the entire ndarray of dtype==object
  2. Use np.nan as our null and make the entire array of dtype==float

Pandas chooses to make the arrays into float because keeping the values numeric will keep many of the advantages that come with numeric dtypes and their calculations.
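The upcast described above can be observed directly. This is a minimal sketch that reuses the frame from the question; the variable names masked, ints and nan_rejected are illustrative and not from the original posts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["A", "B", "C", "D"])

# Row 0 holds 0..3, so every column loses at least one value to the mask
# and every column is therefore upcast.
masked = df[df > 6]
print(df.dtypes.tolist())      # integer dtypes (exact width is platform dependent)
print(masked.dtypes.tolist())  # float64 for all four columns

# NumPy itself refuses to store NaN in an integer array.
ints = np.array([1, 2, 3])
try:
    ints[0] = np.nan
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)
```

The ValueError in the second half is the underlying reason for the first half: with no integer slot that can hold NaN, the only numeric dtype left for a masked column is float.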


Option 1
Use a fill value and pd.DataFrame.where

df.where(df > 6, -1)

    A   B   C   D
0  -1  -1  -1  -1
1  -1  -1  -1   7
2   8   9  10  11
3  12  13  14  15

Option 2
pd.DataFrame.stack and loc
By converting to a single dimension, we aren't forced to fill missing values in the rectangular grid with nulls.

df.stack().loc[lambda x: x > 6]

1  D     7
2  A     8
   B     9
   C    10
   D    11
3  A    12
   B    13
   C    14
   D    15
dtype: int64
piRSquared answered Sep 19 '22 20:09


If you want the values to keep looking like ints:

df.astype(object).mask(df<=6)
Out[114]: 
     A    B    C    D
0  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN    7
2    8    9   10   11
3   12   13   14   15

You can find more information here, and here.

This trade-off is made largely for memory and performance reasons, and also so that the resulting Series continues to be “numeric”. One possibility is to use dtype=object arrays instead.

More information about astype(object)

df.astype(object).mask(df<=6).applymap(type)
Out[115]: 
                 A                B                C                D
0  <class 'float'>  <class 'float'>  <class 'float'>  <class 'float'>
1  <class 'float'>  <class 'float'>  <class 'float'>    <class 'int'>
2    <class 'int'>    <class 'int'>    <class 'int'>    <class 'int'>
3    <class 'int'>    <class 'int'>    <class 'int'>    <class 'int'>
BENY answered Sep 16 '22 20:09


In previous versions (<0.24.0) pandas always converted an int column to float if even a single NaN was present. Since pandas 0.24.0 there is optional nullable integer support, so the conversion can be avoided. It is opt-in, though: you have to request a nullable dtype such as "Int64" explicitly, and plain int64 columns are still upcast to float.

pandas 0.24.x release notes quote: "Pandas has gained the ability to hold integer dtypes with missing values."
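As a rough sketch of how the nullable integer dtype applies to the question's frame (assuming a reasonably recent pandas; the variable names nullable and result are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["A", "B", "C", "D"])

# Opt in to the nullable integer dtype (note the capital "I").
nullable = df.astype("Int64")

# Mask with the same condition as in the question; missing entries
# become <NA> and the columns stay integer-typed.
result = nullable.where(df > 6)

print(result)
print(result.dtypes.tolist())
```

The columns stay Int64 and the masked cells show up as <NA> rather than NaN. Reductions such as result["D"].sum() skip the missing entries by default, as they do with float NaN.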

mork answered Sep 19 '22 20:09