
Stop Pandas from converting int to float

I have a DataFrame with two columns: a column of int and a column of str.

  • I understand that if I insert NaN into the int column, Pandas will convert all the ints to float, because int has no NaN value.
  • However, when I insert None into the str column, Pandas converts all my ints to float as well. This doesn't make sense to me: why does the value I put in column 2 affect column 1?
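
The first point is easy to confirm in isolation. A minimal sketch (the variable names are illustrative and not part of the example that follows): a missing value in otherwise-integer data makes pandas fall back to float64, because a plain int64 column has no way to represent NaN.

```python
import numpy as np
import pandas as pd

# Illustrative names; not taken from the question's code.
ints_only = pd.Series([0, 1, 2])
with_nan = pd.Series([0, 1, np.nan])
with_none = pd.Series([0, 1, None])

print(ints_only.dtype)  # int64
print(with_nan.dtype)   # float64
print(with_none.dtype)  # float64 (None is treated as NaN in numeric data)
```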

Here's a simple working example:

import pandas as pd
df = pd.DataFrame()
df["int"] = pd.Series([], dtype=int)
df["str"] = pd.Series([], dtype=str)

df.loc[0] = [0, "zero"]
print(df)
print()

df.loc[1] = [1, None]
print(df)

The output is:

   int   str
0    0  zero

   int   str
0  0.0  zero
1  1.0   NaN

Is there any way to make the output the following:

   int   str
0    0  zero

   int   str
0    0  zero
1    1   NaN

without recasting the first column to int?

  • I prefer using int instead of float because the actual data in that column are integers. If there's no workaround, I'll just use float though.

  • I prefer not having to recast because in my actual code, I don't
    store the actual dtype.

  • I also need the data inserted row-by-row.

user2570465 asked Oct 26 '16 00:10



2 Answers

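If you set dtype=object, the series can hold arbitrary Python objects, so the integers are stored as-is instead of being upcast to float (the column's dtype is then object, not int):

df["int"] = pd.Series([], dtype=object)
df["str"] = pd.Series([], dtype=str)
df.loc[0] = [0, "zero"]
print(df)
print()
df.loc[1] = [1, None]
print(df)

   int   str
0    0  zero
1  NaN   NaN

(Row 1 already appears here, filled with NaN, presumably because the snippet was run on the question's DataFrame, which already had a row 1. Starting from an empty DataFrame, only row 0 would be printed at this point.)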

  int   str
0   0  zero
1   1  None
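
To see what this approach trades away, here is a small sketch (illustrative names, independent of the DataFrame above). With dtype=object the values stay plain Python ints and None is kept as-is, but the column is no longer a native integer column, so it typically gives up the speed and memory benefits of int64.

```python
import pandas as pd

# Illustrative data; an object column stores plain Python objects.
s = pd.Series([0, 1, None], dtype=object)

print(s.dtype)            # object
print(type(s.iloc[0]))    # <class 'int'>
print(s.iloc[2] is None)  # True
```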
maxymoo answered Oct 12 '22 06:10


As of pandas 1.0.0 I believe you have another option: call convert_dtypes first. It converts the DataFrame's columns to dtypes that support pd.NA, which avoids the issues with NaN/None.

...

df = df.convert_dtypes()
df.loc[1] = [1, None]
print(df)

#   int   str
# 0   0  zero
# 1   1  NaN
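
As a rough illustration of the dtypes involved (a sketch on a hypothetical frame, not the exact code elided above): the nullable integer dtype "Int64" stores a missing value as pd.NA without converting the integers to float, and convert_dtypes selects it for an integer column.

```python
import pandas as pd

# "Int64" (capital I) is pandas' nullable integer extension dtype.
s = pd.Series([0, 1, None], dtype="Int64")
print(s.dtype)             # Int64
print(s.iloc[2] is pd.NA)  # True
print(s.sum())             # 1 (missing values are skipped by default)

# Hypothetical example frame; convert_dtypes() picks nullable dtypes for it.
example = pd.DataFrame({"int": [0, 1], "str": ["zero", None]})
converted = example.convert_dtypes()
print(converted.dtypes["int"])  # Int64
```

Whether row-by-row assignment with df.loc preserves these dtypes may depend on the pandas version, so it is worth checking df.dtypes after inserting.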
totalhack answered Oct 12 '22 07:10