I have a DataFrame with two columns: a column of int and a column of str.
I understand that if I insert NaN into the int column, Pandas will convert all the int values to float, because there is no NaN value for an int. However, when I insert None into the str column, Pandas converts all my int values to float as well. This doesn't make sense to me: why does the value I put in column 2 affect column 1? Here's a simple working example:
import pandas as pd
df = pd.DataFrame()
df["int"] = pd.Series([], dtype=int)
df["str"] = pd.Series([], dtype=str)
df.loc[0] = [0, "zero"]
print(df)
print()
df.loc[1] = [1, None]
print(df)
The output is:
   int   str
0    0  zero
   int   str
0  0.0  zero
1  1.0   NaN
Is there any way to make the output the following:
   int   str
0    0  zero
   int   str
0    0  zero
1    1   NaN
without recasting the first column to int?
I prefer using int instead of float because the actual data in
that column are integers. If there's no workaround, I'll just
use float though.
I prefer not having to recast because in my actual code, I don't
store the actual dtype.
I also need the data inserted row-by-row.
If you set dtype=object, your series will be able to contain arbitrary data types:
df["int"] = pd.Series([], dtype=object)
df["str"] = pd.Series([], dtype=str)
df.loc[0] = [0, "zero"]
print(df)
print()
df.loc[1] = [1, None]
print(df)
   int   str
0    0  zero
1  NaN   NaN
  int   str
0   0  zero
1   1  None
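As a rough illustration of why this works, here is a minimal sketch showing that an object-dtype Series stores the Python objects as they are, so an int stays an int and None stays None. It builds the Series in one go rather than row by row, since the exact behaviour of enlargement through .loc may differ between pandas versions; the variable name `mixed` is just for this example.

```python
import pandas as pd

# An object-dtype Series holds references to arbitrary Python objects,
# so no numeric upcasting happens when a missing value is present.
mixed = pd.Series([0, 1, None], dtype=object)

print(mixed.dtype)     # the column dtype is object
print(type(mixed[0]))  # the stored value is still a plain Python int
print(mixed[2])        # the missing entry is kept as None
```

The trade-off is that object columns lose the speed and memory benefits of a native integer dtype, so this is probably best suited to small or mixed-content columns.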
As of pandas 1.0.0, I believe you have another option, which is to first call convert_dtypes. This converts the DataFrame columns to dtypes that support pd.NA, avoiding the issues with NaN/None.
...
df = df.convert_dtypes()
df.loc[1] = [1, None]
print(df)
#   int   str
# 0   0  zero
# 1   1  NaN
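To make the idea of pd.NA-aware dtypes a bit more concrete, here is a small sketch (assuming pandas 1.0 or later) that shows the nullable "Int64" dtype holding a missing value without falling back to float. The frame is built up front rather than row by row, because how .loc enlargement treats dtypes can vary across pandas versions; the names `converted` and `nullable` are only illustrative.

```python
import pandas as pd

# Build the frame in one step so the result does not depend on
# version-specific behaviour of row insertion through .loc.
df = pd.DataFrame({"int": [0, 1], "str": ["zero", None]})

# convert_dtypes() picks dtypes that support pd.NA where it can;
# the integer column becomes the nullable "Int64" extension dtype.
converted = df.convert_dtypes()
print(converted.dtypes)

# A nullable integer column can also be requested explicitly.
# The missing entry is stored as pd.NA and the other values stay integers.
nullable = pd.Series([0, 1, None], dtype="Int64")
print(nullable)
```

Note the capital "I" in "Int64": the lowercase "int64" is the ordinary NumPy dtype, which cannot represent missing values.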