I have a DataFrame with two columns: a column of int and a column of str.
If I insert NaN into the int column, pandas converts all the int values to float because there is no NaN value for an int.
However, if I insert None into the str column, pandas converts all my int values to float as well. This doesn't make sense to me: why does the value I put in column 2 affect column 1?
Here's a simple working example:
import pandas as pd
df = pd.DataFrame()
df["int"] = pd.Series([], dtype=int)
df["str"] = pd.Series([], dtype=str)
df.loc[0] = [0, "zero"]
print(df)
print()
df.loc[1] = [1, None]
print(df)
The output is:
   int   str
0    0  zero

   int   str
0  0.0  zero
1  1.0   NaN
Is there any way to make the output the following:
   int   str
0    0  zero

   int   str
0    0  zero
1    1   NaN
without recasting the first column to int?
I prefer using int instead of float because the actual data in that column are integers. If there's no workaround, I'll just use float, though.
I prefer not having to recast because in my actual code, I don't store the actual dtype.
I also need the data inserted row-by-row.
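The int-to-float upcast described in the question can be reproduced in isolation. The following is a minimal sketch, not taken from the original post: the names `ints` and `grown` are illustrative, and `reindex` is used here only as a simple way to introduce a missing value into an integer column.

```python
import pandas as pd

# A plain integer column (NumPy-backed int64).
ints = pd.Series([0, 1], dtype="int64")

# Asking for a label that does not exist yet (2) forces pandas to fill
# the new slot with NaN, and int64 cannot represent NaN.
grown = ints.reindex([0, 1, 2])

print(ints.dtype)   # int64
print(grown.dtype)  # float64
```

Any operation that leaves a hole in an int64 column, even briefly, can trigger the same upcast.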
If you set dtype=object, your series will be able to contain arbitrary data types:
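import pandas as pd
df = pd.DataFrame()
# object dtype lets the column hold Python ints and missing values together
df["int"] = pd.Series([], dtype=object)
df["str"] = pd.Series([], dtype=str)
df.loc[0] = [0, "zero"]
print(df)
print()
df.loc[1] = [1, None]
print(df)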
   int   str
0    0  zero
1  NaN   NaN

   int   str
0    0  zero
1    1  None
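To see why an object column avoids the upcast, here is a small sketch (the variable name `mixed` is illustrative, not from the answer above). It assumes nothing beyond a standard pandas install and only checks what an object-dtype Series actually stores.

```python
import pandas as pd

# An object column stores references to arbitrary Python objects,
# so integers and None can sit side by side without any upcast.
mixed = pd.Series([0, 1, None], dtype=object)

print(mixed.dtype)            # object
print(type(mixed.iloc[0]))    # <class 'int'>
print(mixed.iloc[2] is None)  # True
```

The trade-off is that an object column is no longer a numeric column, so vectorized arithmetic on it is slower and the dtype no longer tells you the values are integers.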
As of pandas 1.0.0 I believe you have another option, which is to first use convert_dtypes. This converts the dataframe columns to dtypes that support pd.NA, avoiding the issues with NaN/None.
...
df = df.convert_dtypes()
df.loc[1] = [1, None]
print(df)
# int str
# 0 0 zero
# 1 1 NaN
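As a rough illustration of what "dtypes that support pd.NA" means, the sketch below uses the nullable integer dtype `"Int64"` (capital I) directly and then lets `convert_dtypes()` pick it. The variable names are illustrative, and the frame is built in one step here purely for brevity, which is an assumption that differs from the row-by-row insertion in the question.

```python
import pandas as pd

# The nullable integer dtype stores a missing value as pd.NA
# instead of forcing the whole column to float.
ints = pd.Series([0, 1, None], dtype="Int64")
print(ints.dtype)            # Int64
print(ints.isna().tolist())  # [False, False, True]

# convert_dtypes() selects nullable dtypes automatically.
df = pd.DataFrame({"int": [0, 1], "str": ["zero", None]}).convert_dtypes()
print(df["int"].dtype)       # Int64
```

If rows inserted later change a column's dtype, calling `convert_dtypes()` again on the result is one way to get back to the nullable dtypes, though how dtypes behave during row insertion may vary between pandas versions.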