Solving incompatible dtype warning for pandas DataFrame when setting new column iteratively

Setting the value of a new dataframe column:

df.loc[df["Measure"] == metric.label, "source_data_url"] = metric.source_data_url

now (as of Pandas version 2.1.0) gives a warning,

FutureWarning:
Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '       metric_3' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.

The pandas documentation explains how to solve this for a Series, but it is not clear how to apply that when a new DataFrame column is assigned iteratively (the line above is called in a loop over metrics, and it is the final metric that triggers the warning). How can this be done?

asked Dec 31 '25 13:12 by Tom


2 Answers

I had the same problem. My intuition is that when you set a value in the column source_data_url for the first time, the column does not exist yet, so pandas creates it and fills every element with NaN. Because NaN is a float, the new column gets dtype float64, and assigning a string to it afterwards triggers this warning.
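A quick check of that intuition, assuming only that pandas and NumPy are installed: a column holding nothing but missing values is float64, because NaN is a floating-point value. The frame below is a hypothetical stand-in for the question's df, and the snippet does not reproduce the internal `.loc` enlargement step; it only shows the dtype such an all-NaN column ends up with.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame.
df = pd.DataFrame({"Measure": ["metric_1", "metric_2", "metric_3"]})

# A column made only of missing values: NaN is a float,
# so pandas stores the column as float64.
df["source_data_url"] = np.nan

print(df["source_data_url"].dtype)  # float64
```

Writing a string such as 'metric_3' into this float64 column is the kind of silent upcast the warning is about.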

My solution was to create the column with a default value, e.g. an empty string, before assigning values to it:

df["source_data_url"] = ""

Using None also seems to work:

df["source_data_url"] = None
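A minimal sketch of this fix inside the question's loop, assuming the metrics are simple objects with `label` and `source_data_url` attributes (the `Metric` namedtuple, the labels, and the example.com URLs are hypothetical). `FutureWarning` is escalated to an error inside the loop, so any remaining incompatible-dtype assignment would fail loudly instead of printing a warning:

```python
import warnings
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the question's metric objects.
Metric = namedtuple("Metric", ["label", "source_data_url"])
metrics = [
    Metric("metric_1", "https://example.com/1"),
    Metric("metric_2", "https://example.com/2"),
    Metric("metric_3", "https://example.com/3"),
]

df = pd.DataFrame({"Measure": ["metric_1", "metric_2", "metric_3", "other"]})

# Create the column up front with a string default, so it never starts
# out as an all-NaN float64 column.
df["source_data_url"] = ""

# Any FutureWarning raised inside this block becomes an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    for metric in metrics:
        df.loc[df["Measure"] == metric.label, "source_data_url"] = metric.source_data_url

print(df)
```

Rows whose Measure matches no metric (here "other") simply keep the empty-string default.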

answered Jan 03 '26 02:01 by lutrarutra


Since pandas 2.1.0, setitem-like operations on a Series (or a DataFrame column) that silently upcast the dtype are deprecated and emit a warning.

In a future version, these will raise an error and you should cast to a common dtype first.

Previous behavior:

In [1]: ser = pd.Series([1, 2, 3])

In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: ser[0] = 'not an int64'

In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object

New behavior:

In [1]: ser = pd.Series([1, 2, 3])

In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: ser[0] = 'not an int64'
FutureWarning:
  Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas.
  Value 'not an int64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.

In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object

To keep the old upcasting behaviour without the warning, cast ser to object dtype first:

In [21]: ser = pd.Series([1, 2, 3])

In [22]: ser = ser.astype('object')

In [23]: ser[0] = 'not an int64'

In [24]: ser
Out[24]: 
0    not an int64
1               2
2               3
dtype: object

Source: https://pandas.pydata.org/docs/dev/whatsnew/v2.1.0.html#deprecated-silent-upcasting-in-setitem-like-series-operations
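The same cast applies to a DataFrame column, which is what the question asks about. A minimal sketch, assuming the column already exists as an all-NaN float64 column (for example, left over from an earlier step); the labels and example.com URLs are hypothetical. Casting once before the loop means no later string assignment needs an implicit upcast:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"Measure": ["metric_1", "metric_2", "metric_3"]})

# Assumed starting point: an existing all-NaN column, hence float64.
df["source_data_url"] = np.nan

# Cast the whole column to object once, before assigning strings.
df["source_data_url"] = df["source_data_url"].astype("object")

# Escalate FutureWarning to an error to show the loop no longer warns.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    for label in ["metric_1", "metric_3"]:
        df.loc[df["Measure"] == label, "source_data_url"] = f"https://example.com/{label}"

print(df)
```

Rows that are never assigned (here metric_2) keep their NaN, which an object column can hold alongside strings.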

answered Jan 03 '26 01:01 by den-kar


