I'm having an issue with pandas v2.1.0+ that I can't figure out.
I have a list of columns in my pandas data frame that I need to convert using a custom function. The new values depend on multiple columns in the data, so I'm using apply to convert the column in-place:
my_columns_to_convert = ['col1','col2','col3']
for k in my_columns_to_convert:
df[k] = df[[k,colx]].apply(lambda x: convert_my_data(value_1_in=x[0],value_2_in=x[1]),axis=1)
This has worked just fine in previous versions of pandas. But now I get:
FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
But I'm not using loc or iloc, and everything I've reviewed so far seems to point at that being the issue. How can I write this code so that I'm doing it the 'correct' way going forward?
So far I've only used the approach above, which worked in previous pandas versions.
This FutureWarning can be triggered in pandas 2.1.0 with this simple example:
import pandas as pd

ser = pd.Series({"A": "a", "B": "b", "C": "c"})
# A    a
# B    b
# C    c
# dtype: object
print(ser[1])  # gives 'b' but with a FutureWarning: Series.__getitem__ treating keys..
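To get that value without the warning, be explicit: .iloc for a position, or the label itself. A minimal check, recording warnings so you can confirm that neither form raises a FutureWarning (the Series is the same toy example):

```python
import warnings

import pandas as pd

ser = pd.Series({"A": "a", "B": "b", "C": "c"})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeated ones
    by_position = ser.iloc[1]  # explicit positional access
    by_label = ser["B"]        # explicit label access

print(by_position, by_label)  # b b
print([w.category.__name__ for w in caught])  # [] : nothing was raised
```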
The goal is consistent behaviour when []-indexing a DataFrame and a Series. Remember that df[1] does not return the column at the second position of a DataFrame; it raises a KeyError (unless 1 is an actual column label, in which case the column labelled 1 is returned).
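A small sketch of that DataFrame behaviour; the frames are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# [] on a DataFrame looks up column *labels*; there is no column labelled 1
raised = False
try:
    df[1]
except KeyError:
    raised = True
print(raised)  # True

# With integer column labels, df[1] is still a label lookup:
# the column labelled 1 comes first here, i.e. it sits at position 0
df_int = pd.DataFrame({1: [1, 2], 0: [3, 4]})
print(df_int[1].tolist())  # [1, 2]

# Positional column access is spelled with .iloc
print(df.iloc[:, 1].tolist())  # [3, 4]
```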
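Based on your code, though, the culprit is not the index of your df (see how I imagine it below) but the index of each row that apply hands to your lambda. With axis=1, every x is a Series whose index is made of the selected column labels [k, colx], i.e. strings such as ["col1", "colx"]. So when you access x[0] and x[1], there is no integer label 0 or 1 to match, pandas falls back to positional lookup, and it warns you to use x.iloc[0] and x.iloc[1] instead (label access such as x[k] works too).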
import numpy as np
import pandas as pd

my_columns_to_convert = ['col1', 'col2', 'col3']
df = pd.DataFrame(
    np.arange(12).reshape(-1, 4),
    index=list("ABC"), columns=my_columns_to_convert + ["colx"]
)

def convert_my_data(value_1_in, value_2_in):
    return value_1_in * value_2_in  # a simple calculation

for k in my_columns_to_convert:
    df[k] = (
        df[[k, "colx"]].apply(
            lambda x: convert_my_data(value_1_in=x[0], value_2_in=x[1]), axis=1)
    )

# df after the loop:
#    col1  col2  col3  colx
# A     0     3     6     3
# B    28    35    42     7
# C    88    99   110    11
# the FutureWarning is displayed three times (= the length of the Series):
FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  lambda x: convert_my_data(value_1_in=x[0], value_2_in=x[1]), axis=1)
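Switching the lambda to explicit positional access with .iloc makes the warning go away. Here is a sketch that rebuilds the same example frame and records warnings so you can check that no FutureWarning is raised; convert_my_data is still the stand-in multiplication, not your real function:

```python
import warnings

import numpy as np
import pandas as pd

my_columns_to_convert = ["col1", "col2", "col3"]
df = pd.DataFrame(
    np.arange(12).reshape(-1, 4),
    index=list("ABC"),
    columns=my_columns_to_convert + ["colx"],
)

def convert_my_data(value_1_in, value_2_in):
    return value_1_in * value_2_in  # stand-in for the real conversion

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for k in my_columns_to_convert:
        df[k] = df[[k, "colx"]].apply(
            # each row x is indexed by [k, "colx"], so use .iloc for positions;
            # label access would also work: x[k] and x["colx"]
            lambda x: convert_my_data(value_1_in=x.iloc[0], value_2_in=x.iloc[1]),
            axis=1,
        )

print(df)
```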
As a side note, your code looks inefficient: apply with axis=1 calls your function once per row, and a conversion like this one can often be vectorized easily.
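A sketch of the vectorized route, assuming your real convert_my_data only uses elementwise operations (like the stand-in multiplication here) so it can take whole columns instead of scalars. If your real function branches on individual values, this assumption does not hold and the .iloc version of the apply loop is the safer fix.

```python
import numpy as np
import pandas as pd

my_columns_to_convert = ["col1", "col2", "col3"]
original = pd.DataFrame(
    np.arange(12).reshape(-1, 4),
    index=list("ABC"),
    columns=my_columns_to_convert + ["colx"],
)

def convert_my_data(value_1_in, value_2_in):
    # Works on scalars *and* on whole Series, because * is elementwise
    return value_1_in * value_2_in

# Pass whole columns instead of one row at a time: no apply, no per-row Series
df = original.copy()
for k in my_columns_to_convert:
    df[k] = convert_my_data(value_1_in=df[k], value_2_in=df["colx"])

# For this particular multiplication, a single call does all columns at once
df_alt = original.copy()
df_alt[my_columns_to_convert] = df_alt[my_columns_to_convert].mul(df_alt["colx"], axis=0)

print(df.equals(df_alt))  # True
```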