
Why does pandas treat these two strings differently in `apply`?

I wonder why pandas treats the two lambdas l3 and l4 differently: both take one argument, both return a string, and neither should ever be executed, because df is empty:

import pandas as pd

df = pd.DataFrame(data={"col1": [], "col2": []})

l3 = lambda r: ""
l4 = lambda r: f"{r.col1}"

df["col3"] = df.apply(l3, axis=1)
df["col4"] = df.apply(l4, axis=1)  # Error: Wrong number of items passed 3, placement implies 1

print(type(df.apply(l3, axis=1)))  # this is a Series
print(type(df.apply(l4, axis=1)))  # this is a DataFrame

Yet, the return types of df.apply are different.

Bonus question: is there a better way of doing

df["col4"] = df.apply(l4, axis=1)

that works for empty data frames?

Update: I believe a relevant part of the pandas code is this:

https://github.com/pandas-dev/pandas/blob/8e07787bc1030e5d13d3ad5e83b5d060a519ef67/pandas/core/apply.py#L718-L753

In line with what @mozway answered: pandas applies the function to an empty Series and, depending on whether that call succeeds, returns either the newly generated Series or a copy of the input (which is a DataFrame).
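To make that mechanism concrete, here is a heavily simplified, hypothetical re-creation of the inference step. The helper name `infer_should_reduce` is made up, this is not pandas' actual code, and the exact probe pandas uses may differ between versions. It calls the function on an empty float Series and reduces only if the call succeeds and does not itself return a Series:

```python
import pandas as pd

l3 = lambda r: ""
l4 = lambda r: f"{r.col1}"


def infer_should_reduce(func):
    """Hypothetical, simplified sketch of pandas' empty-frame inference.

    Returns True if apply would return a Series ("reduce"),
    False if it would fall back to returning a copy of the DataFrame.
    """
    probe = pd.Series([], dtype="float64")  # stands in for an "empty row"
    try:
        result = func(probe)
    except Exception:
        # The probe failed (r.col1 raises AttributeError on an empty Series),
        # so no reduction is inferred.
        return False
    # A scalar result means reduce; a Series result would mean expand.
    return not isinstance(result, pd.Series)


print(infer_should_reduce(l3))  # True: "" is a scalar
print(infer_should_reduce(l4))  # False: the empty Series has no attribute col1
```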

In line with what @Brandt commented, one should probably make sure the function also works on an empty row (a strange, or at least undocumented, requirement).
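One way to meet that requirement for this particular lambda is sketched below (the name `l4_safe` is only for illustration). It uses `Series.get`, which returns `None` instead of raising when a label is missing, so calling the function on an empty row succeeds and yields a string, and apply then returns a Series:

```python
import pandas as pd

# Series.get returns None (its default) for a missing label instead of raising,
# so calling this on an empty row does not fail.
l4_safe = lambda r: f"{r.get('col1')}"

df = pd.DataFrame(data={"col1": [], "col2": []})
df["col4"] = df.apply(l4_safe, axis=1)  # no error on the empty frame

print(l4_safe(pd.Series([], dtype="float64")))  # None
print(l4_safe(pd.Series({"col1": 1.5, "col2": 2.0})))  # 1.5
```

The trade-off is that a genuinely missing column would now silently produce the string "None" instead of raising an error.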

Asked by bers on Nov 25 '25 at 07:11


1 Answer

You should add the result_type='reduce' argument so that apply returns a Series instead of expanding the result to a DataFrame:

import pandas as pd

df = pd.DataFrame(data={"col1": [], "col2": []})

l3 = lambda r: ""
l4 = lambda r: f"{r.col1}"

df["col3"] = df.apply(l3, axis=1)
df["col4"] = df.apply(l4, axis=1, result_type='reduce')
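For the bonus question, and for this particular lambda only, a vectorized alternative avoids apply altogether, so there is nothing for pandas to infer on an empty frame. This sketch assumes that plain `str()` formatting of each value is what `f"{r.col1}"` is meant to produce (for floats such as 1.0 both give "1.0"):

```python
import pandas as pd

df = pd.DataFrame(data={"col1": [], "col2": []})

# Column-wise equivalent of df.apply(lambda r: f"{r.col1}", axis=1):
# no per-row function call, so the empty frame needs no special handling.
df["col4"] = df["col1"].astype(str)

df2 = pd.DataFrame(data={"col1": [1.0, 2.5], "col2": [3.0, 4.0]})
df2["col4"] = df2["col1"].astype(str)
print(df2["col4"].tolist())  # ['1.0', '2.5']
```

Unlike result_type='reduce', this does not generalize to arbitrary row functions, but when an operation can be expressed column-wise it is typically faster than a row-wise apply as well.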
Answered by mozway on Nov 26 '25 at 20:11


