I wonder why pandas treats the two lambdas l3 and l4 differently: both take one argument, both return a string, and neither should ever be executed because df is empty:
import pandas as pd
df = pd.DataFrame(data={"col1": [], "col2": []})
l3 = lambda r: ""
l4 = lambda r: f"{r.col1}"
df["col3"] = df.apply(l3, axis=1)
df["col4"] = df.apply(l4, axis=1) # Error: Wrong number of items passed 3, placement implies 1
print(type(df.apply(l3, axis=1))) # this is a Series
print(type(df.apply(l4, axis=1))) # this is a DataFrame
Yet, the return types of df.apply are different.
Bonus question: is there a better way of doing
df["col4"] = df.apply(l4, axis=1)
that works for empty data frames?
Update: I believe a relevant part of the pandas code is this:
https://github.com/pandas-dev/pandas/blob/8e07787bc1030e5d13d3ad5e83b5d060a519ef67/pandas/core/apply.py#L718-L753
In line with what @mozway answered, pandas first calls the function on an empty Series. Depending on whether that call succeeds, df.apply returns either the newly generated Series or a copy of the input (which is a DataFrame).
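The mechanism described here can be modelled in a few lines. This is a simplified sketch, not the pandas implementation: `empty_apply_sketch` is a hypothetical helper, and the real code has more branches and may behave differently in other pandas versions.

```python
import pandas as pd


def empty_apply_sketch(df, func):
    """Hypothetical, simplified model of row-wise apply on a frame with no rows."""
    probe = pd.Series([], dtype="float64")  # empty row used as a trial input
    try:
        result = func(probe)
    except Exception:
        # The trial call failed, so fall back to a copy of the input frame.
        return df.copy()
    if isinstance(result, pd.Series):
        # A Series result is treated as "expand", so a frame comes back again.
        return df.copy()
    # A scalar result is treated as "reduce": an empty Series on the row index.
    return pd.Series([], index=df.index, dtype="float64")


df = pd.DataFrame(data={"col1": [], "col2": []})
l3 = lambda r: ""            # ignores its argument, so the trial call succeeds
l4 = lambda r: f"{r.col1}"   # raises AttributeError on the empty trial row

print(type(empty_apply_sketch(df, l3)))
print(type(empty_apply_sketch(df, l4)))
```

In this model, l3 takes the reduce path and l4 takes the fallback path only because attribute access on the empty trial row raises.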
In line with what @Brandt commented, one should probably make sure the function also works for empty rows (a weird, or at least undocumented, requirement).
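One way to meet that requirement is to read the column with a default instead of attribute access. A minimal sketch, assuming a fallback value is acceptable for the use case; `l4_safe`, `empty`, and `filled` are names introduced here for illustration:

```python
import pandas as pd

# Series.get returns the default when the label is missing,
# so the function does not raise when called on an empty row.
l4_safe = lambda r: f"{r.get('col1', '')}"

empty = pd.DataFrame(data={"col1": [], "col2": []})
empty["col4"] = empty.apply(l4_safe, axis=1)  # assignment works on the empty frame

filled = pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]})
filled["col4"] = filled.apply(l4_safe, axis=1)
print(filled)
```

The same function then behaves identically on empty and non-empty frames, at the cost of silently tolerating a missing column.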
You should add the result_type='reduce' argument to avoid expansion to a DataFrame:
import pandas as pd
df = pd.DataFrame(data={"col1": [], "col2": []})
l3 = lambda r: ""
l4 = lambda r: f"{r.col1}"
df["col3"] = df.apply(l3, axis=1)
df["col4"] = df.apply(l4, axis=1, result_type='reduce') # always a Series, so the assignment works
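For the bonus question, when the function only reads a single column, a column-wise operation sidesteps `DataFrame.apply` and its empty-frame special case altogether. A sketch under that assumption (it does not cover functions that combine several columns in arbitrary ways); `empty` and `filled` are illustrative names:

```python
import pandas as pd

empty = pd.DataFrame(data={"col1": [], "col2": []})
# Series.map works element by element, so an empty column
# simply yields an empty Series and the function is never called.
empty["col4"] = empty["col1"].map(lambda v: f"{v}")
print(empty.columns.tolist())

filled = pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]})
filled["col4"] = filled["col1"].map(lambda v: f"{v}")
print(filled["col4"].tolist())
```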