Why does pandas treat these two strings differently in `apply`?

Question

Why does pandas treat the two lambdas l3 and l4 differently? Both take one argument, both return a string, and neither should ever be executed, because df is empty:

import pandas as pd

df = pd.DataFrame(data={"col1": [], "col2": []})

l3 = lambda r: ""
l4 = lambda r: f"{r.col1}"

df["col3"] = df.apply(l3, axis=1)
df["col4"] = df.apply(l4, axis=1)  # Error: Wrong number of items passed 3, placement implies 1

print(type(df.apply(l3, axis=1)))  # this is a Series
print(type(df.apply(l4, axis=1)))  # this is a DataFrame

Yet df.apply returns a Series for l3 and a DataFrame for l4.

Bonus question: is there a better way of doing

df["col4"] = df.apply(l4, axis=1)

that works for empty data frames?

Update: I believe a relevant part of the pandas code is this:

https://github.com/pandas-dev/pandas/blob/8e07787bc1030e5d13d3ad5e83b5d060a519ef67/pandas/core/apply.py#L718-L753

In line with what @mozway answered, pandas applies the function to an empty Series and, depending on whether that call succeeds, returns either the newly generated Series or a copy of the input (which is a DataFrame).

In line with what @Brandt commented, one should probably make sure the function also works for empty rows (a weird, or at least undocumented, requirement).
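
To make the update above concrete, here is a minimal sketch of that probe. It assumes, as in the linked commit, that pandas calls the function on an empty float64 Series; other pandas versions may probe with a differently shaped Series. The helper name probe is hypothetical and not part of pandas.

```python
import pandas as pd

l3 = lambda r: ""
l4 = lambda r: f"{r.col1}"


def probe(func):
    """Hypothetical helper: call func on an empty Series and report what happens."""
    empty_row = pd.Series([], dtype="float64")  # assumed shape of the probe input
    try:
        return ("ok", func(empty_row))
    except Exception as exc:
        # Any failure here is what sends apply down the "copy the input" path.
        return ("raised", type(exc).__name__)


l3_outcome = probe(l3)  # l3 ignores its argument, so the call succeeds
l4_outcome = probe(l4)  # l4 accesses r.col1, which an empty Series does not have

print(l3_outcome)
print(l4_outcome)
```

l3 never touches its argument, so the probe succeeds and the result can be reduced to a Series. l4 fails on the attribute lookup, which is why apply falls back to returning a copy of the DataFrame.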

mozway · Accepted Answer

You should add the result_type='reduce' argument to avoid expansion to a DataFrame:

import pandas as pd

df = pd.DataFrame(data={"col1": [], "col2": []})

l3 = lambda r: ""
l4 = lambda r: f"{r.col1}"

df["col3"] = df.apply(l3, axis=1)
df["col4"] = df.apply(l4, axis=1, result_type='reduce')
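
As a quick check of the snippet above, the following sketch applies the same call to both an empty and a non-empty frame. The helper add_col4 and the sample values "a", "b", "x", "y" are hypothetical and only serve as illustration.

```python
import pandas as pd

l4 = lambda r: f"{r.col1}"


def add_col4(frame):
    """Hypothetical helper: add col4 with a row-wise apply that is forced to reduce."""
    out = frame.copy()
    # result_type="reduce" asks apply for a Series, even when the frame has no rows.
    out["col4"] = out.apply(l4, axis=1, result_type="reduce")
    return out


empty = add_col4(pd.DataFrame(data={"col1": [], "col2": []}))
filled = add_col4(pd.DataFrame(data={"col1": ["a", "b"], "col2": ["x", "y"]}))

print(list(empty.columns))
print(filled["col4"].tolist())
```

The same call works in both cases, so no special handling of the empty frame is needed.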

Tags: python, string, pandas

bers
