Based on this solution, I created several masks on a Pandas dataframe to create a new column which should be filled from different columns (based on the conditions).
The (simplified) code looks as follows:
mask0 = (df['condition'] == 1)
mask1 = (df['condition'] == 0)
df.loc[mask0, 'newColumn'] = df['otherColumn1']
df.loc[mask1, 'newColumn'] = df['otherColumn2']
However upon execution of the third line I get the following error:
ValueError: cannot reindex from a duplicate axis
If I simply do this, it works:
df.loc[mask0, 'newColumn'] = 1
What am I doing wrong?
You have to filter in both sides:
mask0 = (df['condition'] == 1)
mask1 = (df['condition'] == 0)
df.loc[mask0, 'newColumn'] = df.loc[mask0, 'otherColumn1']
df.loc[mask1, 'newColumn'] = df.loc[mask1, 'otherColumn2']
But here is better use numpy.select
for avoid repeating code:
df['newColumn'] = np.select([mask0, mask1],
[df['otherColumn1'], df['otherColumn2']],
default=np.nan)
You need to mask the "data provider" as well:
df.loc[mask0, 'newColumn'] = df[mask0, 'otherColumn1']
df.loc[mask1, 'newColumn'] = df[mask1, 'otherColumn2']
If the first condition is true if the latter is false and vice versa, we can use np.where(..)
:
df['newColumn'] = np.where(mask0, df['otherColumn0'], df['otherColumn2'])
or you can use np.select(..)
in case the two can be both false where we fallback on the old values in case both conditons are False
:
df['newColumn'] = np.select(
[mask0, mask1],
[df['otherColumn1'], df['otherColumn2']],
default=df['newColumn']
)
Here of course we make the assumption that newColumn
already exists in the dataframe (for example through some previous processing).
Another solution with np.where
:
df['newColumn'] = np.where(df['condition'].eq(1), df['otherColumn1'], df['condition'])
df['newColumn'] = np.where(df['condition'].eq(0), df['otherColumn2'], df['condition'])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With