I have a dataset that looks like this:
import pandas as pd
df = pd.DataFrame({'Country': ['PL', 'PL', 'PL', 'PL', 'UK', 'UK', 'US', 'US', 'US'],
                   'Val1': ['y1', 'b', 'c', 'd', 'y2', 'b', 'y3', 'b', 'c'],
                   'Val2': ['x1', 'b', 'c', 'd', 'x2', 'b', 'x3', 'b', 'c']})
Out[34]:
  Country Val1 Val2
0      PL   y1   x1
1      PL    b    b
2      PL    c    c
3      PL    d    d
4      UK   y2   x2
5      UK    b    b
6      US   y3   x3
7      US    b    b
8      US    c    c
What I want to do is update Val2 in the first row of each Country with the Val1 from the same row. So x1 should become y1, x2 should become y2, x3 should become y3, and so on.
What I have tried is the following:
countries = df['Country'].unique()
for c in countries:
    df.loc[df['Country'] == c, 'Val2'].iloc[0] = df.loc[df['Country'] == c, 'Val1'].iloc[0]
This loop runs without raising an error, but it does not update my DataFrame. So I think my problem is understanding how to update specific rows/columns/values of a DataFrame.
What would be the proper way to do this?
PS. It would be nice if someone could explain why my solution does not work.
Using .drop_duplicates and .loc:
df.loc[df.drop_duplicates(subset=['Country'], keep='first').index, 'Val2'] = df['Val1']
print(df)
  Country Val1 Val2
0      PL   y1   y1
1      PL    b    b
2      PL    c    c
3      PL    d    d
4      UK   y2   y2
5      UK    b    b
6      US   y3   y3
7      US    b    b
8      US    c    c
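As for why the loop in the question has no effect: a likely explanation is chained indexing. Selecting with df.loc[df['Country'] == c, 'Val2'] and a boolean mask returns a new Series, and the following .iloc[0] = ... writes into that temporary object rather than into df. Below is a minimal sketch of an alternative that selects and assigns in a single .loc call, using Series.duplicated to mark the first row of each country. The variable name first is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['PL', 'PL', 'PL', 'PL', 'UK', 'UK', 'US', 'US', 'US'],
    'Val1': ['y1', 'b', 'c', 'd', 'y2', 'b', 'y3', 'b', 'c'],
    'Val2': ['x1', 'b', 'c', 'd', 'x2', 'b', 'x3', 'b', 'c'],
})

# True for the first row of each Country, False for later repeats.
first = ~df['Country'].duplicated()

# A single .loc call picks rows and column together, so the assignment
# writes into df itself instead of into a temporary Series.
df.loc[first, 'Val2'] = df.loc[first, 'Val1']

print(df)
```

With the sample data this changes only rows 0, 4 and 6; every other row keeps its original Val2.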