I have a dataset that looks like this:
import pandas as pd

df = pd.DataFrame({'Country': ['PL', 'PL', 'PL', 'PL', 'UK', 'UK', 'US', 'US', 'US'],
                   'Val1': ['y1', 'b', 'c', 'd', 'y2', 'b', 'y3', 'b', 'c'],
                   'Val2': ['x1', 'b', 'c', 'd', 'x2', 'b', 'x3', 'b', 'c']})
Out[34]: 
  Country Val1 Val2
0      PL   y1   x1
1      PL    b    b
2      PL    c    c
3      PL    d    d
4      UK   y2   x2
5      UK    b    b
6      US   y3   x3
7      US    b    b
8      US    c    c
What I want to do is update Val2 in the first row of each Country with the Val1 value from the same row. So I would like x1 to become y1, x2 to become y2, x3 to become y3, and so on.
What I have tried is the following:
countries = df['Country'].unique()
for c in countries:
    df.loc[df['Country'] == c, 'Val2'].iloc[0] = df.loc[df['Country'] == c, 'Val1'].iloc[0]
This loop runs without errors, but it does not update my DataFrame. So I think my problem is understanding how DataFrames can be updated for specific rows/columns/values.
What would be the proper way to do this?
PS. It would be nice if someone could explain why my solution does not work.
Using .drop_duplicates and .loc: drop_duplicates keeps the first row of each Country, and its index selects the rows to update.
df.loc[df.drop_duplicates(subset=['Country'], keep='first').index, 'Val2'] = df['Val1']
print(df)
  Country Val1 Val2
0      PL   y1   y1
1      PL    b    b
2      PL    c    c
3      PL    d    d
4      UK   y2   y2
5      UK    b    b
6      US   y3   y3
7      US    b    b
8      US    c    c
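As for the PS: the likely reason the loop has no effect is chained indexing. df.loc[df['Country'] == c, 'Val2'] selects with a boolean mask, which gives back a new Series rather than a window into df, so the following .iloc[0] = ... writes into that temporary object and the result is thrown away. Below is a minimal sketch of that behaviour, followed by a single-.loc alternative that uses a boolean mask from duplicated(). The names tmp, unchanged and first_rows are only illustrative, and depending on your pandas version the chained write may also emit a warning.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['PL', 'PL', 'PL', 'PL', 'UK', 'UK', 'US', 'US', 'US'],
                   'Val1': ['y1', 'b', 'c', 'd', 'y2', 'b', 'y3', 'b', 'c'],
                   'Val2': ['x1', 'b', 'c', 'd', 'x2', 'b', 'x3', 'b', 'c']})

# Step 1 of the chained expression: boolean-mask selection returns a copy.
tmp = df.loc[df['Country'] == 'PL', 'Val2']

# Step 2: the assignment lands in the copy, not in df.
tmp.iloc[0] = 'y1'

# df still holds the old value in row 0.
unchanged = df.loc[0, 'Val2']

# Alternative: one .loc call on df itself, so the assignment targets df directly.
# duplicated() marks every repeat of a Country; negating it marks the first row of each.
first_rows = ~df.duplicated(subset='Country')
df.loc[first_rows, 'Val2'] = df.loc[first_rows, 'Val1']

print(df)
```

The general rule of thumb is to do the row and column selection in one .loc[...] on the left-hand side of the assignment, instead of indexing the result of a previous indexing step.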