I am trying to create a new column called 'wage_rate' that holds the appropriate wage rate for each employee, based on the year of the observation.
In other words, my DataFrame looks something like this:
eecode year w2011 w2012 w2013
1 2012 7 8 9
1 2013 7 8 9
2 2011 20 25 25
2 2012 20 25 25
2 2013 20 25 25
I want to return, in a new column, 8 for the first row, 9 for the second, and 20, 25, and 25 for the remaining three.
One way is to use apply with axis=1, building the column name for each row from its year, e.g. 'w' + str(x.year).
In [41]: df.apply(lambda x: x['w' + str(x.year)], axis=1)
Out[41]:
0 8
1 9
2 20
3 25
4 25
dtype: int64
Details:
In [42]: df
Out[42]:
eecode year w2011 w2012 w2013
0 1 2012 7 8 9
1 1 2013 7 8 9
2 2 2011 20 25 25
3 2 2012 20 25 25
4 2 2013 20 25 25
In [43]: df['wage_rate'] = df.apply(lambda x: x['w' + str(x.year)], axis=1)
In [44]: df
Out[44]:
eecode year w2011 w2012 w2013 wage_rate
0 1 2012 7 8 9 8
1 1 2013 7 8 9 9
2 2 2011 20 25 25 20
3 2 2012 20 25 25 25
4 2 2013 20 25 25 25
Another option is to collect the values with a list comprehension over df.iterrows():
values = [row['w%s' % row['year']] for _, row in df.iterrows()]
df['wage_rate'] = values  # create the new column
This solution uses an explicit Python-level loop, so it is likely slower than vectorized pandas solutions on large frames, but it is simple and readable.
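For comparison, here is a minimal sketch of a vectorized variant that avoids the per-row Python loop. It is not from the answers above. It assumes every year in the data has a matching 'w<year>' column, that the column labels are unique, and that the frame is rebuilt here from the sample data in the question. The names wage_cols and col_pos are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "eecode": [1, 1, 2, 2, 2],
    "year": [2012, 2013, 2011, 2012, 2013],
    "w2011": [7, 7, 20, 20, 20],
    "w2012": [8, 8, 25, 25, 25],
    "w2013": [9, 9, 25, 25, 25],
})

# Column label each row should read from, e.g. 2012 -> "w2012".
wage_cols = "w" + df["year"].astype(str)

# Integer position of each label among df's columns; -1 means "not found".
col_pos = df.columns.get_indexer(wage_cols)
if (col_pos < 0).any():
    raise KeyError("some years have no matching 'w<year>' column")

# Pick one value per row using NumPy integer indexing (row i, column col_pos[i]).
df["wage_rate"] = df.to_numpy()[np.arange(len(df)), col_pos]

print(df)
```

Note that df.to_numpy() returns a single array for the whole frame, so if the frame mixes numeric and non-numeric columns the result will have object dtype. In that case it may be better to select only the wage columns before indexing.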