I have a DataFrame, which is already sorted by the columns ('year', 'month')
, that looks like this:
df = pd.DataFrame({
'year': [2020, 2020, 2020, 2021, 2021, 2021, 2021],
'month': [1, 2, 5, 2, 4, 7, 9],
'values': [
['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c'],
['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D']
]
})
print(df)
year month values
0 2020 1 ['a', 'b', 'c']
1 2020 2 ['a', 'b', 'c']
2 2020 5 ['a', 'b', 'c']
3 2021 2 ['A', 'B', 'C', 'D']
4 2021 4 ['A', 'B', 'C', 'D']
5 2021 7 ['A', 'B', 'C', 'D']
6 2021 9 ['A', 'B', 'C', 'D']
I want to create a new column named 'value'
, which contains the value of the i-th element on the 'values'
array, where i is the index of the corresponding month, grouped by year. In this case, the outcome would be:
year month values value
0 2020 1 ['a', 'b', 'c'] 'a'
1 2020 2 ['a', 'b', 'c'] 'b'
2 2020 5 ['a', 'b', 'c'] 'c'
3 2021 2 ['A', 'B', 'C', 'D'] 'A'
4 2021 4 ['A', 'B', 'C', 'D'] 'B'
5 2021 7 ['A', 'B', 'C', 'D'] 'C'
6 2021 9 ['A', 'B', 'C', 'D'] 'D'
I'm assuming there is no data missing on the arrays. Some lines I've tried involve using .groupby('year')
followed by .get_loc('month')
, but couldn't get the correct result so far.
EDIT:
There is one detail I forgot to mention: The months are not necessarily in a uniform range, therefore the index is not always month-1
. I've edited the DataFrame in question to reflect this nuance.
df.assign(value=[v[m-1] for v, m in zip(df['values'], df['month'])])
year month values value
0 2020 1 [a, b, c] a
1 2020 2 [a, b, c] b
2 2020 3 [a, b, c] c
3 2021 1 [A, B, C, D] A
4 2021 2 [A, B, C, D] B
5 2021 3 [A, B, C, D] C
6 2021 4 [A, B, C, D] D
map
with list.__getitem__
df.assign(value=[*map(list.__getitem__, df['values'], df['month'] - 1)])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With