Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to select values based on array's index from relative index within pandas groups

Tags:

python

pandas

I have a DataFrame, which is already sorted by the columns ('year', 'month'), that looks like this:

df = pd.DataFrame({
    'year': [2020, 2020, 2020, 2021, 2021, 2021, 2021],
    'month': [1, 2, 5, 2, 4, 7, 9],
    'values': [
        ['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c'],
        ['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D']
    ]
})

print(df)

    year    month   values
0   2020    1       ['a', 'b', 'c']
1   2020    2       ['a', 'b', 'c']
2   2020    5       ['a', 'b', 'c']
3   2021    2       ['A', 'B', 'C', 'D']
4   2021    4       ['A', 'B', 'C', 'D']
5   2021    7       ['A', 'B', 'C', 'D']
6   2021    9       ['A', 'B', 'C', 'D']

I want to create a new column named 'value', which contains the value of the i-th element on the 'values' array, where i is the index of the corresponding month, grouped by year. In this case, the outcome would be:

    year    month   values                  value
0   2020    1       ['a', 'b', 'c']         'a'
1   2020    2       ['a', 'b', 'c']         'b'
2   2020    5       ['a', 'b', 'c']         'c'
3   2021    2       ['A', 'B', 'C', 'D']    'A'
4   2021    4       ['A', 'B', 'C', 'D']    'B'
5   2021    7       ['A', 'B', 'C', 'D']    'C'
6   2021    9       ['A', 'B', 'C', 'D']    'D'

I'm assuming there is no data missing on the arrays. Some lines I've tried involve using .groupby('year') followed by .get_loc('month'), but couldn't get the correct result so far.

EDIT:

There is one detail I forgot to mention: The months are not necessarily in a uniform range, therefore the index is not always month-1. I've edited the DataFrame in question to reflect this nuance.

like image 678
L. B. Avatar asked Jan 24 '23 10:01

L. B.


1 Answers

comprehension

df.assign(value=[v[m-1] for v, m in zip(df['values'], df['month'])])

   year  month        values value
0  2020      1     [a, b, c]     a
1  2020      2     [a, b, c]     b
2  2020      3     [a, b, c]     c
3  2021      1  [A, B, C, D]     A
4  2021      2  [A, B, C, D]     B
5  2021      3  [A, B, C, D]     C
6  2021      4  [A, B, C, D]     D

Alternative map with list.__getitem__

df.assign(value=[*map(list.__getitem__, df['values'], df['month'] - 1)])
like image 163
piRSquared Avatar answered Feb 03 '23 06:02

piRSquared