Calculate min and max value of a transition with index of first occurrence in pandas

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'ID':['a','b','d','d','a','b','c','b','d','a','b','a'], 
                   'sec':[3,6,2,0,4,7,10,19,40,3,1,2]})
print(df)
   ID  sec
0   a    3
1   b    6
2   d    2
3   d    0
4   a    4
5   b    7
6   c   10
7   b   19
8   d   40
9   a    3
10  b    1
11  a    2

I want to count how many times each transition occurs. In the ID column, a row with ID a followed by a row with ID b counts as the transition a->b; the same goes for b->d, d->d, d->a, b->c, c->b, and b->a. I can get the counts with Counter:

from collections import Counter

Counter(zip(df['ID'].to_list(),df['ID'].to_list()[1:]))
Counter({('a', 'b'): 3,
         ('b', 'd'): 2,
         ('d', 'd'): 1,
         ('d', 'a'): 2,
         ('b', 'c'): 1,
         ('c', 'b'): 1,
         ('b', 'a'): 1})

I also need the min and max of the sec column for each transition. For example, a->b occurs 3 times; across those occurrences the min sec value is 1 and the max sec value is 7. I also want the index at which each transition first occurs; for a->b it's 0. The transition_index column uses the first element of the transition (the index of the a row), while min and max use the second element (the sec value at the b row).

Here is the final output I want to get:

df = pd.DataFrame({'ID_1':['a','b','d','d','b','c','b'], 
                   'ID_2':['b','d','d','a','c','b','a'],
                   'sec_min':[1,2,0,3,10,19,2],
                   'sec_max':[7,40,0,4,10,19,2],
                   'transition_index':[0,1,2,3,5,6,10],
                   'count':[3,2,1,2,1,1,1]})
print(df)
  ID_1 ID_2  sec_min  sec_max  transition_index  count
0    a    b        1        7                 0      3
1    b    d        2       40                 1      2
2    d    d        0        0                 2      1
3    d    a        3        4                 3      2
4    b    c       10       10                 5      1
5    c    b       19       19                 6      1
6    b    a        2        2                10      1

How can I achieve this in Python?

Also I have a huge amount of data, so I'm looking for the fastest way possible.

Space Impact asked Jul 26 '20 17:07



2 Answers

You have transitions of the form from -> to. 'transition_index' is based on the index of the "from" row, while the 'sec' aggregations are based on the value associated with the "to" row.

We can shift the index and group on the shifted ID together with the current ID, which lets a single groupby with named aggregations produce the desired output.


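# Expose the row index as a column, then shift it so each row holds the index of its "from" row
df = df.reset_index()
df['index'] = df['index'].shift().astype('Int64')

# Group on (previous ID, current ID); the first row has no previous ID, so groupby drops it
(df.groupby([df['ID'].shift(1).rename('ID_1'), df['ID'].rename('ID_2')], sort=False)
   .agg(sec_min=('sec', 'min'),
        sec_max=('sec', 'max'),
        transition_index=('index', 'first'),
        count=('sec', 'size'))
   .reset_index()
)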

  ID_1 ID_2  sec_min  sec_max  transition_index  count
0    a    b        1        7                 0      3
1    b    d        2       40                 1      2
2    d    d        0        0                 2      1
3    d    a        3        4                 3      2
4    b    c       10       10                 5      1
5    c    b       19       19                 6      1
6    b    a        2        2                10      1
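
For checking the result on your own data, the same idea can be wrapped in a function that leaves df untouched. This is only a sketch under a few assumptions: the helper name transition_stats and the intermediate column from_index are made up for illustration, and the "from" position is computed positionally, which matches the index labels only when df has the default 0..n-1 index.

```python
import numpy as np
import pandas as pd


def transition_stats(df, id_col="ID", val_col="sec"):
    """Summarize consecutive-row transitions (hypothetical helper, not a pandas API)."""
    # Each row is the "to" side of a transition; its predecessor is the "from" side.
    pairs = pd.DataFrame({
        "ID_1": df[id_col].shift(),             # ID of the previous row
        "ID_2": df[id_col],                     # ID of the current row
        "sec": df[val_col],                     # value taken from the "to" row
        "from_index": np.arange(len(df)) - 1,   # position of the "from" row
    }).iloc[1:]                                 # the first row has no predecessor

    return (pairs.groupby(["ID_1", "ID_2"], sort=False)
                 .agg(sec_min=("sec", "min"),
                      sec_max=("sec", "max"),
                      transition_index=("from_index", "first"),
                      count=("sec", "size"))
                 .reset_index())


df = pd.DataFrame({"ID": ["a", "b", "d", "d", "a", "b", "c", "b", "d", "a", "b", "a"],
                   "sec": [3, 6, 2, 0, 4, 7, 10, 19, 40, 3, 1, 2]})
result = transition_stats(df)
print(result)
```

Because sort=False keeps groups in order of first appearance, the rows come out in the same order as the expected table in the question.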
ALollz answered Oct 22 '22 02:10


Start by adding columns with the previous values of ID and sec:

df['prevID']  = df.ID.shift(fill_value='')
df['prevSec'] = df.sec.shift(fill_value=0)

Then define the following function:

def find(df, IDfrom, IDto):
    # Rows where the previous ID is IDfrom and the current ID is IDto
    rows = df.query('prevID == @IDfrom and ID == @IDto')
    # min/max use sec of the "to" row only, as the question specifies
    sec = rows['sec']
    n = len(rows)
    return (n, sec.min(), sec.max()) if n > 0 else (n, 0, 0)

Now, if you run this function, e.g. to find transitions from a to b:

find(df, 'a', 'b')

you will get:

(3, 1, 7)

Then call this function for all other from and to values.

Note that this function returns a proper result (a count of 0, with 0 for both min and max) even if there is no transition between the given values. Of course, you may choose other "surrogate" values for min and max if no transition has been found.
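
Calling the function for every pair can be sketched as a loop over the (from, to) pairs that actually occur. The names find_to_sec, observed and summary are hypothetical; find_to_sec is a variant of find that takes min and max from the sec value of the "to" row only, as the question specifies.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['a', 'b', 'd', 'd', 'a', 'b', 'c', 'b', 'd', 'a', 'b', 'a'],
                   'sec': [3, 6, 2, 0, 4, 7, 10, 19, 40, 3, 1, 2]})
df['prevID'] = df['ID'].shift(fill_value='')


def find_to_sec(df, id_from, id_to):
    """Hypothetical variant of find(): (count, min, max) of sec on the "to" rows."""
    rows = df[(df['prevID'] == id_from) & (df['ID'] == id_to)]
    n = len(rows)
    if n == 0:
        return (0, 0, 0)  # surrogate values when the transition never occurs
    return (n, int(rows['sec'].min()), int(rows['sec'].max()))


# Distinct (from, to) pairs, skipping the first row because it has no predecessor
observed = df.iloc[1:][['prevID', 'ID']].drop_duplicates()

summary = {(f, t): find_to_sec(df, f, t)
           for f, t in observed.itertuples(index=False, name=None)}

for pair, stats in summary.items():
    print(pair, stats)
```

Each call scans the whole DataFrame, so the total cost grows with the number of distinct pairs times the number of rows. On large data a single groupby pass is likely to be faster.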

Valdi_Bo answered Oct 22 '22 02:10