
Recursively calculating ratios between parents and children in pandas dataframe

I looked around for a solution to this to the best of my ability. The closest I was able to find was this, but it's not really what I'm looking for.

I am trying to model the relationship between a value and its parent's value. Specifically, I want to calculate the ratio between the two. I would also like to keep track of the level of lineage, i.e. how many generations deep each item is.

For example, I would like to input a pandas df that looks like this:

id  parent_id   score
1   0           50
2   1           40
3   1           30
4   2           20
5   4           10

and get this:

id  parent_id   score   parent_child_ratio  level
1   0           50      NA                  1
2   1           40      1.25                2
3   1           30      1.67                2
4   2           20      2                   3
5   4           10      2                   4

So for every row, we look up the score of its parent, calculate (parent_score / child_score), and store that as the value of a new column. Then some sort of counting step adds the child's level.

This has been stumping me for a while, any help is appreciated!!!

Adhi R. asked Apr 20 '18 05:04



1 Answer

The first part is just merges:

import pandas as pd

# Self-merge on parent_id == id: the child's columns get the _x suffix, the parent's get _y.
with_parent = pd.merge(df, df, left_on='parent_id', right_on='id', how='left')
with_parent['child_parent_ratio'] = with_parent.score_y / with_parent.score_x
with_parent = with_parent.rename(columns={'id_x': 'id', 'parent_id_x': 'parent_id', 'score_x': 'score'})[['id', 'parent_id', 'score', 'child_parent_ratio']]
>>> with_parent
   id  parent_id  score  child_parent_ratio
0   1          0     50                 NaN
1   2          1     40            1.250000
2   3          1     30            1.666667
3   4          2     20            2.000000
4   5          4     10            2.000000
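
As an alternative to the self-merge, the parent's score can also be looked up with Series.map. This is only a minimal sketch, assuming that ids are unique and that parent_id 0 marks a root with no row of its own; the column name parent_child_ratio follows the question rather than the code above.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "parent_id": [0, 1, 1, 2, 4],
    "score": [50, 40, 30, 20, 10],
})

# Build an id -> score lookup, then map every parent_id through it.
# A parent_id with no matching id (the root's 0) maps to NaN.
score_by_id = df.set_index("id")["score"]
parent_score = df["parent_id"].map(score_by_id)

# Parent score divided by child score, as described in the question.
df["parent_child_ratio"] = parent_score / df["score"]
```

This avoids the suffix renaming that the merge requires, but it relies on id being unique so that it can serve as an index for the lookup.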

For the second part, you can run a breadth-first search. The parent-child edges form a forest, and the level of each node is its distance from the root (here, the placeholder parent id 0).

E.g., using networkx:

import networkx as nx

# Directed graph with one edge from each parent to its child; parent_id 0 acts as the root.
G = nx.DiGraph()
G.add_edges_from(zip(with_parent['parent_id'], with_parent['id']))
# Level = number of edges on the path from node 0 to each id.
with_parent['level'] = with_parent['id'].map(nx.shortest_path_length(G, 0))
>>> with_parent
   id  parent_id  score  child_parent_ratio  level
0   1          0     50                 NaN      1
1   2          1     40            1.250000      2
2   3          1     30            1.666667      2
3   4          2     20            2.000000      3
4   5          4     10            2.000000      4
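
If adding networkx as a dependency is not an option, the same levels can be computed by walking up the parent links. The helper below is a hypothetical sketch, not part of pandas or networkx; it assumes there are no cycles and that every parent_id is either the root marker (0 by default) or an existing id.

```python
def compute_levels(ids, parent_ids, root=0):
    """Return {id: level}, where direct children of `root` have level 1."""
    parent_of = dict(zip(ids, parent_ids))
    levels = {root: 0}  # the root marker itself sits at distance 0

    for node in parent_of:
        # Walk up until we reach an ancestor whose level is already known.
        path = []
        while node not in levels:
            path.append(node)
            node = parent_of[node]
        base = levels[node]
        # Assign levels back down the path, nearest-to-root first.
        for depth, visited in enumerate(reversed(path), start=1):
            levels[visited] = base + depth

    del levels[root]  # the root marker is not a real row
    return levels


# Ids and parent ids copied from the question.
levels = compute_levels([1, 2, 3, 4, 5], [0, 1, 1, 2, 4])
```

With the DataFrame from above, something like with_parent['id'].map(compute_levels(with_parent['id'], with_parent['parent_id'])) should then produce the same level column. Each node's level is cached once it has been computed, so every row is visited only a small, constant number of times.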
Ami Tavory answered Nov 05 '22 18:11
