Get all direct, intermediate and ultimate parent nodes of a child node in a pandas DataFrame

I have a DataFrame with parent-child relationships that looks like this:

child    Parent    relationship
A1x2     bc11      direct_parent
bc11     Aw00      direct_parent
bc11     Aw00      ultimate_parent
Aee1     Aee0      direct_parent
Aee1     Aee0      ultimate_parent

I would like to get all the ancestors of every child node in a new DataFrame. The result would look something like this:

node    ancestry_tree
A1x2    [A1x2, bc11, Aw00]
Aee1    [Aee1, Aee0]

Note: in the real dataset there can be many intermediate (direct parent) nodes between a child and its ultimate parent.

Azee. asked Jan 25 '23 08:01


1 Answer

Another approach uses from_pandas_edgelist and ancestors from the networkx package:

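import networkx as nx
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'child': ['A1x2', 'bc11', 'bc11', 'Aee1', 'Aee1'],
    'Parent': ['bc11', 'Aw00', 'Aw00', 'Aee0', 'Aee0'],
    'relationship': ['direct_parent', 'direct_parent', 'ultimate_parent',
                     'direct_parent', 'ultimate_parent'],
})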

# Create the Directed Graph
G = nx.from_pandas_edgelist(df,
                            source='Parent',
                            target='child',
                            create_using=nx.DiGraph())

# Map each child to a set holding itself and all of its ancestors
# (sets are unordered, so the list order in the output is arbitrary)
ancestors = {n: {n} | nx.ancestors(G, n) for n in df['child'].unique()}

# Convert dict back to DataFrame if necessary
df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])

print(df_ancestors)

[out]

   node       ancestry_tree
0  A1x2  [A1x2, Aw00, bc11]
1  bc11        [bc11, Aw00]
2  Aee1        [Aee1, Aee0]
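Because nx.ancestors returns a set, the order inside each list above is arbitrary (A1x2's list is not in the child-to-root order the question asked for). A minimal sketch, assuming the hierarchy is a tree so each node's ancestors form a single chain: take the subgraph of the node and its ancestors, sort it with nx.topological_sort, and reverse the result so the list runs from the child up to its ultimate parent. The helper name ordered_ancestry is hypothetical and not part of networkx; if a node can have several direct parents, this returns one valid ordering rather than a unique one.

```python
import networkx as nx
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'child': ['A1x2', 'bc11', 'bc11', 'Aee1', 'Aee1'],
    'Parent': ['bc11', 'Aw00', 'Aw00', 'Aee0', 'Aee0'],
    'relationship': ['direct_parent', 'direct_parent', 'ultimate_parent',
                     'direct_parent', 'ultimate_parent'],
})

# Edges point from parent to child
G = nx.from_pandas_edgelist(df, source='Parent', target='child',
                            create_using=nx.DiGraph())


def ordered_ancestry(G, n):
    """Hypothetical helper: [n, parent, ..., ultimate parent], nearest first.

    Assumes the ancestors of n form a single chain (tree-shaped data).
    """
    nodes = {n} | nx.ancestors(G, n)
    # A topological sort lists the ultimate parent first because edges
    # point parent -> child; reverse it to start from the child itself.
    return list(reversed(list(nx.topological_sort(G.subgraph(nodes)))))


# Leaf ("last") children have no outgoing edges
last_children = [n for n, d in G.out_degree() if d == 0]

df_ancestors = pd.DataFrame(
    [(n, ordered_ancestry(G, n)) for n in last_children],
    columns=['node', 'ancestry_tree'],
)
print(df_ancestors)
```

Shortcut edges from the ultimate_parent rows do not break this, since they only add extra parent-to-descendant edges along the same chain and the topological order stays the same.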

To drop "middle children" (such as bc11) from the output table, keep only the last children, i.e. the nodes whose out_degree is 0 because they have no children of their own:

# Last children have no outgoing parent -> child edges
last_children = [n for n, d in G.out_degree() if d == 0]

ancestors = {n: {n} | nx.ancestors(G, n) for n in last_children}

df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])

print(df_ancestors)

[out]

   node       ancestry_tree
0  A1x2  [A1x2, Aw00, bc11]
1  Aee1        [Aee1, Aee0]
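If adding networkx as a dependency is not an option, a similar result can be sketched in plain Python by walking the direct_parent links upward from each leaf. This swaps in a different technique from the answer above. It assumes every node has at most one direct_parent row (the dict keeps only the last one if there are several), and ancestry_chain is a hypothetical helper name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'child': ['A1x2', 'bc11', 'bc11', 'Aee1', 'Aee1'],
    'Parent': ['bc11', 'Aw00', 'Aw00', 'Aee0', 'Aee0'],
    'relationship': ['direct_parent', 'direct_parent', 'ultimate_parent',
                     'direct_parent', 'ultimate_parent'],
})

# Assumption: at most one direct parent per node (tree-shaped data)
direct = df[df['relationship'] == 'direct_parent']
parent_of = dict(zip(direct['child'], direct['Parent']))


def ancestry_chain(node):
    """Hypothetical helper: [node, parent, ..., ultimate parent] in order."""
    chain = [node]
    seen = {node}
    while chain[-1] in parent_of:
        nxt = parent_of[chain[-1]]
        if nxt in seen:  # guard against accidental cycles in the data
            raise ValueError(f'cycle detected at {nxt}')
        chain.append(nxt)
        seen.add(nxt)
    return chain


# Leaves appear as a child but never as a parent
parents = set(df['Parent'])
leaves = [n for n in df['child'].unique() if n not in parents]

df_ancestors = pd.DataFrame(
    [(n, ancestry_chain(n)) for n in leaves],
    columns=['node', 'ancestry_tree'],
)
print(df_ancestors)
```

The loop is iterative rather than recursive, so a long chain of intermediate nodes does not run into Python's recursion limit.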
Chris Adams answered May 29 '23 02:05