I have a dataframe with parent-child relationships that looks like this:

child  Parent  relationship
A1x2   bc11    direct_parent
bc11   Aw00    direct_parent
bc11   Aw00    ultimate_parent
Aee1   Aee0    direct_parent
Aee1   Aee0    ultimate_parent
I would like to get all the ancestors for all child nodes in a new dataframe. The result would look something like this:
node  ancestry_tree
A1x2  [A1x2, bc11, Aw00]
Aee1  [Aee1, Aee0]
Note: The real dataset could have a lot of direct predecessor nodes between child and ultimate parent.
Another approach uses from_pandas_edgelist and ancestors from the networkx package:
import networkx as nx
import pandas as pd

# Create the directed graph, with edges pointing from parent to child
G = nx.from_pandas_edgelist(df,
                            source='Parent',
                            target='child',
                            create_using=nx.DiGraph())

# Create dict of nodes and ancestors
# (sets are unordered, so the order within each list may vary)
ancestors = {n: {n} | nx.ancestors(G, n) for n in df['child'].unique()}

# Convert dict back to DataFrame if necessary
df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])

print(df_ancestors)
[out]
node ancestry_tree
0 A1x2 [A1x2, Aw00, bc11]
1 bc11 [bc11, Aw00]
2 Aee1 [Aee1, Aee0]
To exclude "middle children" from the output table, keep only the last children. In this graph the edges point from parent to child, so a last child is a node whose out_degree is 0:
last_children = [n for n, d in G.out_degree() if d == 0]
ancestors = {n: {n} | nx.ancestors(G, n) for n in last_children}
df_ancestors = pd.DataFrame([(k, list(v)) for k, v in ancestors.items()],
                            columns=['node', 'ancestry_tree'])
[out]
node ancestry_tree
0 A1x2 [A1x2, Aw00, bc11]
1 Aee1 [Aee1, Aee0]
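
The lists above come from sets, so they do not preserve the child-to-ultimate-parent order shown in the question's expected output. Below is a minimal sketch of one way to recover that order, assuming each ancestor should be ranked by its distance from the node. The helper name ordered_ancestry and the rebuilt sample df are illustrative and not part of the original answer.

```python
import networkx as nx
import pandas as pd

# Sample data rebuilt from the question (column names as shown there)
df = pd.DataFrame({
    "child": ["A1x2", "bc11", "bc11", "Aee1", "Aee1"],
    "Parent": ["bc11", "Aw00", "Aw00", "Aee0", "Aee0"],
    "relationship": ["direct_parent", "direct_parent", "ultimate_parent",
                     "direct_parent", "ultimate_parent"],
})

# Edges point from parent to child; duplicate rows collapse into one edge
G = nx.from_pandas_edgelist(df, source="Parent", target="child",
                            create_using=nx.DiGraph())


def ordered_ancestry(graph, node):
    """Return node followed by its ancestors, nearest first (hypothetical helper)."""
    # Walk the edges backwards so distances are measured from the node upwards
    dist = nx.single_source_shortest_path_length(graph.reverse(copy=False), node)
    # Sort by distance; ties are broken by name to keep the result deterministic
    return sorted(dist, key=lambda n: (dist[n], n))


# Keep only the last children, as in the filtering step above
last_children = [n for n, d in G.out_degree() if d == 0]
df_ordered = pd.DataFrame(
    [(n, ordered_ancestry(G, n)) for n in last_children],
    columns=["node", "ancestry_tree"],
)
print(df_ordered)
```

If a node has several parents at the same distance, they are ordered alphabetically here. That tie-break is an arbitrary choice for the sketch.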