I have a pandas.DataFrame contains nodes from trees. The table looks like this:
╔═══════╦════════╦════════╦══════╗
║ index ║ color ║ name ║ head ║
╠═══════╬════════╬════════╬══════╣
║ 0 ║ red ║ Tom ║ 0 ║
║ 1 ║ blue ║ Lucy ║ 0 ║
║ 2 ║ green ║ Peter ║ 1 ║
║ 3 ║ red ║ Katy ║ 1 ║
║ 4 ║ green ║ Sam ║ 4 ║
║ 5 ║ orange ║ Linda ║ 2 ║
║ 6 ║ blue ║ Robert ║ 4 ║
║ 7 ║ brown ║ James ║ 6 ║
║ 8 ║ red ║ Betty ║ 7 ║
║ 9 ║ red ║ Amanda ║ 4 ║
║ 10 ║ black ║ Luke ║ 8 ║
╚═══════╩════════╩════════╩══════╝
The column head stores the index of parent node. It will create a tree as below:

And each node can have 0+ children (not limited to 2).
I want to find another person with the same color when I select a person. There are 3 rules:
None if no one can be selectedFor example, Katy will match with Tom. Since there are no more red color in the same stem with Betty, Amanda will be selected.
Is there any way rather than brute force all combination to get the answer?
I used the technique of network analysis, not sure if it best suits your case.
The idea is simple:
Here is my code
import io
import pandas as pd
import networkx as nx
from networkx.algorithms import shortest_path, has_path
# Data
df_str = """
index,colour,name,head
0,red,Tom,0
1,blue,Lucy,0
2,green,Peter,1
3,red,Katy,1
4,green,Sam,4
5,orange,Linda,2
6,blue,Robert,4
7,brown,James,6
8,red,Betty,7
9,red,Amanda,4
10,black,Luke,8
"""
df = pd.read_csv(io.StringIO(df_str), sep=",")
# Function to find the closest person with the same colour as the person with `id0`
def find_same_colour(id0, df):
# Create network
g = nx.Graph()
for _, row in df.iterrows():
g.add_node(row['index'], colour=row['colour'])
if row['index'] != row['head']:
g.add_edge(row['index'], row['head'])
# List out candidates
colour = df.loc[df['index'].values == id0, 'colour'].values[0]
candidates = df.loc[(df['colour'].values == colour) & (df['index'].values != id0), 'index'].values
# Initialise searching
target = None
l_max = df.shape[0] + 2
# Search
for i in candidates:
if has_path(g, id0, i):
path = shortest_path(g, id0, i)
if len(path) < l_max:
target = i
return target
for i in df['index'].values:
print(i, find_same_colour(i, df), sep='-')
And here is the output,
# 0-3
# 1-None
# 2-None
# 3-0
# 4-None
# 5-None
# 6-None
# 7-None
# 8-9
# 9-8
# 10-None
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With