I want to make relationship between values by their Name based on below rules: 1- I have a CSV file (with more than 100000 rows) that consists of lots of values, I shared some examples as below: <pre class="prettyprint"><code> Name: A02-father A03-father A04-father A05-father A07-father A08-father A09-father A17-father A18-father A20-father A02-SA-A03-SA A02-SA-A04-SA A03-SA-A02-SA A03-SA-A05-SA A03-SA-A17-SA A04-SA-A02-SA A04-SA-A09-SA A05-SA-A03-SA A09-SA-A04-SA A09-SA-A20-SA A17-SA-A03-SA A17-SA-A18-SA A18-SA-A17-SA A20-SA-A09-SA A05-NA B02-Father B04-Father B06-Father B02-SA-B04-SA B04-SA-BO2-SA B04-SA-B06-SA B06-SA-B04-SA B06-NA </code></pre> 2- Now I have another CSV file which let me know from which value I should start? in this case the value is A03-father & B02-father & ... which dont have any influence on each other and they all have seperate path to go, so for each path we will start from mentioned start point. father.csv A03-father B02-father .... 3- Based on the naming I want to make the relationships, As A03-Father has been determined as Father I should check for any value which has been started with A03.(All of them are A0's babies.) Also as B02 is father, we will check for any value which starts with B02. (B02-SA-B04-SA) 4- Now If I find out A03-SA-A02-SA , this is A03's baby. I find out A03-SA-A05-SA , this is A03's baby. I find out A03-SA-A17-SA , this is A03's baby. and after that I must check any node which starts with A02 & A05 & A17: As you see A02-Father exists so it is Father and now we will search for any string which starts with A02 and doesn't have A03 which has been detected as Father(It must be ignored) This must be checked till end of values which exist in the CSV file. As you see I should check the path based on name (REGEX) and should go forward till end of path. The expected result: <pre class="prettyprint"><code> Father Baby A03-father A03-SA-A02-SA A03-father A03-SA-A05-SA A03-father A03-SA-A17-SA A02-father A02-SA-A04-SA A05-father A05-NA A17-father A17-SA-A18-SA A04-father A04-SA-A09-SA A02-father A02-SA-A04-SA A09-father A09-SA-A20-SA B02-father B02-SA-B04-SA B04-father B04-SA-B06-SA B06-father B06-NA </code></pre> I have coded it as below with pandas: <pre class="prettyprint"><code>import pandas as pd import numpy as np import re #Read the file which consists of all Values df = pd.read_csv("C:\\total.csv") #Read the file which let me know who is father Fa = pd.read_csv("C:\\Father.csv") #Get the first part of Father which is A0 Fa['sub'] = Fa['Name'].str.extract(r'(\w+\s*)', expand=False) r2 = [] #check in all the csv file and find anything which starts with A0 and is not Father for f in Fa['sub']: baby=(df[df['Name'].str.startswith(f) & ~df['Name'].str.contains('Father')]) baby['sub'] = bay['Name'].str.extract(r'(\w+\s*)', expand=False) r1= pd.merge(Fa, baby, left_on='sub', right_on='sub',suffixes=('_f', '_c')) r2.append(result1) out_df = pd.concat(result2) out_df= out_df.replace(np.nan, '', regex=True) #find A0-N-A2-M and A0-N-A4-M out_df.to_csv('C:\\child1.csv') #check in all the csv file and find anything which starts with the second part of child1 which is A2 and A4 out_df["baby2"] = out_df['Name_baby'].str.extract(r'^(?:[^-]*-){2}\s*([^-]+)', expand=False) baby3= out_df["baby2"] r4 = [] for f in out_df["baby2"]: #I want to exclude A0 which has been detected. l = ['A0'] regstr = '|'.join(l) baby1=(df[df['Name'].str.startswith(f) & ~df['Name'].str.contains(regstr)]) baby1['sub'] = baby1['Name'].str.extract(r'(\w+\s*)', expand=False) r3= pd.merge(baby3, baby1, left_on='baby2', right_on='sub',suffixes=('_f', '_c')) r4.append(r3) out2_df = pd.concat(r4) out2_df.to_csv('C:\\child2.csv') </code></pre> I want to put below code in a loop and go through the file and check it, based on naming process and detect other fathers and babies till it finished. however this code is not customized and doesn't have the exact result as i expected. my question is about how to make the loop? I should go through the path and also consider <code>regstr</code> value for any string. <pre class="prettyprint"><code>#check in all the csv file and find anything which starts with the second part of child1 which is A2 and A4 out_df["baby2"] = out_df['Name_baby'].str.extract(r'^(?:[^-]*-){2}\s*([^-]+)', expand=False) baby3= out_df["baby2"] r4 = [] for f in out_df["baby2"]: #I want to exclude A0 which has been detected. l = ['A0'] regstr = '|'.join(l) baby1=(df[df['Name'].str.startswith(f) & ~df['Name'].str.contains(regstr)]) baby1['sub'] = baby1['Name'].str.extract(r'(\w+\s*)', expand=False) r3= pd.merge(baby3, baby1, left_on='baby2', right_on='sub',suffixes=('_f', '_c')) r4.append(r3) out2_df = pd.concat(r4) out2_df.to_csv('C:\\child2.csv') </code></pre>

Start with <code>import collections</code> (will be needed soon). I assume that you have already read df and Fa DataFrames. The first part of my code is to create children Series (index - parent, value - child): <pre class="prettyprint"><code>isFather = df.Name.str.contains('-father', case=False) dfChildren = df[~isFather] key = []; val = [] for fath in df[isFather].Name: prefix = fath.split('-')[0] for child in dfChildren[dfChildren.Name.str.startswith(prefix)].Name: key.append(prefix) val.append(child) children = pd.Series(val, index=key) </code></pre> Print children to see the result. The second part is to create the actual result, starting from each starting points in Fa: <pre class="prettyprint"><code>nodes = collections.deque() father = []; baby = [] # Containers for source data # Loop for each starting point for startNode in Fa.Name.str.split('-', expand=True)[0]: nodes.append(startNode) while nodes: node = nodes.popleft() # Take node name from the queue # Children of this node myChildren = children[children.index == node] # Process children (ind - father, val - child) for ind, val in myChildren.items(): parts = val.split('-') # Parts of child name # Child "actual" name (if exists) val_2 = parts[2] if len(parts) >= 3 else '' if val_2 not in father: # val_2 not "visited" before # Add father / child name to containers father.append(ind) baby.append(val) if len(val_2) > 0: nodes.append(val_2) # Add to the queue, to be processe later # Drop rows for "node" from "children" (if any exists) if (children.index == node).sum() > 0: children.drop(node, inplace=True) # Convert to a DataFrame result = pd.DataFrame({'Father': father, 'Baby': baby}) result.Father += '-father' # Add "-father" to "bare" names </code></pre> I added -father with lower case "f", but I think this is not much significant detail. The result, for your data sample, is: <pre class="prettyprint"><code> Father Baby 0 A03-father A03-SA-A02-SA 1 A03-father A03-SA-A05-SA 2 A03-father A03-SA-A17-SA 3 A02-father A02-SA-A04-SA 4 A05-father A05-NA 5 A17-father A17-SA-A18-SA 6 A04-father A04-SA-A09-SA 7 A09-father A09-SA-A20-SA 8 B02-father B02-SA-B04-SA 9 B04-father B04-SA-B06-SA 10 B06-father B06-NA </code></pre> And two remarks concerning your data sample: <ul> <li>You wrote B04-SA-B02-SA with capital O (a letter) instead of 0 (zero). I corrected it in my source data.</li> <li>Row <code>A02-father A02-SA-A04-SA</code> in your expected result is doubled. I assume it should occur only once.</li> </ul>

Commented inline <pre class="prettyprint"><code>def find(data, from_pos=0): fathers = {} skip = [] for x in data[from_pos:]: tks = x.split("-") # Is it father ? if tks[1].lower() == "father": fathers[tks[0]] = x else: if tks[0] in fathers and tks[-2] not in skip: print (fathers[tks[0]], x) # Skip this father appearing as child later skip.append(tks[0]) </code></pre> Testcase: <pre class="prettyprint"><code>data = [ 'A0-Father', 'A0-N-A2-M', 'A0-N-A4-M', 'A2-Father', 'A2-M-A0-N', 'A2-N-A8-M', 'A8-father', 'A8-M-A11-N', 'A8-M-A2-N'] find(data, from_pos=0) </code></pre> Output: <pre class="prettyprint"><code>A0-Father A0-N-A2-M A0-Father A0-N-A4-M A2-Father A2-N-A8-M A8-father A8-M-A11-N </code></pre> <h3>Edit 1:</h3> Start with some data for testing <pre class="prettyprint"><code>data = [ 'A02-father', 'A03-father', 'A04-father', 'A05-father', 'A07-father', 'A08-father', 'A09-father', 'A17-father', 'A18-father', 'A20-father', 'A02-SA-A03-SA', 'A02-SA-A04-SA', 'A03-SA-A02-SA', 'A03-SA-A05-SA', 'A03-SA-A17-SA', 'A04-SA-A02-SA', 'A04-SA-A09-SA', 'A05-SA-A03-SA', 'A09-SA-A04-SA', 'A09-SA-A20-SA', 'A17-SA-A03-SA', 'A17-SA-A18-SA', 'A18-SA-A17-SA', 'A20-SA-A09-SA', 'A05-NA', ] father = [ 'A03-father', ] </code></pre> First let us make a data structure so that manipulations will be easy and lookups for relationships will be fast as you have huge data <pre class="prettyprint"><code>def make_data_structure(data): all_fathers, all_relations = {}, {} for x in data: tks = x.split("-") if tks[1].lower() == "father": all_fathers[tks[0]] = x else: if len(tks) == 2: tks.extend(['NA', 'NA']) if tks[0] in all_relations: all_relations[tks[0]][0].append(tks[-2]) all_relations[tks[0]][1].append(x) else: all_relations[tks[0]] =[[tks[-2]], [x]] return all_fathers, all_relations all_fathers, all_relations = make_data_structure(data) all_fathers, all_relations </code></pre> Output: <pre class="prettyprint"><code>{'A02': 'A02-father', 'A03': 'A03-father', 'A04': 'A04-father', 'A05': 'A05-father', 'A07': 'A07-father', 'A08': 'A08-father', 'A09': 'A09-father', 'A17': 'A17-father', 'A18': 'A18-father', 'A20': 'A20-father'}, {'A02': [['A03', 'A04'], ['A02-SA-A03-SA', 'A02-SA-A04-SA']], 'A03': [['A02', 'A05', 'A17'], ['A03-SA-A02-SA', 'A03-SA-A05-SA', 'A03-SA-A17-SA']], 'A04': [['A02', 'A09'], ['A04-SA-A02-SA', 'A04-SA-A09-SA']], 'A05': [['A03', 'NA'], ['A05-SA-A03-SA', 'A05-NA']], 'A09': [['A04', 'A20'], ['A09-SA-A04-SA', 'A09-SA-A20-SA']], 'A17': [['A03', 'A18'], ['A17-SA-A03-SA', 'A17-SA-A18-SA']], 'A18': [['A17'], ['A18-SA-A17-SA']], 'A20': [['A09'], ['A20-SA-A09-SA']]} </code></pre> As you can see <code>all_fathers</code> holds all the parents and most imporantly <code>all_relations</code> hold the father-child relationship which can be indexed using the <code>father</code> for faster lookups. How lets do the actual parsing of the relationships <pre class="prettyprint"><code>def find(all_fathers, all_relations, from_father): fathers = [from_father] skip = [] while True: if len(fathers) == 0: break current_father = fathers[0] fathers = fathers[1:] for i in range(len(all_relations[current_father][0])): if not all_relations[current_father][0][i] in skip: print (all_fathers[current_father], all_relations[current_father][1][i]) if all_relations[current_father][0][i] != 'NA': fathers.append(all_relations[current_father][0][i]) skip.append(current_father) for x in father: find(all_fathers, all_relations, x.split("-")[0]) </code></pre> Output: <pre class="prettyprint"><code>A03-father A03-SA-A02-SA A03-father A03-SA-A05-SA A03-father A03-SA-A17-SA A02-father A02-SA-A04-SA A05-father A05-NA A17-father A17-SA-A18-SA A04-father A04-SA-A09-SA A09-father A09-SA-A20-SA </code></pre> <h3>Edit 2:</h3> New test cases; [You will have to load the values in father.csv to a list called <code>father</code>]. <pre class="prettyprint"><code>data = [ 'A02-father', 'A03-father', 'A04-father', 'A05-father', 'A07-father', 'A08-father', 'A09-father', 'A17-father', 'A18-father', 'A20-father', 'A02-SA-A03-SA', 'A02-SA-A04-SA', 'A03-SA-A02-SA', 'A03-SA-A05-SA', 'A03-SA-A17-SA', 'A04-SA-A02-SA', 'A04-SA-A09-SA', 'A05-SA-A03-SA', 'A09-SA-A04-SA', 'A09-SA-A20-SA', 'A17-SA-A03-SA', 'A17-SA-A18-SA', 'A18-SA-A17-SA', 'A20-SA-A09-SA', 'A05-NA', 'B02-Father', 'B04-Father', 'B06-Father', 'B02-SA-B04-SA', 'B04-SA-B02-SA', 'B04-SA-B06-SA', 'B06-SA-B04-SA', 'B06-NA', ] father = [ 'A03-father', 'B02-father' ] for x in father: find(all_fathers, all_relations, x.split("-")[0]) </code></pre> Output: <pre class="prettyprint"><code>A03-father A03-SA-A02-SA A03-father A03-SA-A05-SA A03-father A03-SA-A17-SA A02-father A02-SA-A04-SA A05-father A05-NA A17-father A17-SA-A18-SA A04-father A04-SA-A09-SA A09-father A09-SA-A20-SA B02-Father B02-SA-B04-SA B04-Father B04-SA-B06-SA B06-Father B06-NA </code></pre>

Finding relationships between values based on their name in Python with Panda

Tags:

python

regex

python-3.x

pandas

numpy

I want to make relationship between values by their Name based on below rules:

1- I have a CSV file (with more than 100000 rows) that consists of lots of values, I shared some examples as below:

    Name:
A02-father
A03-father
A04-father
A05-father
A07-father
A08-father
A09-father
A17-father
A18-father
A20-father
A02-SA-A03-SA
A02-SA-A04-SA
A03-SA-A02-SA
A03-SA-A05-SA
A03-SA-A17-SA
A04-SA-A02-SA
A04-SA-A09-SA
A05-SA-A03-SA
A09-SA-A04-SA
A09-SA-A20-SA
A17-SA-A03-SA
A17-SA-A18-SA
A18-SA-A17-SA
A20-SA-A09-SA
A05-NA
B02-Father
B04-Father
B06-Father
B02-SA-B04-SA
B04-SA-BO2-SA
B04-SA-B06-SA
B06-SA-B04-SA
B06-NA

2- Now I have another CSV file which let me know from which value I should start? in this case the value is A03-father & B02-father & ... which dont have any influence on each other and they all have seperate path to go, so for each path we will start from mentioned start point. father.csv A03-father B02-father ....

3- Based on the naming I want to make the relationships, As A03-Father has been determined as Father I should check for any value which has been started with A03.(All of them are A0's babies.) Also as B02 is father, we will check for any value which starts with B02. (B02-SA-B04-SA)

4- Now If I find out A03-SA-A02-SA , this is A03's baby. I find out A03-SA-A05-SA , this is A03's baby. I find out A03-SA-A17-SA , this is A03's baby.

and after that I must check any node which starts with A02 & A05 & A17: As you see A02-Father exists so it is Father and now we will search for any string which starts with A02 and doesn't have A03 which has been detected as Father(It must be ignored)

This must be checked till end of values which exist in the CSV file. As you see I should check the path based on name (REGEX) and should go forward till end of path.

The expected result:

    Father      Baby
A03-father   A03-SA-A02-SA
A03-father   A03-SA-A05-SA
A03-father   A03-SA-A17-SA
A02-father   A02-SA-A04-SA
A05-father   A05-NA
A17-father   A17-SA-A18-SA
A04-father   A04-SA-A09-SA
A02-father   A02-SA-A04-SA
A09-father   A09-SA-A20-SA
B02-father   B02-SA-B04-SA
B04-father   B04-SA-B06-SA
B06-father   B06-NA

I have coded it as below with pandas:

import pandas as pd
import numpy as np
import re

#Read the file which consists of all Values
df = pd.read_csv("C:\\total.csv")


#Read the file which let me know who is father
Fa = pd.read_csv("C:\\Father.csv")

#Get the first part of Father which is A0
Fa['sub'] = Fa['Name'].str.extract(r'(\w+\s*)', expand=False)
r2 = []

#check in all the csv file and find anything which starts with A0 and is not Father
for f in Fa['sub']:
     baby=(df[df['Name'].str.startswith(f) & ~df['Name'].str.contains('Father')])
     baby['sub'] = bay['Name'].str.extract(r'(\w+\s*)', expand=False)
     r1= pd.merge(Fa, baby, left_on='sub', right_on='sub',suffixes=('_f', '_c'))
     r2.append(result1)
out_df = pd.concat(result2)
out_df= out_df.replace(np.nan, '', regex=True)
#find A0-N-A2-M and A0-N-A4-M
out_df.to_csv('C:\\child1.csv')



#check in all the csv file and find anything which starts with the second part of child1 which is A2 and A4
out_df["baby2"] = out_df['Name_baby'].str.extract(r'^(?:[^-]*-){2}\s*([^-]+)', expand=False)
baby3= out_df["baby2"]
r4 = []
for f in out_df["baby2"]:
    #I want to exclude A0 which has been detected.
     l = ['A0']  
     regstr = '|'.join(l)
     baby1=(df[df['Name'].str.startswith(f) & ~df['Name'].str.contains(regstr)])
     baby1['sub'] = baby1['Name'].str.extract(r'(\w+\s*)', expand=False)

     r3= pd.merge(baby3, baby1, left_on='baby2', right_on='sub',suffixes=('_f', '_c'))
     r4.append(r3)
out2_df = pd.concat(r4)
out2_df.to_csv('C:\\child2.csv')

I want to put below code in a loop and go through the file and check it, based on naming process and detect other fathers and babies till it finished. however this code is not customized and doesn't have the exact result as i expected. my question is about how to make the loop?

I should go through the path and also consider regstr value for any string.

#check in all the csv file and find anything which starts with the second part of child1 which is A2 and A4

out_df["baby2"] = out_df['Name_baby'].str.extract(r'^(?:[^-]*-){2}\s*([^-]+)', expand=False)
baby3= out_df["baby2"]
r4 = []
for f in out_df["baby2"]:
    #I want to exclude A0 which has been detected. 
     l = ['A0']  
     regstr = '|'.join(l)
     baby1=(df[df['Name'].str.startswith(f) & ~df['Name'].str.contains(regstr)])
     baby1['sub'] = baby1['Name'].str.extract(r'(\w+\s*)', expand=False)

     r3= pd.merge(baby3, baby1, left_on='baby2', right_on='sub',suffixes=('_f', '_c'))
     r4.append(r3)
out2_df = pd.concat(r4)
out2_df.to_csv('C:\\child2.csv')

279

asked Nov 28 '20 06:11

Sara Daniel

2 Answers

Start with import collections (will be needed soon).

I assume that you have already read df and Fa DataFrames.

The first part of my code is to create children Series (index - parent, value - child):

isFather = df.Name.str.contains('-father', case=False)
dfChildren = df[~isFather]
key = []; val = []
for fath in df[isFather].Name:
    prefix = fath.split('-')[0]
    for child in dfChildren[dfChildren.Name.str.startswith(prefix)].Name:
        key.append(prefix)
        val.append(child)
children = pd.Series(val, index=key)

Print children to see the result.

The second part is to create the actual result, starting from each starting points in Fa:

nodes = collections.deque()
father = []; baby = []  # Containers for source data
# Loop for each starting point
for startNode in Fa.Name.str.split('-', expand=True)[0]:
    nodes.append(startNode)
    while nodes:
        node = nodes.popleft()  # Take node name from the queue
        # Children of this node
        myChildren = children[children.index == node]
        # Process children (ind - father, val - child)
        for ind, val in myChildren.items():
            parts = val.split('-')  # Parts of child name
            # Child "actual" name (if exists)
            val_2 = parts[2] if len(parts) >= 3 else ''
            if val_2 not in father:  # val_2 not "visited" before
                # Add father / child name to containers
                father.append(ind)
                baby.append(val)
                if len(val_2) > 0:
                    nodes.append(val_2)  # Add to the queue, to be processe later
        # Drop rows for "node" from "children" (if any exists)
        if (children.index == node).sum() > 0:
            children.drop(node, inplace=True)
# Convert to a DataFrame
result = pd.DataFrame({'Father': father, 'Baby': baby})
result.Father += '-father'    # Add "-father" to "bare" names

I added -father with lower case "f", but I think this is not much significant detail.

The result, for your data sample, is:

        Father           Baby
0   A03-father  A03-SA-A02-SA
1   A03-father  A03-SA-A05-SA
2   A03-father  A03-SA-A17-SA
3   A02-father  A02-SA-A04-SA
4   A05-father         A05-NA
5   A17-father  A17-SA-A18-SA
6   A04-father  A04-SA-A09-SA
7   A09-father  A09-SA-A20-SA
8   B02-father  B02-SA-B04-SA
9   B04-father  B04-SA-B06-SA
10  B06-father         B06-NA

And two remarks concerning your data sample:

You wrote B04-SA-B02-SA with capital O (a letter) instead of 0 (zero). I corrected it in my source data.
Row A02-father A02-SA-A04-SA in your expected result is doubled. I assume it should occur only once.

179

answered Oct 23 '22 04:10

Valdi_Bo

Commented inline

def find(data, from_pos=0):
  fathers = {}
  skip = []

  for x in data[from_pos:]:
    tks = x.split("-")

    # Is it father ?
    if tks[1].lower() == "father":
      fathers[tks[0]] = x
    else:    
      if tks[0] in fathers and tks[-2] not in skip:
        print (fathers[tks[0]], x)
        # Skip this father appearing as child later
        skip.append(tks[0])

Testcase:

data = [
'A0-Father', 
'A0-N-A2-M',
'A0-N-A4-M',
'A2-Father',
'A2-M-A0-N',
'A2-N-A8-M',
'A8-father',
'A8-M-A11-N',
'A8-M-A2-N']

find(data, from_pos=0)

Output:

A0-Father A0-N-A2-M
A0-Father A0-N-A4-M
A2-Father A2-N-A8-M
A8-father A8-M-A11-N

Edit 1:

Start with some data for testing

data = [
'A02-father',
'A03-father',
'A04-father',
'A05-father',
'A07-father',
'A08-father',
'A09-father',
'A17-father',
'A18-father',
'A20-father',
'A02-SA-A03-SA',
'A02-SA-A04-SA',
'A03-SA-A02-SA',
'A03-SA-A05-SA',
'A03-SA-A17-SA',
'A04-SA-A02-SA',
'A04-SA-A09-SA',
'A05-SA-A03-SA',
'A09-SA-A04-SA',
'A09-SA-A20-SA',
'A17-SA-A03-SA',
'A17-SA-A18-SA',
'A18-SA-A17-SA',
'A20-SA-A09-SA',
'A05-NA',
]

father = [
  'A03-father',
]

First let us make a data structure so that manipulations will be easy and lookups for relationships will be fast as you have huge data

def make_data_structure(data):
    all_fathers, all_relations = {}, {}
    for x in data:
        tks = x.split("-")

        if tks[1].lower() == "father":
            all_fathers[tks[0]] = x
        else:
            if len(tks) == 2:
                tks.extend(['NA', 'NA'])

            if tks[0] in all_relations:
                all_relations[tks[0]][0].append(tks[-2])
                all_relations[tks[0]][1].append(x)
            else:
                all_relations[tks[0]] =[[tks[-2]], [x]]
            
    return all_fathers, all_relations

all_fathers, all_relations = make_data_structure(data)
all_fathers, all_relations

Output:

{'A02': 'A02-father',
  'A03': 'A03-father',
  'A04': 'A04-father',
  'A05': 'A05-father',
  'A07': 'A07-father',
  'A08': 'A08-father',
  'A09': 'A09-father',
  'A17': 'A17-father',
  'A18': 'A18-father',
  'A20': 'A20-father'},

{'A02': [['A03', 'A04'], ['A02-SA-A03-SA', 'A02-SA-A04-SA']],
 'A03': [['A02', 'A05', 'A17'],
  ['A03-SA-A02-SA', 'A03-SA-A05-SA', 'A03-SA-A17-SA']],
 'A04': [['A02', 'A09'], ['A04-SA-A02-SA', 'A04-SA-A09-SA']],
 'A05': [['A03', 'NA'], ['A05-SA-A03-SA', 'A05-NA']],
 'A09': [['A04', 'A20'], ['A09-SA-A04-SA', 'A09-SA-A20-SA']],
 'A17': [['A03', 'A18'], ['A17-SA-A03-SA', 'A17-SA-A18-SA']],
 'A18': [['A17'], ['A18-SA-A17-SA']],
 'A20': [['A09'], ['A20-SA-A09-SA']]}

As you can see all_fathers holds all the parents and most imporantly all_relations hold the father-child relationship which can be indexed using the father for faster lookups.

How lets do the actual parsing of the relationships

def find(all_fathers, all_relations, from_father):    
    fathers = [from_father]
    skip = []

    while True:
        if len(fathers) == 0:
            break

        current_father = fathers[0]
        fathers = fathers[1:]  

        for i in range(len(all_relations[current_father][0])):
            if not all_relations[current_father][0][i] in skip:
                print (all_fathers[current_father], all_relations[current_father][1][i])            
                if all_relations[current_father][0][i] != 'NA':
                    fathers.append(all_relations[current_father][0][i])            
        skip.append(current_father)    


for x in father:    
    find(all_fathers, all_relations, x.split("-")[0])

Output:

A03-father A03-SA-A02-SA
A03-father A03-SA-A05-SA
A03-father A03-SA-A17-SA
A02-father A02-SA-A04-SA
A05-father A05-NA
A17-father A17-SA-A18-SA
A04-father A04-SA-A09-SA
A09-father A09-SA-A20-SA

Edit 2:

New test cases; [You will have to load the values in father.csv to a list called father].

data = [
'A02-father',
'A03-father',
'A04-father',
'A05-father',
'A07-father',
'A08-father',
'A09-father',
'A17-father',
'A18-father',
'A20-father',
'A02-SA-A03-SA',
'A02-SA-A04-SA',
'A03-SA-A02-SA',
'A03-SA-A05-SA',
'A03-SA-A17-SA',
'A04-SA-A02-SA',
'A04-SA-A09-SA',
'A05-SA-A03-SA',
'A09-SA-A04-SA',
'A09-SA-A20-SA',
'A17-SA-A03-SA',
'A17-SA-A18-SA',
'A18-SA-A17-SA',
'A20-SA-A09-SA',
'A05-NA',
'B02-Father',
'B04-Father',
'B06-Father',
'B02-SA-B04-SA',
'B04-SA-B02-SA',
'B04-SA-B06-SA',
'B06-SA-B04-SA',
'B06-NA',
]

father = [
  'A03-father',
   'B02-father'
]

for x in father:
    find(all_fathers, all_relations, x.split("-")[0])

Output:

A03-father A03-SA-A02-SA
A03-father A03-SA-A05-SA
A03-father A03-SA-A17-SA
A02-father A02-SA-A04-SA
A05-father A05-NA
A17-father A17-SA-A18-SA
A04-father A04-SA-A09-SA
A09-father A09-SA-A20-SA
B02-Father B02-SA-B04-SA
B04-Father B04-SA-B06-SA
B06-Father B06-NA

answered Oct 23 '22 05:10

mujjiga

Related questions
                            
                                How to save and edit server rendering data?
                            
                                Installing socketio module on python3 seems to be corrupting pip
                            
                                Identify the first and all non-zero values in every row in Pandas DataFrame
                            
                                How to convert a sklearn pipeline into a pyspark pipeline?
                            
                                Delete diagonals of zero elements
                            
                                Is there a way to use Python 3.9 type hinting in its previous versions?
                            
                                Kivy sounds do not play on android device even though they play fine on laptop
                            
                                enabling CORS Google Cloud Function (Python)
                            
                                How to read, format, sort, and save a csv file, without pandas
                            
                                what is the difference between using softmax as a sequential layer in tf.keras and softmax as an activation function for a dense layer?
                            
                                Updating dataframe value based on list
                            
                                How to update a pandas dataframe, from multiple API calls
                            
                                How to I compute matching features between high resolution images?
                            
                                How to register typing.Callable with Python @singledispatch?
                            
                                tf.newaxis operation in TensorFlow
                            
                                'tuple' object has no attribute '_committed' error while updating image objects?
                            
                                Faster for-loops with arrays in Python
                            
                                How to check if a column contains list
                            
                                How can I count the number of cases in recursive functions?
                            
                                Is there any way to use SageMath on macOS Big Sur?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Finding relationships between values based on their name in Python with Panda

Tags:

python

regex

python-3.x

pandas

numpy

Sara Daniel

People also ask

2 Answers

Valdi_Bo

Edit 1:

Edit 2:

mujjiga

Recent Activity

Donate For Us