Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - Create hierarchy file (find paths from root to leaves in tree represented as table)

Given the following unordered tab delimited file:

Asia    Srilanka
Srilanka    Colombo
Continents  Europe
India   Mumbai
India   Pune
Continents  Asia
Earth   Continents
Asia    India

The goal is to generate the following output (tab delimited):

Earth   Continents  Asia    India   Mumbai
Earth   Continents  Asia    India   Pune
Earth   Continents  Asia    Srilanka    Colombo
Earth   Continents  Europe

I have created the following script to achieve the goal:

root={} # this hash will finally contain the ROOT member from which all the nodes emanate
link={} # this is to hold the grouping of immediate children 
for line in f:
    line=line.rstrip('\r\n')
    line=line.strip()
    cols=list(line.split('\t'))
    parent=cols[0]
    child=cols[1]
    if not parent in link:
        root[parent]=1
    if child in root:
        del root[child]
    if not child in link:
        link[child]={}
    if not parent in link:
        link[parent]={}
    link[parent][child]=1

Now I intend to print the desired output using two dict created earlier (root and link). I am not sure how to go about doing this in python. But I know that we could write following in perl to achieve the result:

print_links($_) for sort keys %root;

sub print_links
{
  my @path = @_;

  my %children = %{$link{$path[-1]}};
  if (%children)
  {
    print_links(@path, $_) for sort keys %children;
  } 
  else 
  {
    say join "\t", @path;
  }
}

Could you please help me achieve the desired output in python 3.x?

like image 297
Sachin S Avatar asked May 29 '17 06:05

Sachin S


2 Answers

I see here next problems:

  • reading relations from file;
  • building hierarchy from relations.
  • writing hierarchy to file.

Assuming that height of hierarchy tree is less than default recursion limit (equals to 1000 in most cases), let's define utility functions for this separate tasks.

Utilities

  1. Parsing of relations can be done with

    def parse_relations(lines):
        relations = {}
        splitted_lines = (line.split() for line in lines)
        for parent, child in splitted_lines:
            relations.setdefault(parent, []).append(child)
        return relations
    
  2. Building hierarchy can be done with

    • Python >=3.5

      def flatten_hierarchy(relations, parent='Earth'):
          try:
              children = relations[parent]
              for child in children:
                  sub_hierarchy = flatten_hierarchy(relations, child)
                  for element in sub_hierarchy:
                      try:
                          yield (parent, *element)
                      except TypeError:
                          # we've tried to unpack `None` value,
                          # it means that no successors left
                          yield (parent, child)
          except KeyError:
              # we've reached end of hierarchy
              yield None
      
    • Python <3.5: extended iterable unpacking was added with PEP-448, but it can be replaced with itertools.chain like

      import itertools
      
      
      def flatten_hierarchy(relations, parent='Earth'):
          try:
              children = relations[parent]
              for child in children:
                  sub_hierarchy = flatten_hierarchy(relations, child)
                  for element in sub_hierarchy:
                      try:
                          yield tuple(itertools.chain([parent], element))
                      except TypeError:
                          # we've tried to unpack `None` value,
                          # it means that no successors left
                          yield (parent, child)
          except KeyError:
              # we've reached end of hierarchy
              yield None
      
  3. Hierarchy export to file can be done with

    def write_hierarchy(hierarchy, path, delimiter='\t'):
        with open(path, mode='w') as file:
            for row in hierarchy:
                file.write(delimiter.join(row) + '\n')
    

Usage

Assuming that file path is 'relations.txt':

with open('relations.txt') as file:
    relations = parse_relations(file)

gives us

>>> relations
{'Asia': ['Srilanka', 'India'],
 'Srilanka': ['Colombo'],
 'Continents': ['Europe', 'Asia'],
 'India': ['Mumbai', 'Pune'],
 'Earth': ['Continents']}

and our hierarchy is

>>> list(flatten_hierarchy(relations))
[('Earth', 'Continents', 'Europe'),
 ('Earth', 'Continents', 'Asia', 'Srilanka', 'Colombo'),
 ('Earth', 'Continents', 'Asia', 'India', 'Mumbai'),
 ('Earth', 'Continents', 'Asia', 'India', 'Pune')]

finally export it to file called 'hierarchy.txt':

>>> write_hierarchy(sorted(hierarchy), 'hierarchy.txt')

(we use sorted to get hierarchy like in your desired output file)

P. S.

If you are not familiar with Python generators we can define flatten_hierarchy function like

  • Python >= 3.5

    def flatten_hierarchy(relations, parent='Earth'):
        try:
            children = relations[parent]
        except KeyError:
            # we've reached end of hierarchy
            return None
        result = []
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            try:
                for element in sub_hierarchy:
                    result.append((parent, *element))
            except TypeError:
                # we've tried to iterate through `None` value,
                # it means that no successors left
                result.append((parent, child))
        return result
    
  • Python < 3.5

    import itertools
    
    
    def flatten_hierarchy(relations, parent='Earth'):
        try:
            children = relations[parent]
        except KeyError:
            # we've reached end of hierarchy
            return None
        result = []
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            try:
                for element in sub_hierarchy:
                    result.append(tuple(itertools.chain([parent], element)))
            except TypeError:
                # we've tried to iterate through `None` value,
                # it means that no successors left
                result.append((parent, child))
        return result
    
like image 157
Azat Ibrakov Avatar answered Nov 15 '22 00:11

Azat Ibrakov


With Simple Steps, we can do this,

  • Step 1: Convert the data to Dataframe,
  • Step 2: Take a unique element from column 1 which is not in column 2,
  • Step 3: After taking the unique element from column 1, Convert column 1 as Dataframe,
  • Step 4: Merge the Dataframes, by using pd.merge(), Left data frame as a unique element from column 1, Right data frame as Main Data Which we convert in Step1,
  • Step 5: Drop_duplicates by all columns
like image 33
sharuk khan Avatar answered Nov 15 '22 00:11

sharuk khan