Given the following unordered tab delimited file:
Asia Srilanka
Srilanka Colombo
Continents Europe
India Mumbai
India Pune
Continents Asia
Earth Continents
Asia India
The goal is to generate the following output (tab delimited):
Earth Continents Asia India Mumbai
Earth Continents Asia India Pune
Earth Continents Asia Srilanka Colombo
Earth Continents Europe
I have created the following script to achieve the goal:
root={} # this hash will finally contain the ROOT member from which all the nodes emanate
link={} # this is to hold the grouping of immediate children
for line in f:
line=line.rstrip('\r\n')
line=line.strip()
cols=list(line.split('\t'))
parent=cols[0]
child=cols[1]
if not parent in link:
root[parent]=1
if child in root:
del root[child]
if not child in link:
link[child]={}
if not parent in link:
link[parent]={}
link[parent][child]=1
Now I intend to print the desired output using two dict created earlier (root and link). I am not sure how to go about doing this in python. But I know that we could write following in perl to achieve the result:
print_links($_) for sort keys %root;
sub print_links
{
my @path = @_;
my %children = %{$link{$path[-1]}};
if (%children)
{
print_links(@path, $_) for sort keys %children;
}
else
{
say join "\t", @path;
}
}
Could you please help me achieve the desired output in python 3.x?
I see here next problems:
Assuming that height of hierarchy tree is less than default recursion limit (equals to 1000
in most cases), let's define utility functions for this separate tasks.
Parsing of relations can be done with
def parse_relations(lines):
relations = {}
splitted_lines = (line.split() for line in lines)
for parent, child in splitted_lines:
relations.setdefault(parent, []).append(child)
return relations
Building hierarchy can be done with
Python >=3.5
def flatten_hierarchy(relations, parent='Earth'):
try:
children = relations[parent]
for child in children:
sub_hierarchy = flatten_hierarchy(relations, child)
for element in sub_hierarchy:
try:
yield (parent, *element)
except TypeError:
# we've tried to unpack `None` value,
# it means that no successors left
yield (parent, child)
except KeyError:
# we've reached end of hierarchy
yield None
Python <3.5: extended iterable unpacking was added with PEP-448, but it can be replaced with itertools.chain
like
import itertools
def flatten_hierarchy(relations, parent='Earth'):
try:
children = relations[parent]
for child in children:
sub_hierarchy = flatten_hierarchy(relations, child)
for element in sub_hierarchy:
try:
yield tuple(itertools.chain([parent], element))
except TypeError:
# we've tried to unpack `None` value,
# it means that no successors left
yield (parent, child)
except KeyError:
# we've reached end of hierarchy
yield None
Hierarchy export to file can be done with
def write_hierarchy(hierarchy, path, delimiter='\t'):
with open(path, mode='w') as file:
for row in hierarchy:
file.write(delimiter.join(row) + '\n')
Assuming that file path is 'relations.txt'
:
with open('relations.txt') as file:
relations = parse_relations(file)
gives us
>>> relations
{'Asia': ['Srilanka', 'India'],
'Srilanka': ['Colombo'],
'Continents': ['Europe', 'Asia'],
'India': ['Mumbai', 'Pune'],
'Earth': ['Continents']}
and our hierarchy is
>>> list(flatten_hierarchy(relations))
[('Earth', 'Continents', 'Europe'),
('Earth', 'Continents', 'Asia', 'Srilanka', 'Colombo'),
('Earth', 'Continents', 'Asia', 'India', 'Mumbai'),
('Earth', 'Continents', 'Asia', 'India', 'Pune')]
finally export it to file called 'hierarchy.txt'
:
>>> write_hierarchy(sorted(hierarchy), 'hierarchy.txt')
(we use sorted
to get hierarchy like in your desired output file)
If you are not familiar with Python
generators we can define flatten_hierarchy
function like
Python >= 3.5
def flatten_hierarchy(relations, parent='Earth'):
try:
children = relations[parent]
except KeyError:
# we've reached end of hierarchy
return None
result = []
for child in children:
sub_hierarchy = flatten_hierarchy(relations, child)
try:
for element in sub_hierarchy:
result.append((parent, *element))
except TypeError:
# we've tried to iterate through `None` value,
# it means that no successors left
result.append((parent, child))
return result
Python < 3.5
import itertools
def flatten_hierarchy(relations, parent='Earth'):
try:
children = relations[parent]
except KeyError:
# we've reached end of hierarchy
return None
result = []
for child in children:
sub_hierarchy = flatten_hierarchy(relations, child)
try:
for element in sub_hierarchy:
result.append(tuple(itertools.chain([parent], element)))
except TypeError:
# we've tried to iterate through `None` value,
# it means that no successors left
result.append((parent, child))
return result
With Simple Steps, we can do this,
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With