Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to visualize (dendrogram) a dictionary of hierarchical items?

This is my first time of doing visualization from hierarchical data in dictionary format with Python. Last part of the data looks like this:

d = {^2820: [^391, ^1024], ^2821: [^759, 'w', ^118, ^51], ^2822: [^291, 'o'], ^2823: [^25, ^64], ^2824: [^177, ^2459], ^2825: [^338, ^1946], ^2826: [^186, ^1511], ^2827: [^162, 'i']}

So I have indices on lists referring back to the keys (index) of the dictionary. I suppose this could be used as a base structure for the visualization, please correct me if I'm wrong. Characters on the data are "end nodes/leaves" which doesn't refer back to any index.

I have found NetworkX which possibly could be used for visualization, but I have no idea where to start with it and my data. I was hoping it would be something as simple as:

import networkx as nx
import matplotlib.pyplot as plt

d = {^2820: [^391, ^1024], ^2821: [^759, 'w', ^118, ^51], ^2822: [^291, 'o'], ^2823: [^25, ^64], ^2824: [^177, ^2459], ^2825: [^338, ^1946], ^2826: [^186, ^1511], ^2827: [^162, 'i']}

G = nx.Graph(d)
nx.draw(G)
plt.show()

I'm looking for hierarchical dendrogram or some sort of clustering as an output. Sorry at this point I'm not totally sure what would be the best visualization, maybe similar to this:

Dendrogram

UPDATE

Using NetworkX actually was very simple. I'm providing other simple sample data and looking for an answer if it can be visualized by dendrogram also instead of wired network graph?

# original sequence: a,b,c,d,b,c,a,b,c,d,b,c
d = {^1: ['b', 'c'], ^2: ['a', ^1, 'd', ^1], 'S': [^2, ^2]}
G = nx.Graph(d)
nx.draw_spring(G, node_size=300, with_labels=True)

NetworkX draw_spring nx.Graph

As we can see, graph show plain relations, but not hierarchy and order of the data what I'm willing to do. DiGraph gives more details, but it is still not possible to construct original sequence from it:

NetworkX draw_spring nx.DiGraph

For dendrogram apparently weight and end nodes needs to be calculated as pointed out on the first answer. For that approach data structure could be something like this:

d = {'a': [], 'b': [], 'c': [], 'd': [], ^1: ['b', 'c'], ^2: ['a', ^1, 'd', ^1], 'S': [^2, ^2]}
like image 755
MarkokraM Avatar asked Feb 18 '16 19:02

MarkokraM


People also ask

How do you display a dendrogram?

In addition to creating a plot, the dendrogram function returns a dictionary (they call it R in the docs) containing several lists. The leaf_label_func you create must take in a value from R["leaves"] and return the desired label. The easiest way to set labels is to run dendrogram twice.

How do you make a dendrogram in R?

As you already know, the standard R function plot. hclust() can be used to draw a dendrogram from the results of hierarchical clustering analyses (computed using hclust() function). A simplified format is: plot(x, labels = NULL, hang = 0.1, main = "Cluster dendrogram", sub = NULL, xlab = NULL, ylab = "Height", ...)

What is dendrogram in cluster analysis?

A dendrogram is a branching diagram that represents the relationships of similarity among a group of entities.

What is dendrogram in bioinformatics?

A dendrogram (from Greek dendro "tree" and gramma "drawing") is a tree diagram widely used to illustrate the arrangement of the clusters produced by hierarchical clustering. The hierarchical clustering algorithms begin with each object in individual clusters.


1 Answers

One idea is to use SciPy's dendrogram function to draw your dendrogram. To do so, you just need to create the linkage matrix Z, which is described in the documentation of the SciPy linkage function. Each row [x, y, w, z] of the linkage matrix Z describes the weight w at which x and y merge to form a rooted subtree with z leaves.

To demonstrate, I've created a simple example using a small dendrogram with three leaves, shown below: example dendrogram

I created this visualization with the following code:

# Load required modules
import networkx as nx
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

# Construct the graph/hierarchy
d           = { 0: [1, 'd'], 1: ['a', 'b', 'c'], 'a': [], 'b': [], 'c': [], 'd': []}
G           = nx.DiGraph(d)
nodes       = G.nodes()
leaves      = set( n for n in nodes if G.out_degree(n) == 0 )
inner_nodes = [ n for n in nodes if G.out_degree(n) > 0 ]

# Compute the size of each subtree
subtree = dict( (n, [n]) for n in leaves )
for u in inner_nodes:
    children = set()
    node_list = list(d[u])
    while len(node_list) > 0:
        v = node_list.pop(0)
        children.add( v )
        node_list += d[v]

    subtree[u] = sorted(children & leaves)

inner_nodes.sort(key=lambda n: len(subtree[n])) # <-- order inner nodes ascending by subtree size, root is last

# Construct the linkage matrix
leaves = sorted(leaves)
index  = dict( (tuple([n]), i) for i, n in enumerate(leaves) )
Z = []
k = len(leaves)
for i, n in enumerate(inner_nodes):
    children = d[n]
    x = children[0]
    for y in children[1:]:
        z = tuple(subtree[x] + subtree[y])
        i, j = index[tuple(subtree[x])], index[tuple(subtree[y])]
        Z.append([i, j, float(len(subtree[n])), len(z)]) # <-- float is required by the dendrogram function
        index[z] = k
        subtree[z] = list(z)
        x = z
        k += 1

# Visualize
dendrogram(Z, labels=leaves)
plt.show()

There are a few key items to note:

  1. Give the d data structure, I use a NetworkX directed graph (DiGraph) . The directionality is important so we can determine which nodes are leaves (no children -> out degree of zero) and inner_nodes (two or more children -> non-zero out degree).
  2. Usually there is some weight associated with each edge in your dendrogram, but there weren't any weights in your example. Instead, I used the number of leaves in the subtree rooted at each internal node n as the weight for n.
  3. If an inner node has more than two children, you have to add "dummy" internal nodes, since each row of the linkage matrix merges two nodes together. This is why I write for y in children[1:]:, etc.

I'm guessing you may be able to simplify this code based on what your data looks like before creating the d in your example, so this may be more of a proof of concept.

like image 96
mdml Avatar answered Sep 28 '22 23:09

mdml