 

Controlling Sankey diagram connections

I'm trying to control which flows connect to each other using the Matplotlib Sankey diagram. I'm modifying the basic two systems example.

I think my confusion comes down to misunderstanding what this actually means:

Notice that only one connection is specified, but the systems form a circuit since: (1) the lengths of the paths are justified and (2) the orientation and ordering of the flows is mirrored.

I've made a toy example that uses a single data set and then modifies it for the second system, to make sure that the numbers all match up.

import numpy as np
import matplotlib.pyplot as plt

from matplotlib.sankey import Sankey

plt.rcParams["figure.figsize"] = (15,10)


system_1 = [
    {"label": "1st",  "value":  2.00, "orientation":  0},
    {"label": "2nd",  "value":  0.15, "orientation": -1},
    {"label": "3rd",  "value":  0.60, "orientation": -1},
    {"label": "4th",  "value": -0.10, "orientation": -1},
    {"label": "5th",  "value":  0.25, "orientation": -1},
    {"label": "6th",  "value":  0.25, "orientation": -1},
    {"label": "7th",  "value":  0.25, "orientation": -1},
    {"label": "8th",  "value":  0.25, "orientation": -1},
    {"label": "9th",  "value":  0.25, "orientation": -1}
]

system_2 = system_1[:4]
system_2.append({"label": "new",  "value":  -0.25, "orientation": 1})


fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, xticks=[], yticks=[], title="Where are all my cows?")
flows  = [x["value"] for x in system_1]
labels = [x["label"] for x in system_1]
orientations=[x["orientation"] for x in system_1]
sankey = Sankey(ax=ax, unit="cow")
sankey.add(flows=flows, 
           labels=labels,
           label='one',
           orientations=orientations)

sankey.add(flows=[-x["value"] for x in system_2], 
           labels=[x["label"] for x in system_2],
           label='two',
           orientations=[-x["orientation"] for x in system_2], 
           prior=0, 
           connect= (0,0)
          )

diagrams = sankey.finish()
diagrams[-1].patch.set_hatch('/')
plt.legend(loc='best')


plt.show()

This gives me:

A sankey diagram that doesn't really work

It should join up the flows with matching labels.

I've read this and this but they aren't helping me understand what is actually happening.

Ben asked Mar 06 '18 22:03


1 Answer

Let's start by clearing up the confusion:

I think my confusion comes down to misunderstanding what this actually means:

Notice that only one connection is specified, but the systems form a circuit since: (1) the lengths of the paths are justified and (2) the orientation and ordering of the flows is mirrored.

(2) The orientation and ordering of the flows is mirrored.

What you probably misunderstood is the meaning of "mirrored", which is indeed confusing here. One would think that mirrored means inverted, but that is only partly true:
The flows (values, as you call them in your code) must be inverted, and you got that part right. The sign of a value says whether it is an input (value > 0) or an output (value < 0), and an output can only be connected to an input, and vice versa.
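The inversion rule can be checked before plotting. The following is a minimal sketch, not part of matplotlib: `uncancelled_labels` is a hypothetical helper, and its `tolerance` default is an assumption chosen for illustration. It takes two systems in the list-of-dicts form used in the question and reports the shared labels whose values are not equal and opposite.

```python
def uncancelled_labels(system_a, system_b, tolerance=1e-6):
    """Return labels found in both systems whose values do not cancel out.

    Hypothetical helper for illustration; not part of matplotlib.
    """
    values_b = {item["label"]: item["value"] for item in system_b}
    return [
        item["label"]
        for item in system_a
        if item["label"] in values_b
        and abs(item["value"] + values_b[item["label"]]) >= tolerance
    ]


first = [
    {"label": "1st", "value": -2.00},
    {"label": "2nd", "value": 0.15},
]
# Flip every sign so that each output faces an input of the same size.
second = [{"label": x["label"], "value": -x["value"]} for x in first]

print(uncancelled_labels(first, second))  # [] : every pair cancels
print(uncancelled_labels(first, first))   # ['1st', '2nd'] : same signs on both sides
```

An empty list means every shared label pairs an input with an output of the same magnitude.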

But the orientation must be the same for both of the flows you are trying to connect. It isn't inverted, but it still needs to be "mirrored". What does that mean? If an I/O looks along the direction of its arrow, it needs to see another I/O facing it (as when looking in a mirror); only then can the two connect. It's not easy to explain as a non-native speaker, so I'll try to illustrate the idea:

Able to connect:         Not able to connect:        Not able to connect:
I/O  Mirror  I/O         I/O  Mirror  I/O            I/O  Mirror  I/O
╚══>   |    >══╝          ╗     |      ╔                    |      ║
                          ║     |      ║             ══>    |      ║
                          v     |      ^                    |      ^

In your code, you have inverted the orientation. That's why, for example, the 3rd flow of the orange system is in the top left corner and its counterpart in the blue system is in the bottom right. There is no way these I/Os will ever be able to "see" each other.

You can undo the inversion for the second system by deleting the - in front of the x in orientations:

orientations=[x["orientation"] for x in system_2]

You'll see that the flows are now close to each other, but you are in the situation shown in the "Not able to connect" illustration (No. 2). This means the structure of your diagram can't work that way. You can only bend a single flow in one of three directions: -90°, 0° or 90°, which correspond to orientations of -1, 0 or 1. The only way to connect those flows directly is to set their orientation to 0, but it seems to me that this is not your goal.
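The orientation rule can be checked in a similar way. This sketch assumes the list-of-dicts layout from the question; `orientation_mismatches` is a hypothetical helper, not a matplotlib function. It lists the shared labels whose orientations differ between the two systems, which by the reasoning above cannot face each other.

```python
VALID_ORIENTATIONS = (-1, 0, 1)


def orientation_mismatches(system_a, system_b):
    """Return shared labels whose orientations differ between the systems.

    Hypothetical helper for illustration; not part of matplotlib.
    """
    orient_b = {item["label"]: item["orientation"] for item in system_b}
    mismatches = []
    for item in system_a:
        label, orientation = item["label"], item["orientation"]
        if orientation not in VALID_ORIENTATIONS:
            raise ValueError(f"invalid orientation {orientation!r} for {label!r}")
        if label in orient_b and orient_b[label] != orientation:
            mismatches.append(label)
    return mismatches


system_1 = [
    {"label": "1st", "orientation": 0},
    {"label": "2nd", "orientation": -1},
    {"label": "3rd", "orientation": -1},
]
# Negate the orientations, as the question's code does for the second system.
inverted = [{"label": x["label"], "orientation": -x["orientation"]} for x in system_1]

print(orientation_mismatches(system_1, inverted))  # ['2nd', '3rd']
print(orientation_mismatches(system_1, system_1))  # []
```

The "1st" flow passes in both cases because negating an orientation of 0 leaves it unchanged.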

You need a different approach so that you don't end up in a situation like the one above, where the flows can no longer be connected. I have modified your code to (maybe?) reach your goal. It doesn't look the same anymore, but I think it's a good starting point for understanding orientations and mirroring.

(1) The lengths of the paths are justified.

You'll see that in my code below I have set values for the pathlengths argument (in the second system). In my experience, if too many flows need to be connected, matplotlib can no longer do it automatically.

Code and Output

import numpy as np
import matplotlib.pyplot as plt

from matplotlib.sankey import Sankey

plt.rcParams["figure.figsize"] = (15,10)


system_1 = [
    {"label": "1st",  "value": -2.00, "orientation":  1},
    {"label": "4th",  "value":  0.10, "orientation":  1},
    {"label": "2nd",  "value":  0.15, "orientation":  1},
    {"label": "3rd",  "value":  0.60, "orientation":  1},
    {"label": "5th",  "value":  0.25, "orientation": -1},
    {"label": "6th",  "value":  0.25, "orientation": -1},
    {"label": "7th",  "value":  0.25, "orientation":  1},
    {"label": "8th",  "value":  0.25, "orientation":  1},
    {"label": "9th",  "value":  0.25, "orientation":  0}
]

system_2 = [
    {"label": "1st",  "value":  2.00, "orientation":  1},
    {"label": "4th",  "value": -0.10, "orientation":  1},
    {"label": "2nd",  "value": -0.15, "orientation":  1},
    {"label": "3rd",  "value": -0.60, "orientation":  1},
    {"label": "new",  "value": -0.25, "orientation":  1}
]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, xticks=[], yticks=[], title="Where are all my cows?")

flows_1  = [x["value"] for x in system_1]
labels_1 = [x["label"] for x in system_1]
orientations_1=[x["orientation"] for x in system_1]

flows_2  = [x["value"] for x in system_2]
labels_2 = [x["label"] for x in system_2]
orientations_2=[x["orientation"] for x in system_2]

sankey = Sankey(ax=ax, unit=None)
sankey.add(flows=flows_1, 
           labels=labels_1,
           label='one',
           orientations=orientations_1)

sankey.add(flows=flows_2, 
           labels=labels_2,
           label='two',
           orientations=orientations_2,
           pathlengths=[0, 0.4, 0.5, 0.65, 1.25],
           prior=0,
           connect=(0,0))

diagrams = sankey.finish()
diagrams[-1].patch.set_hatch('|')  # second system ('two')
diagrams[0].patch.set_hatch('-')   # first system ('one')
plt.legend(loc='best')


plt.show()

Output

V. L. answered Nov 17 '22 21:11