I am trying to make a directed graph or Sankey diagram (either would work) for customer state migration. The data looks like the sample below; count is the number of users migrating from the current state to the next state.
**current_state next_state count**
New Profile Initiated 37715
Profile Initiated End 36411
JobRecommended End 6202
New End 6171
ProfileCreated JobRecommended 5799
Profile Initiated ProfileCreated 4360
New NotOpted 3751
NotOpted Profile Initiated 2817
JobRecommended InterestedInJob 2542
IntentDetected ProfileCreated 2334
ProfileCreated IntentDetected 1839
InterestedInJob Applied 1671
JobRecommended NotInterestedInJob 1477
NotInterestedInJob ProfileCreated 1408
IntentDetected End 1325
NotOpted End 1009
InterestedInJob ProfileCreated 975
Applied IntentDetected 912
NotInterestedInJob IntentDetected 720
Applied ProfileCreated 701
InterestedInJob End 673
I have written code that builds a Sankey diagram, but the plot is not easily readable. I am looking for a readable directed graph. Here is my code:
import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv('input.csv')

# Map every distinct state name to an integer node index.
x = list(set(df.current_state.values) | set(df.next_state))
di = dict()
count = 0
for i in x:
    di[i] = count
    count += 1

df['source'] = df['current_state'].apply(lambda y: di[y])
df['target'] = df['next_state'].apply(lambda y: di[y])

fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=x,
        color="blue"
    ),
    link=dict(
        source=df.source,
        target=df.target,
        value=df['count']
    ))])

fig.update_layout(title_text="Sankey Diagram", font_size=10, autosize=False,
                  width=1000,
                  height=1000,
                  margin=go.layout.Margin(
                      l=50,
                      r=50,
                      b=100,
                      t=100,
                      pad=4
                  ))
fig.show()
For directed graphs, graphviz would be my tool of choice instead of Python. The following script, txt2dot.py, converts your data into an input file for graphviz:
text = '''New Profile Initiated 37715
Profile Initiated End 36411
JobRecommended End 6202
New End 6171
ProfileCreated JobRecommended 5799
Profile Initiated ProfileCreated 4360
New NotOpted 3751
NotOpted Profile Initiated 2817
JobRecommended InterestedInJob 2542
IntentDetected ProfileCreated 2334
ProfileCreated IntentDetected 1839
InterestedInJob Applied 1671
JobRecommended NotInterestedInJob 1477
NotInterestedInJob ProfileCreated 1408
IntentDetected End 1325
NotOpted End 1009
InterestedInJob ProfileCreated 975
Applied IntentDetected 912
NotInterestedInJob IntentDetected 720
Applied ProfileCreated 701
InterestedInJob End 673'''
# Remove ambiguity and make suitable for graphviz.
# Join 'Profile Initiated' before renaming 'New', so that the first
# line still splits into exactly three tokens.
text = text.replace('Profile Initiated', 'ProfileInitiated')
text = text.replace('New ', 'NewProfile ')
# Create edges and nodes for graphviz.
edges = [ln.split() for ln in text.splitlines()]
edges = sorted(edges, key=lambda x: -1*int(x[2]))
nodes = sorted(list(set(i[0] for i in edges) | set(i[1] for i in edges)))
print('digraph foo {')
for n in nodes:
    print(f' {n};')
print()
for item in edges:
    print(' ', item[0], ' -> ', item[1], ' [label="', item[2], '"];', sep='')
print('}')
Running python3 txt2dot.py > foo.dot results in:
digraph foo {
Applied;
End;
IntentDetected;
InterestedInJob;
JobRecommended;
NewProfile;
NotInterestedInJob;
NotOpted;
ProfileCreated;
ProfileInitiated;
NewProfile -> ProfileInitiated [label="37715"];
ProfileInitiated -> End [label="36411"];
JobRecommended -> End [label="6202"];
NewProfile -> End [label="6171"];
ProfileCreated -> JobRecommended [label="5799"];
ProfileInitiated -> ProfileCreated [label="4360"];
NewProfile -> NotOpted [label="3751"];
NotOpted -> ProfileInitiated [label="2817"];
JobRecommended -> InterestedInJob [label="2542"];
IntentDetected -> ProfileCreated [label="2334"];
ProfileCreated -> IntentDetected [label="1839"];
InterestedInJob -> Applied [label="1671"];
JobRecommended -> NotInterestedInJob [label="1477"];
NotInterestedInJob -> ProfileCreated [label="1408"];
IntentDetected -> End [label="1325"];
NotOpted -> End [label="1009"];
InterestedInJob -> ProfileCreated [label="975"];
Applied -> IntentDetected [label="912"];
NotInterestedInJob -> IntentDetected [label="720"];
Applied -> ProfileCreated [label="701"];
InterestedInJob -> End [label="673"];
}
Running dot -o foo.png -Tpng foo.dot renders the graph to foo.png.
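If the rendered graph is still hard to read, one option is to let the edge thickness reflect the count. The following is only a sketch: the function name edges_to_dot, the logarithmic scaling and the max_width default are my own assumptions, while penwidth and rankdir are standard graphviz attributes.

```python
import math


def edges_to_dot(edges, name="foo", max_width=8.0):
    """Return DOT source in which each edge's penwidth grows with its count.

    `edges` is a non-empty iterable of (source, target, count) tuples.
    Widths use log(count) so the largest edge does not dwarf the rest;
    that scaling is a choice made for this sketch, not a graphviz rule.
    """
    edges = sorted(edges, key=lambda e: -int(e[2]))
    top = math.log(int(edges[0][2]) + 1)
    lines = [f"digraph {name} {{", "    rankdir=LR;"]  # lay out left to right
    for src, dst, count in edges:
        width = max(1.0, max_width * math.log(int(count) + 1) / top)
        lines.append(
            f'    {src} -> {dst} [label="{count}", penwidth={width:.1f}];'
        )
    lines.append("}")
    return "\n".join(lines)
```

The edges list built in txt2dot.py can be passed in directly, for example print(edges_to_dot(edges)), and the result fed to dot as before.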
This creates a basic Sankey Diagram, assuming you:
2 and 3 are easily doable with any non-prehistoric text editor, or even Python itself if it's a lot of data. I strongly recommend you avoid working with whitespace in unquoted values.
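As a rough illustration of doing that cleanup in Python, here is a sketch that turns the whitespace-separated lines from the question into CSV rows. It assumes the full set of state names is known in advance; the STATES list below is simply read off the sample data, and parse_line and to_csv are names I made up.

```python
import csv
import io

# State names as they appear in the question's sample (assumed complete).
STATES = [
    "New", "Profile Initiated", "ProfileCreated", "JobRecommended",
    "InterestedInJob", "NotInterestedInJob", "IntentDetected",
    "Applied", "NotOpted", "End",
]


def parse_line(line, states=STATES):
    """Split 'current next count' where a state name may contain a space."""
    body, count = line.rsplit(None, 1)  # the count is always the last token
    for state in states:
        rest = body[len(state):]
        if body.startswith(state) and rest.startswith(" ") and rest.strip() in states:
            # Drop inner spaces so the names are safe as unquoted values.
            return state.replace(" ", ""), rest.strip().replace(" ", ""), int(count)
    raise ValueError(f"cannot split into two known states: {line!r}")


def to_csv(text, states=STATES):
    """Convert the raw multi-line text into CSV with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["current_state", "next_state", "count"])
    for line in text.splitlines():
        if line.strip():
            writer.writerow(parse_line(line.strip(), states))
    return out.getvalue()
```

Matching against a list of known names avoids guessing where one state ends and the next begins, which is exactly the ambiguity that unquoted whitespace introduces.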
Result
import plotly.graph_objects as go
import numpy as np
import matplotlib.colors  # import the submodule explicitly for cnames

if __name__ == '__main__':

    # Read the CSV, skipping the header row.
    with open('state_migration.csv', 'r') as finput:
        info = [[_ for _ in _.strip().lower().split(',')]
                for _ in finput.readlines()[1:]]

    info_t = [*map(list, zip(*info))]  # info transposed

    # this exists to map the data to plotly's node indexing format
    index = {n: i for i, n in enumerate(set(info_t[0] + info_t[1]))}

    fig = go.Figure(data=[go.Sankey(
        node=dict(
            pad=15,
            thickness=20,
            line=dict(color="black", width=0.5),
            label=list(index.keys()),
            # one distinct random named color per node
            color=np.random.choice(list(matplotlib.colors.cnames.values()),
                                   size=len(index.keys()), replace=False)
        ),
        link=dict(
            source=[index[_] for _ in info_t[0]],
            target=[index[_] for _ in info_t[1]],
            value=info_t[2]
        ))])

    fig.update_layout(title_text="State Migration", font_size=12)
    fig.show()
You can drag the nodes around. See this if you want to predefine their positions or check other parameters.
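For predefined positions, a minimal sketch could look like the following. It relies on go.Sankey accepting x and y lists inside node (values between 0 and 1) together with the arrangement setting. The helper names node_positions and build_figure, the margin value and the idea of grouping states into columns are assumptions of mine, so treat the grouping you pass in as something to tune by hand.

```python
def node_positions(labels, columns, margin=0.01):
    """Return (x, y) lists aligned with `labels`.

    `columns` is a list of lists of labels, ordered left to right.
    Labels missing from `columns` stay at the centre (0.5, 0.5).
    A small margin keeps coordinates away from exactly 0 and 1.
    """
    n_cols = len(columns)
    x = [0.5] * len(labels)
    y = [0.5] * len(labels)
    for col_idx, column in enumerate(columns):
        for row_idx, label in enumerate(column):
            i = labels.index(label)
            x[i] = margin + (1 - 2 * margin) * col_idx / max(1, n_cols - 1)
            y[i] = (row_idx + 0.5) / len(column)  # spread evenly top to bottom
    return x, y


def build_figure(labels, sources, targets, values, columns):
    """Build a Sankey figure with nodes pinned to the given columns."""
    # Imported here so node_positions can be used without plotly installed.
    import plotly.graph_objects as go

    x, y = node_positions(labels, columns)
    return go.Figure(data=[go.Sankey(
        arrangement="snap",
        node=dict(label=labels, x=x, y=y, pad=15, thickness=20),
        link=dict(source=sources, target=targets, value=values),
    )])
```

With the script above, labels would be list(index.keys()) and sources, targets and values the three link lists; calling build_figure(...).show() then displays the pinned layout.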
The data I used was a cleaned version of your input:
currentstate,next_state,count
new,initiated,37715
profileinitiated,end,36411
jobrecommended,end,6202
new,end,6171
profilecreated,jobrecommended,5799
profileinitiated,profilecreated,4360
new,notopted,3751
notopted,profileinitiated,2817
jobrecommended,interestedinjob,2542
intentdetected,profilecreated,2334
profilecreated,intentdetected,1839
interestedinjob,applied,1671
jobrecommended,notinterestedinjob,1477
notinterestedinjob,profilecreated,1408
intentdetected,end,1325
notopted,end,1009
interestedinjob,profilecreated,975
applied,intentdetected,912
notinterestedinjob,intentdetected,720
applied,profilecreated,701
interestedinjob,end,673
I changed "New Profile" to the existing state "New", since the diagram was otherwise weird. Feel free to tweak as you need.
The libraries I used are absolutely not needed for what you want; I'm simply more familiar with them. For the directed graph, Roland Smith has you covered. It can also be done with Plotly; see their gallery.
Tested on Python 3.8.1