I have a dataframe like so:
ID | Node 1 | Node 2 | Node 3
a | 1 | 0 | 1
b | 0 | 1 | 1
c | 1 | 0 | 0
d | 1 | 1 | 1
e | 0 | 1 | 1
I want to reshape it so that I can turn it into a network chart, where the weight of a connection between two nodes is the number of IDs that are flagged for both of them:
Node A | Node B | Weight |
Node 1 | Node 2 | 1 |
Node 1 | Node 3 | 2 |
Node 2 | Node 3 | 3 |
Building on Tai's solution, you could obtain the desired DataFrame using
import numpy as np
import pandas as pd
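def get_weights(df):
    df2 = df.filter(regex='Node')
    nodes = df2.columns
    arr = df2.values
    # co-occurrence counts: m[i, j] = number of IDs set for both node i and node j
    m = np.dot(arr.T, arr).astype(float)
    # mask the diagonal and lower triangle so each pair appears only once
    idx = np.tril_indices(m.shape[0])
    m[idx] = np.nan
    result = pd.DataFrame(m, columns=nodes, index=nodes)
    # long format; dropna() discards the masked entries
    result = result.stack().dropna()
    result = result.astype(int)
    result = result.reset_index()
    result.columns = ['Node A', 'Node B', 'Weight']
    return result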
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e'],
                   'Node 1': [1, 0, 1, 1, 0],
                   'Node 2': [0, 1, 0, 1, 1],
                   'Node 3': [1, 1, 0, 1, 1]})
result = get_weights(df)
print(result)
which yields
   Node A  Node B  Weight
0  Node 1  Node 2       1
1  Node 1  Node 3       2
2  Node 2  Node 3       3
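As a sanity check on what the dot product computes, the same weights can be obtained by counting each pair of columns directly. This is only a minimal sketch using the data from the question's table; `count_pairs` is a hypothetical helper name, not part of the answer above, and it assumes the Node columns contain only 0s and 1s.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e'],
                   'Node 1': [1, 0, 1, 1, 0],
                   'Node 2': [0, 1, 0, 1, 1],
                   'Node 3': [1, 1, 0, 1, 1]})


def count_pairs(df):
    # Hypothetical helper: for every pair of Node columns, count the
    # rows (IDs) in which both columns are 1.
    nodes = df.filter(regex='Node')
    rows = [(a, b, int((nodes[a] & nodes[b]).sum()))
            for a, b in combinations(nodes.columns, 2)]
    return pd.DataFrame(rows, columns=['Node A', 'Node B', 'Weight'])


pairs = count_pairs(df)
print(pairs)
```

This loops over column pairs in Python, so for many nodes the matrix product is likely the better choice; the loop is mainly useful for checking the result on small inputs.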
Instead of an edge list of the form
Node A | Node B | Weight |
Node 1 | Node 2 | 1 |
Node 1 | Node 3 | 2 |
Node 2 | Node 3 | 3 |
you can also compute a co-occurrence (adjacency) matrix to represent the relationship you are interested in. It can be constructed with a dot product. alko already gave a pandas answer in Constructing a co-occurrence matrix in python pandas.
Here I adapt alko's answer to use numpy:
# use only the Node columns; the ID column is not numeric
arr = df.filter(regex='Node').values
m = arr.T.dot(arr)
np.fill_diagonal(m, 0)
# array([[0, 1, 2],
#        [1, 0, 3],
#        [2, 3, 0]])
# You can use nx.from_numpy_array (nx.from_numpy_matrix before NetworkX 3.0) to construct a graph.
# m[i, j] is the number of co-occurrences between node i and node j.
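For the network chart itself, the matrix can be handed to NetworkX. The following is a rough sketch that assumes NetworkX 2.0 or later is installed; the `names` list and the relabelling step are additions for illustration, since `from_numpy_array` labels nodes with integers.

```python
import networkx as nx
import numpy as np

# Co-occurrence matrix from above, with the diagonal already zeroed.
m = np.array([[0, 1, 2],
              [1, 0, 3],
              [2, 3, 0]])
names = ['Node 1', 'Node 2', 'Node 3']  # assumed labels, in column order

# Each non-zero entry becomes an edge whose 'weight' attribute is the entry.
# Nodes come back labelled 0..n-1, so map them to the column names.
G = nx.from_numpy_array(m)
G = nx.relabel_nodes(G, dict(enumerate(names)))

for u, v, w in G.edges(data='weight'):
    print(u, v, w)
```

Zero entries produce no edge, which is why clearing the diagonal first avoids self-loops.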
One thing I am not fond of in alko's answer is that it changes the diagonal of a DataFrame, say df, by writing to df.values. Modifying df.values in place to change df should not be encouraged, because df.values sometimes returns a copy and sometimes a view. See my previous question Will changes in DataFrame.values always modify the values in the data frame? for more information.
If you want to follow alko's pandas method, you can replace np.fill_diagonal(df.values, 0) with
df = df - np.eye(len(df)) * np.diagonal(df)
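A small worked example of that expression, as a sketch only: the names `nodes`, `co` and `co_no_diag` are made up here, and the data is the question's table with ID used as the index.

```python
import numpy as np
import pandas as pd

nodes = pd.DataFrame({'Node 1': [1, 0, 1, 1, 0],
                      'Node 2': [0, 1, 0, 1, 1],
                      'Node 3': [1, 1, 0, 1, 1]},
                     index=['a', 'b', 'c', 'd', 'e'])

# Co-occurrence matrix as a DataFrame; the diagonal holds each node's own count.
co = nodes.T.dot(nodes)

# np.eye(n) * diagonal is zero everywhere except on the diagonal, so the
# subtraction clears the diagonal and returns a new (float) DataFrame
# without writing into co.values.
co_no_diag = co - np.eye(len(co)) * np.diagonal(co)
print(co_no_diag)
```

Because the subtraction builds a new object, `co` itself is left unchanged, which sidesteps the copy-versus-view question entirely.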