
Transform a dataframe for network graphing

I have a dataframe like so:

ID  | Node 1 | Node 2 | Node 3
a   |   1    |    0   |   1
b   |   0    |    1   |   1
c   |   1    |    0   |   0
d   |   1    |    1   |   1
e   |   0    |    1   |   1

I want to reshape it so that I can plot it as a network chart, where the weight of the connection between two nodes is the number of IDs that are flagged (value 1) for both of them:

Node A | Node B | Weight |
Node 1 | Node 2 |    1   |
Node 1 | Node 3 |    2   |
Node 2 | Node 3 |    3   |
asked May 25 '26 09:05 by NBC


2 Answers

Building on Tai's solution, you could obtain the desired DataFrame using

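import numpy as np
import pandas as pd

def get_weights(df):
    # Keep only the indicator columns (this drops 'ID').
    df2 = df.filter(regex='Node')
    nodes = df2.columns
    arr = df2.values
    # m[i, j] counts the IDs flagged for both node i and node j.
    m = np.dot(arr.T, arr).astype(float)
    # Mask the diagonal and lower triangle so each pair appears once.
    idx = np.tril_indices(m.shape[0])
    m[idx] = np.nan
    result = pd.DataFrame(m, columns=nodes, index=nodes)
    # Older pandas drops NaN in stack(); dropna() covers versions that keep it.
    result = result.stack().dropna()
    result = result.astype(int)
    result = result.reset_index()
    result.columns = ['Node A', 'Node B', 'Weight']
    return result

# Same data as the table in the question.
df = pd.DataFrame({'ID': ['a', 'b', 'c', 'd', 'e'],
 'Node 1': [1, 0, 1, 1, 0],
 'Node 2': [0, 1, 0, 1, 1],
 'Node 3': [1, 1, 0, 1, 1]})
result = get_weights(df)
print(result)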

which yields

   Node A  Node B  Weight
0  Node 1  Node 2       1
1  Node 1  Node 3       2
2  Node 2  Node 3       3
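
As a rough cross-check on the matrix approach, the same weights can be counted pair by pair. This is only a sketch, assuming the 0/1 table from the question; `pairwise_weights` is a name made up for this illustration, and the loop will be slower than the dot product when there are many node columns.

```python
from itertools import combinations

import pandas as pd

# The table from the question: one row per ID, one 0/1 column per node.
df = pd.DataFrame({
    'ID': ['a', 'b', 'c', 'd', 'e'],
    'Node 1': [1, 0, 1, 1, 0],
    'Node 2': [0, 1, 0, 1, 1],
    'Node 3': [1, 1, 0, 1, 1],
})

def pairwise_weights(df):
    # Hypothetical helper: counts co-occurrences one column pair at a time.
    nodes = df.filter(regex='Node')
    rows = []
    for a, b in combinations(nodes.columns, 2):
        # An ID counts toward the pair only when both flags are 1.
        both = (nodes[a] == 1) & (nodes[b] == 1)
        rows.append((a, b, int(both.sum())))
    return pd.DataFrame(rows, columns=['Node A', 'Node B', 'Weight'])

check = pairwise_weights(df)
print(check)
```

For the question's data this gives weights 1, 2 and 3 for the three pairs, matching the table above.
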
answered May 26 '26 21:05 by unutbu


Instead of an edge list of the form

Node A | Node B | Weight |
Node 1 | Node 2 |    1   |
Node 1 | Node 3 |    2   |
Node 2 | Node 3 |    3   |

you can also compute a co-occurrence (adjacency) matrix to represent the relationship you are interested in. It can be constructed with a dot product. alko has already given a pandas answer in "Constructing a co-occurrence matrix in python pandas".

Here is alko's answer adapted to numpy:

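# df also holds the string 'ID' column, so keep only the Node columns.
arr = df.filter(regex='Node').values
m = arr.T.dot(arr)
np.fill_diagonal(m, 0)

# array([[0, 1, 2],
#        [1, 0, 3],
#        [2, 3, 0]])
# m[i, j] is the number of co-occurrences between node i and node j.
# nx.from_numpy_array(m) builds a graph from it
# (nx.from_numpy_matrix in NetworkX releases before 3.0).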
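
If the edge list is still wanted, the upper triangle of the matrix can be read back out. A minimal sketch, assuming the indicator values from the question; the names `arr`, `labels` and `edges` are chosen just for this example.

```python
import numpy as np

# Indicator values from the question (rows a..e, columns Node 1..Node 3).
arr = np.array([
    [1, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 1],
])
labels = ['Node 1', 'Node 2', 'Node 3']

# Co-occurrence counts; the diagonal (self-pairs) is zeroed out.
m = arr.T.dot(arr)
np.fill_diagonal(m, 0)

# k=1 skips the diagonal, so each unordered pair is visited once.
i, j = np.triu_indices(m.shape[0], k=1)
edges = [(labels[a], labels[b], int(m[a, b])) for a, b in zip(i, j)]
print(edges)
```

A list of `(u, v, weight)` tuples like this should be usable with NetworkX's `Graph.add_weighted_edges_from`, if you prefer building the graph from edges rather than from the matrix.
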

One part of alko's answer that I am not fond of is that it changes the diagonal of a dataframe, say df, by modifying df.values in place. Writing to df.values in order to change df should not be encouraged, because df.values sometimes returns a copy and sometimes a view, so the write may or may not reach df. See my previous question "Will changes in DataFrame.values always modify the values in the data frame?" for more information.

If you want to follow alko's pandas method, where df is the square co-occurrence dataframe, you can replace np.fill_diagonal(df.values, 0) with

df = df - np.eye(len(df)) * np.diagonal(df)
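
Another way to avoid relying on df.values is to work on an explicit copy and rebuild the dataframe. A small sketch under the assumption that the co-occurrence dataframe looks like the one for the question's data; `cooc` and `no_diag` are illustrative names, not part of alko's answer.

```python
import numpy as np
import pandas as pd

# Co-occurrence counts for the question's data, diagonal still in place.
nodes = ['Node 1', 'Node 2', 'Node 3']
cooc = pd.DataFrame(
    [[3, 1, 2],
     [1, 3, 3],
     [2, 3, 4]],
    index=nodes, columns=nodes,
)

# copy=True guarantees a separate array, so cooc itself is never touched.
values = cooc.to_numpy(copy=True)
np.fill_diagonal(values, 0)
no_diag = pd.DataFrame(values, index=cooc.index, columns=cooc.columns)
print(no_diag)
```

This keeps the integer dtype, whereas subtracting the np.eye product yields floats.
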

answered May 26 '26 21:05 by Tai


