I have a dataframe of the form:
index  Name_A  Name_B
0      Adam    Ben
1      Chris   David
2      Adam    Chris
3      Ben     Chris
And I'd like to obtain the adjacency matrix for Name_A and Name_B, i.e.:
       Adam  Ben  Chris  David
Adam      0    1      1      0
Ben       0    0      1      0
Chris     0    0      0      1
David     0    0      0      0
What is the most pythonic/scalable way of tackling this?
EDIT: Also, I know that if the row Adam, Ben is in the dataset, then at some other point, Ben, Adam will also be in the dataset.
Adjacency matrix: the row and column indices represent the vertices. matrix[i][j] = 1 means that there is an edge from vertex i to vertex j, and matrix[i][j] = 0 means that there is no edge from i to j.
You can use crosstab and then reindex by the union of column and index values:
ct = pd.crosstab(df.Name_A, df.Name_B)
print(ct)
Name_B  Ben  Chris  David
Name_A
Adam      1      1      0
Ben       0      1      0
Chris     0      0      1
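# Count each (Name_A, Name_B) pair
df = pd.crosstab(df.Name_A, df.Name_B)
# All names that appear as either a row or a column label
idx = df.columns.union(df.index)
# Make the matrix square, filling missing pairs with 0
df = df.reindex(index=idx, columns=idx, fill_value=0)
print(df)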
       Adam  Ben  Chris  David
Adam      0    1      1      0
Ben       0    0      1      0
Chris     0    0      0      1
David     0    0      0      0
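For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same crosstab + reindex approach using the sample data from the question. The variable names (edges, counts, names, adjacency) are my own, and the final clip step is an assumption on my part: crosstab returns counts, so it is only needed if a pair can appear more than once and you want a strict 0/1 matrix.

```python
import pandas as pd

# Sample data from the question.
edges = pd.DataFrame({
    "Name_A": ["Adam", "Chris", "Adam", "Ben"],
    "Name_B": ["Ben", "David", "Chris", "Chris"],
})

# Count each (Name_A, Name_B) pair; rows are sources, columns are targets.
counts = pd.crosstab(edges["Name_A"], edges["Name_B"])

# Use every name that appears in either column as both a row and a column label.
names = counts.index.union(counts.columns)
adjacency = counts.reindex(index=names, columns=names, fill_value=0)

# Assumption: repeated pairs should count once, so cap the counts at 1.
adjacency = adjacency.clip(upper=1)

print(adjacency)
```

If the data really does contain both Adam, Ben and Ben, Adam for every pair, as the EDIT in the question says, the result should come out symmetric without any extra step.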