Cross tabulate counts between pairs of keywords per group with pandas

Question

I have a table with keywords associated with articles, looks like this:

article_id  keyword
1           A
1           B
1           C
2           A
2           B
2           D
3           E
3           F
3           D

I need to get a sort of a pivot table:

    A   B   C   D   E   F
A   -   2   1   1   0   0
B   -   -   1   1   0   0
C   -   -   -   0   0   0
D   -   -   -   -   1   1
E   -   -   -   -   -   1
F   -   -   -   -   -   -

It means, that the pair (A, B) occurs in two articles (#1 and #2), the pair (A, C) occurs in just one article (#1), etc.

What is the most Pythonic way to do that?

I tried Pandas pivot tables, but with no success so far. Just can't get how to connect the keywords and article ids.

This question Create adjacency matrix for two columns in pandas dataframe doesn't solve the problem.

cs95 · Accepted Answer

Use crosstab and dot. You can then use np.triu to retain only the upper half of the matrix (everything else is set to 0).

u = pd.crosstab(df.article_id, df.keyword)
v = u.T.dot(u)
pd.DataFrame(np.triu(v, k=1), index=v.index.values, columns=v.columns.values)

   A  B  C  D  E  F
A  0  2  1  1  0  0
B  0  0  1  1  0  0
C  0  0  0  0  0  0
D  0  0  0  0  1  1
E  0  0  0  0  0  1
F  0  0  0  0  0  0

Alternatively, for the last step, you can set invalid values to "-1", as a better alternative to "-" for invalid values.

v.values[np.tril_indices_from(v)] = -1
print(v)

keyword  A  B  C  D  E  F
keyword                  
A       -1  2  1  1  0  0
B       -1 -1  1  1  0  0
C       -1 -1 -1  0  0  0
D       -1 -1 -1 -1  1  1
E       -1 -1 -1 -1 -1  1
F       -1 -1 -1 -1 -1 -1

Cross tabulate counts between pairs of keywords per group with pandas

Tags:

python

pandas

dataframe

group-by

pandas-groupby

Ildar Akhmetov

1 Answers

cs95

Recent Activity

Donate For Us

Cross tabulate counts between pairs of keywords per group with pandas

Tags:

python

pandas

dataframe

group-by

pandas-groupby

Ildar Akhmetov

1 Answers

cs95

Related questions

Recent Activity

Donate For Us