Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python pandas count no order pairs

Tags:

python

pandas

I have a pandas data frame looks like:

df = pd.DataFrame(data = {
    'v1': ['a', 'a', 'c', 'b', 'd', 'c', 'd', 'c', 'f', 'e'], 
    'v2': ['b', 'b', 'd', 'a', 'c', 'e', 'c', 'd', 'g', 'c'], 
    'v3': range(0,10)})

  v1 v2  v3
0  a  b   0
1  a  b   1
2  c  d   2
3  a  b   3
4  c  d   4
5  c  e   5
6  c  d   6
7  c  d   7
8  f  g   8
9  c  e   9

I want to get result looks like:

  v1 v2  count
0  a  b      2
1  c  d      3
2  c  e      2
3  f  g      1

The first row "a b 2", comes from "a b", "a b", "b a". as "a b" appears continuously twice, count only once.

The second row "c d 3" comes from "c d", "d c", "d c", "c d", as "d c" appears continuously twice

The third row "c e 2" comes from "c e", "e c"

The last row "f g 1" comes from "f g"

My initial thought is to use groupby, but I don't know how to skip the continuous appearance.

like image 966
Hanbo Avatar asked Jan 27 '26 19:01

Hanbo


1 Answers

Sort the first two columns, drop consecutive duplicates, and then count them:

df.iloc[:, :2] = np.sort(df.iloc[:, :2], axis=1)

m = ~df.iloc[:, :2].ne(df.iloc[:, :2].shift()).cumsum().duplicated()
df[m].groupby(['v1', 'v2'], as_index=False).count()

  v1 v2  v3
0  a  b   2
1  c  d   3
2  c  e   2
3  f  g   1
like image 174
cs95 Avatar answered Jan 30 '26 08:01

cs95



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!