
Python Pandas differing value_counts() in two columns of same len()

I have a pandas data frame that contains two columns, with trace numbers [col_1] and ID numbers [col_2]. Trace numbers can be duplicates, as can ID numbers; however, each trace number should correspond to exactly one ID number, and vice versa.

The two columns are the same length but have different numbers of unique values, even though these should be the same, as shown below:

in[1]:  Trace | ID
        1     | 5054
        2     | 8291
        3     | 9323
        4     | 9323
        ...   |
        100   | 8928

in[2]:  print('unique traces: ', len(df['Trace'].value_counts()))
        print('unique IDs: ', len(df['ID'].value_counts()))

out[3]: unique traces: 100
        unique IDs: 99

In the data above, the same ID number (9323) is represented by two Trace numbers (3 & 4). How can I isolate these instances? Thanks for looking!

tmdangerous asked Mar 01 '26 09:03


1 Answer

By using the duplicated() method of a pandas Series, you can do the following:

df[df['ID'].duplicated(keep=False)]

By setting keep to False, every row with a repeated ID is marked as a duplicate (instead of leaving the first or the last occurrence unmarked).

Which returns:

   Trace    ID
2      3  9323
3      4  9323
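
As a self-contained illustration of the approach above, here is a minimal sketch. The sample data is hypothetical and only modeled on the table in the question. Because the question says trace and ID numbers may legitimately repeat, the sketch also shows a variant that counts distinct Trace values per ID with groupby and nunique, so that a repeated identical (Trace, ID) pair is not reported. That variant is an assumption about what the asker wants, not something stated in the answer.

```python
import pandas as pd

# Hypothetical sample data: the pair (1, 5054) repeats legitimately,
# while ID 9323 is shared by two different Trace numbers (3 and 4).
df = pd.DataFrame({
    "Trace": [1, 1, 2, 3, 4],
    "ID": [5054, 5054, 8291, 9323, 9323],
})

# Approach from the answer: every row whose ID occurs more than once.
# This also flags the legitimately repeated (1, 5054) pair.
dupes = df[df["ID"].duplicated(keep=False)]

# Variant: count distinct Trace values per ID and keep only IDs
# that map to more than one Trace.
traces_per_id = df.groupby("ID")["Trace"].nunique()
conflicting_ids = traces_per_id[traces_per_id > 1].index
conflicts = df[df["ID"].isin(conflicting_ids)].drop_duplicates()

print(dupes)
print(conflicts)
```

If every row in the real data is a distinct (Trace, ID) pair, the two selections return the same rows and the shorter duplicated() call is enough.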
DocZerø answered Mar 02 '26 23:03


