I have a pandas DataFrame that contains two columns: trace numbers [col_1] and ID numbers [col_2]. Trace numbers can be duplicated, as can ID numbers; however, each trace and each ID should correspond to only one specific counterpart in the adjacent column.
The two columns are the same length but have different unique value counts, which should be the same, as shown below:
in[1]: Trace | ID
1 | 5054
2 | 8291
3 | 9323
4 | 9323
... |
100 | 8928
in[2]: print('unique traces:', df['Trace'].nunique())
print('unique IDs:', df['ID'].nunique())
out[2]: unique traces: 100
unique IDs: 99
In the data above, the same ID number (9323) is represented by two Trace numbers (3 & 4). How can I isolate these instances? Thanks for looking!
Using the Series.duplicated() method, you can do the following:
df[df['ID'].duplicated(keep=False)]
Setting keep=False marks every occurrence of a repeated ID as a duplicate (instead of leaving the first or the last one unmarked), so all of the affected rows are returned.
Which returns:
Trace ID
2 3 9323
3 4 9323
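One caveat: the question says that Trace numbers can themselves repeat, so a plain duplicated() check on ID would also flag rows where the same (Trace, ID) pair simply appears more than once. A minimal sketch of a stricter check follows, assuming the column names from the question and a small made-up sample in which the pair (1, 5054) is repeated on purpose. It counts distinct Trace values per ID and keeps only the IDs that map to more than one:

```python
import pandas as pd

# Hypothetical sample data modelled on the question: ID 9323 maps to
# Traces 3 and 4, while the pair (1, 5054) is simply repeated.
df = pd.DataFrame({
    "Trace": [1, 1, 2, 3, 4],
    "ID": [5054, 5054, 8291, 9323, 9323],
})

# Plain duplicate check on ID: also flags the harmless repeated pair.
naive = df[df["ID"].duplicated(keep=False)]

# Count the distinct Trace values seen for each ID, broadcast back to rows.
traces_per_id = df.groupby("ID")["Trace"].transform("nunique")

# Keep only rows whose ID is tied to more than one distinct Trace.
conflicts = df[traces_per_id > 1]
print(conflicts)
```

Swapping the roles of the two columns (grouping by Trace and counting distinct IDs) would catch the opposite case, where one Trace is tied to several IDs.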