I have a DataFrame kinda like this:
| index | col_1 | col_2 |
|-------|-------|-------|
| 0     | A     | 11    |
| 1     | B     | 12    |
| 2     | B     | 12    |
| 3     | C     | 13    |
| 4     | C     | 13    |
| 5     | C     | 14    |
where col_1 and col_2 may not always be one-to-one due to corrupt data. How can I use pandas to determine which rows have col_1 and col_2 entries that violate this one-to-one relationship?
In this case it would be the last three rows since C can either map to 13 or 14.
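For reference, a DataFrame like the one above can be built as follows (a minimal sketch; the column names col_1 and col_2 are taken from the question):

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'col_1': list('ABBCCC'),
   ...:                    'col_2': [11, 12, 12, 13, 13, 14]})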
You could use a transform, counting the number of unique values in each group. First take the subset of just these two columns, and then group by col_1:
In [11]: g = df[['col_1', 'col_2']].groupby('col_1')
In [12]: counts = g.transform(lambda x: len(x.unique()))
In [13]: counts
Out[13]:
   col_2
0 1
1 1
2 1
3 2
4 2
5 2
A row satisfies the one-to-one relationship only if the count is 1 in all of the remaining columns (here just col_2):
In [14]: (counts == 1).all(axis=1)
Out[14]:
0 True
1 True
2 True
3 False
4 False
5 False
dtype: bool
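To get the offending rows themselves rather than just the boolean mask, you can index the DataFrame with the negated result (a small usage sketch building on the counts computed above):

In [15]: df[~(counts == 1).all(axis=1)]
Out[15]:
  col_1  col_2
3     C     13
4     C     13
5     C     14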