
Finding rows in a Pandas DataFrame with columns that violate a one-to-one mapping

Tags: python, pandas

I have a DataFrame that looks something like this:

| index | col_1 | col_2 |
| 0     | A     | 11    |
| 1     | B     | 12    |
| 2     | B     | 12    |
| 3     | C     | 13    |
| 4     | C     | 13    |
| 5     | C     | 14    |

where col_1 and col_2 may not always be one-to-one due to corrupt data.

How can I use Pandas to determine which rows have col_1 and col_2 entries that violate this one-to-one relationship?

In this case it would be the last three rows, since C maps to both 13 and 14.

Asked by Roger on Jun 02 '14 at 23:06



1 Answer

You could use a transform to count the number of unique values in each group. First take the subset of just these two columns, then group by one of them:

In [11]: g = df[['col_1', 'col_2']].groupby('col_1')

In [12]: counts = g.transform(lambda x: len(x.unique()))

In [13]: counts
Out[13]:
   col_2
0      1
1      1
2      1
3      2
4      2
5      2
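
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same counting step. It assumes the sample data from the question (with its col_1 and col_2 column names) and uses the built-in "nunique" aggregation in place of the lambda; the variable names are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (an assumption about the real frame).
df = pd.DataFrame({
    "col_1": ["A", "B", "B", "C", "C", "C"],
    "col_2": [11, 12, 12, 13, 13, 14],
})

# For every row, count the distinct col_2 values seen within its col_1 group.
counts = df.groupby("col_1")["col_2"].transform("nunique")

print(counts.tolist())  # [1, 1, 1, 2, 2, 2]
```

Because transform returns a result aligned with the original index, the counts can be compared against 1 row by row.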

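Then check that every remaining column (there may be more than one) has exactly one unique value per group. Rows where the result is False violate the mapping: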

In [14]: (counts == 1).all(axis=1)
Out[14]:
0     True
1     True
2     True
3    False
4    False
5    False
dtype: bool
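
Note that this check covers only one direction: each col_1 value mapping to a single col_2 value. A strict one-to-one relationship also needs the reverse to hold. The following is a rough sketch, again assuming the sample data from the question, of how one might select the offending rows while checking both directions. The helper names are hypothetical.

```python
import pandas as pd

# Sample data copied from the question (an assumption about the real frame).
df = pd.DataFrame({
    "col_1": ["A", "B", "B", "C", "C", "C"],
    "col_2": [11, 12, 12, 13, 13, 14],
})

# Distinct col_2 values per col_1 group, and the reverse direction.
n_col2_per_col1 = df.groupby("col_1")["col_2"].transform("nunique")
n_col1_per_col2 = df.groupby("col_2")["col_1"].transform("nunique")

# A row violates the one-to-one mapping if either count exceeds 1.
violations = df[(n_col2_per_col1 > 1) | (n_col1_per_col2 > 1)]

print(violations)  # rows 3, 4 and 5 for the sample data
```

For the sample data the reverse check adds nothing, since every col_2 value belongs to a single col_1 value, but it would flag a case such as 12 appearing under both B and C.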
Answered by Andy Hayden on Oct 23 '22 at 12:10