I have a large dataframe df1 that looks like this:
DeviceID Location
1 Internal
1 External
2 Internal
2 Internal
3 Internal
3 External
3 Internal
4 Internal
4 Internal
5 External
5 Internal
I'm trying to find and select the rows where a single DeviceID is recorded with both "Internal" AND "External" values in the Location column.
The next step would be to drop these rows from the dataframe. The final dataframe df2 would look like so:
DeviceID Location
2 Internal
2 Internal
4 Internal
4 Internal
What I've attempted so far is:
indexDI = df[(df['Location'] == 'Internal') & (df['Location'] == 'External')].index
df.drop(indexDI, inplace = True)
but this seems to have dropped all the rows with "Internal".
Any help would be appreciated :)
You can group by DeviceID and use transform with nunique to count the distinct Location values in each group, then use that result for boolean indexing so that only rows whose group has a single distinct value are kept:
df[df.groupby('DeviceID').Location.transform('nunique').eq(1)]
DeviceID Location
2 2 Internal
3 2 Internal
7 4 Internal
8 4 Internal
Simply add reset_index(drop=True) if you want a fresh pandas RangeIndex.
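For reference, here is a minimal end-to-end sketch of the approach above, rebuilding the sample data from the question. The variable name n_locations is only illustrative, and the frame is constructed by hand on the assumption that DeviceID is an integer column and Location is a string column.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "DeviceID": [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5],
    "Location": ["Internal", "External", "Internal", "Internal",
                 "Internal", "External", "Internal", "Internal",
                 "Internal", "External", "Internal"],
})

# Count the distinct Location values per DeviceID and broadcast
# that count back onto every row of the group
n_locations = df1.groupby("DeviceID")["Location"].transform("nunique")

# Keep only devices recorded with a single distinct Location,
# then renumber the index from 0
df2 = df1[n_locations.eq(1)].reset_index(drop=True)
print(df2)
```

Note that eq(1) keeps any device with a single distinct location, so a device recorded only as "External" would also be kept. That matches "drop devices that have both values", but add a further filter on Location if you only want the internal ones.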