I have a dataframe with different values in column x. I want to drop the rows whose value appears only once in that column.
So this:
x
1 10
2 30
3 30
4 40
5 40
6 50
Should turn into this:
x
2 30
3 30
4 40
5 40
I was wondering if there is a way to do that.
Pandas gives data analysts a way to delete and filter dataframe rows with the DataFrame.drop() method. We can use it to drop rows that do not satisfy given conditions, whether that is a single condition on a column or multiple conditions on a column.
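As a rough illustration of the two cases mentioned above, here is a minimal sketch that reuses the question's data. The thresholds (30 and 40) and the variable names are made up for the example; they are not from the original snippet.

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 30, 30, 40, 40, 50]})

# One condition: select the rows where x < 30, then drop them by index label.
dropped_one = df.drop(df[df["x"] < 30].index)

# Multiple conditions on the same column, combined with | (or).
dropped_multi = df.drop(df[(df["x"] < 30) | (df["x"] > 40)].index)

print(dropped_one["x"].tolist())    # [30, 30, 40, 40, 50]
print(dropped_multi["x"].tolist())  # [30, 30, 40, 40]
```

Note that drop() works on index labels, so this pattern assumes the index is unique. A boolean mask such as df[df["x"] >= 30] gives the same rows without that assumption.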
Dropping rows with dropna() is a simple but messy way to handle missing values: in addition to removing the null values, it can remove non-null data that sits in the same rows. You can call dropna() on your entire dataframe or on specific columns:
# Drop rows with null values
df = df.dropna(axis=0)
# Drop column_1 rows with null values
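To make the "removes data that aren't null" point concrete, here is a small sketch. The data is invented, and column_2 is a hypothetical second column. Restricting the check to one column uses the subset parameter of dropna().

```python
import numpy as np
import pandas as pd

# Hypothetical data: each column has one missing value, in different rows.
df = pd.DataFrame({
    "column_1": [1.0, np.nan, 3.0],
    "column_2": [np.nan, 5.0, 6.0],
})

# Drop every row that has a null in any column.
all_cols = df.dropna(axis=0)

# Drop only the rows where column_1 is null.
only_col1 = df.dropna(subset=["column_1"])

print(len(all_cols))   # 1
print(len(only_col1))  # 2
```

In the first call the valid column_1 value in row 0 is discarded along with the null in column_2, which is the side effect described above.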
This is a bit less resource intensive than a COUNTIF down 250K records, and because of the sort would flag every Name that appears more than once with a 1. Copy/paste as values and sort by that column and you can just delete those, leaving the entries that only appear once.
You can easily get this by using groupby and transform:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([10, 30, 30, 40, 40, 50], columns=['x'])
In [3]: df = df[df.groupby('x').x.transform(len) > 1]
In [4]: df
Out[4]:
x
1 30
2 30
3 40
4 40
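To see why this works, it may help to look at the intermediate result. The sketch below splits the one-liner into steps; the names group_sizes, mask and result are just for illustration.

```python
import pandas as pd

df = pd.DataFrame([10, 30, 30, 40, 40, 50], columns=["x"])

# transform(len) returns one value per original row: the size of that row's group.
group_sizes = df.groupby("x")["x"].transform(len)
print(group_sizes.tolist())  # [1, 2, 2, 2, 2, 1]

# Rows whose value occurs more than once get True in the mask.
mask = group_sizes > 1
result = df[mask]
print(result["x"].tolist())  # [30, 30, 40, 40]
```

Because transform keeps the original index, the boolean mask lines up with df row by row and can be used directly for selection.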
You can use groupby and then filter it:
In [9]:
df = pd.DataFrame([10, 30, 30, 40, 40, 50], columns=['x'])
df = df.groupby('x').filter(lambda x: len(x) > 1)
df
Out[9]:
x
1 30
2 30
3 40
4 40
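For comparison, here is a self-contained sketch of the filter approach next to one alternative that is not part of the answers above: DataFrame.duplicated with keep=False, which marks every row whose value occurs more than once. On this data both should select the same rows.

```python
import pandas as pd

df = pd.DataFrame([10, 30, 30, 40, 40, 50], columns=["x"])

# groupby + filter: keep only the groups that have more than one row.
by_filter = df.groupby("x").filter(lambda g: len(g) > 1)

# Alternative (not from the answers): keep=False flags all occurrences of a repeated value.
by_duplicated = df[df.duplicated("x", keep=False)]

print(by_filter["x"].tolist())      # [30, 30, 40, 40]
print(by_duplicated["x"].tolist())  # [30, 30, 40, 40]
```

The filter version calls a Python function once per group, so on a column with many distinct values the duplicated version is likely to be faster, though that is worth measuring on your own data.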