
Remove values that appear only once in a DataFrame column

I have a DataFrame with repeated values in column x. I want to drop the rows whose value appears only once in that column.

So this:

   x
1 10
2 30
3 30
4 40
5 40
6 50

Should turn into this:

   x
2 30
3 30
4 40
5 40

I was wondering if there is a way to do that.

Francisco García asked Oct 11 '15 23:10



2 Answers

You can do this with groupby and transform:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([10, 30, 30, 40, 40, 50], columns=['x'])

In [3]: df = df[df.groupby('x').x.transform(len) > 1]

In [4]: df
Out[4]: 
    x
1  30
2  30
3  40
4  40
Dimitris Fasarakis Hilliard answered Oct 07 '22 00:10
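
To see why the transform-based filter works, here is a minimal sketch that splits the one-liner above into steps. The names `counts`, `mask` and `result` are introduced only for illustration, and `transform('size')` is swapped in for `transform(len)`; it should produce the same per-row group sizes.

```python
import pandas as pd

df = pd.DataFrame([10, 30, 30, 40, 40, 50], columns=['x'])

# For every row, the size of the group that its value of x belongs to.
# The result is aligned with df's index, so it can be used to build a mask.
counts = df.groupby('x')['x'].transform('size')

# True where the value of x occurs more than once in the column.
mask = counts > 1

# Keep only the rows whose value is repeated; the original index is preserved.
result = df[mask]
print(result)
```

Because the mask is aligned with the original index, the surviving rows keep their labels (1 to 4 for this data), as in the output shown above.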


You can use groupby and then filter the groups:

In [9]:    
df = pd.DataFrame([10, 30, 30, 40, 40, 50], columns=['x'])
df = df.groupby('x').filter(lambda x: len(x) > 1)
df

Out[9]:
    x
1  30
2  30
3  40
4  40
EdChum answered Oct 06 '22 22:10
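
As a possible alternative that neither answer uses, the same rows can probably be selected without groupby by marking every occurrence of a repeated value with `Series.duplicated(keep=False)`. This is only a sketch for the example data in the question; the names `via_duplicated` and `via_filter` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([10, 30, 30, 40, 40, 50], columns=['x'])

# keep=False flags all occurrences of a repeated value, not just the later ones,
# so the mask is True exactly for values that appear more than once.
via_duplicated = df[df['x'].duplicated(keep=False)]

# The groupby/filter approach from the answer above, for comparison.
via_filter = df.groupby('x').filter(lambda g: len(g) > 1)

print(via_duplicated)
print(via_filter)
```

For this data both selections contain the same rows with the same index labels. On large frames with many distinct values, avoiding a Python-level lambda per group may be faster, though that is worth measuring rather than assuming.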