A pandas DataFrame has two columns. The goal is to remove the rows whose entry in the first column has no duplicates.

Example data:

1 A
1 B
2 A
3 D
2 C
4 E
4 E

Expected output:

1 A
1 B
2 A
2 C
4 E
4 E

In other words, every value that occurs only once in the first column should be removed, together with its row. What would be the fastest way to achieve this in Python (~50k rows)?

Remove non-duplicated entries

2 Answers

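One way is to use the duplicated() method.

By default, df.duplicated('c1') flags every occurrence of a value except the first, and keep='last' (spelled take_last=True in old pandas releases; that argument has since been removed) flags every occurrence except the last. Combining the two masks with | selects every row whose c1 value appears more than once:

In [600]: df[df.duplicated('c1') | df.duplicated('c1', keep='last')]
Out[600]:
   c1 c2
0   1  A
1   1  B
2   2  A
4   2  C
5   4  E
6   4  E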
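
For reference, here is a self-contained sketch of the same idea on the question's sample data. The column names c1 and c2 follow the answer above. Passing keep=False, which marks every member of a duplicated group in a single call, is a substitution for the two OR-ed masks rather than part of the original answer, and it assumes a pandas release that supports the keep argument.

```python
import pandas as pd

# Sample data from the question; the column names are taken from the answer.
df = pd.DataFrame({
    "c1": [1, 1, 2, 3, 2, 4, 4],
    "c2": ["A", "B", "A", "D", "C", "E", "E"],
})

# keep=False marks every row whose c1 value occurs more than once,
# including the first and last occurrence.
mask = df.duplicated("c1", keep=False)

# Equivalent mask built from the two one-sided calls.
two_sided = df.duplicated("c1") | df.duplicated("c1", keep="last")

result = df[mask]
print(result)
```

Both masks select the same rows here; the single call simply avoids scanning the column twice.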

answered Oct 19 '22 23:10

Zero

Here's one way. Assume the dataframe is 'd' and the columns are named 'a' and 'b'. First, count how many times each unique value in 'a' appears:

e = d['a'].value_counts()

Then select the values whose count is greater than 1, and return the rows whose first column is one of those values:

d[d['a'].isin(e[e > 1].index)]
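
Put together on the question's sample data, the two steps might look like the sketch below. The names counts, repeated, by_counts, sizes and by_groupby are illustrative only. The groupby/transform variant at the end is an alternative not mentioned in the answer, shown because it builds the per-row count directly; which version is faster for ~50k rows is worth timing on the real data rather than assuming.

```python
import pandas as pd

# Sample data from the question, using the answer's names 'd', 'a' and 'b'.
d = pd.DataFrame({
    "a": [1, 1, 2, 3, 2, 4, 4],
    "b": ["A", "B", "A", "D", "C", "E", "E"],
})

# Step 1: number of occurrences of each distinct value in 'a'.
counts = d["a"].value_counts()

# Step 2: keep the values seen more than once, then the rows that hold them.
repeated = counts[counts > 1].index
by_counts = d[d["a"].isin(repeated)]

# Alternative (not from the answer): attach each row's group size
# and filter on it directly.
sizes = d.groupby("a")["a"].transform("size")
by_groupby = d[sizes > 1]

print(by_counts)
```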

answered Oct 19 '22 21:10

ehaymore
