A pandas DataFrame has two columns. I need to remove every row whose value in the first column has no duplicates.
Example data:
1 A
1 B
2 A
3 D
2 C
4 E
4 E
Expected output:
1 A
1 B
2 A
2 C
4 E
4 E
In other words, I need to remove all rows whose first-column value occurs only once. What would be the fastest way to achieve this in Python (~50k rows)?
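One way is to use the duplicated() method:
df.duplicated('c1')
By default this flags every occurrence except the first; keep='last' flags every occurrence except the last (older pandas versions spelled this take_last=True, which has since been removed). Combining the two masks with | selects every row whose c1 value occurs more than once:
In [600]: df[df.duplicated('c1') | df.duplicated('c1', keep='last')]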
Out[600]:
c1 c2
0 1 A
1 1 B
2 2 A
4 2 C
5 4 E
6 4 E
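For reference, here is a minimal self-contained sketch of the same idea on the question's example data. The column names c1 and c2 follow the output above; the keep=False shortcut is an addition that is not part of the original answer, and it assumes a pandas version that supports the keep parameter (0.17 or later).

```python
import pandas as pd

# Example data from the question; column names c1/c2 as in the answer above.
df = pd.DataFrame({
    "c1": [1, 1, 2, 3, 2, 4, 4],
    "c2": ["A", "B", "A", "D", "C", "E", "E"],
})

# keep=False flags *every* row whose c1 value occurs more than once,
# which is the same mask as OR-ing the keep='first' and keep='last' masks.
mask = df.duplicated("c1", keep=False)
result = df[mask]

print(result)
```

The single keep=False call avoids building two intermediate masks, though for about 50k rows either form should be fast enough in practice.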
Here's one way: Assume the dataframe is 'd' and the columns are named 'a' and 'b'. First, get the number of times each unique value in 'a' appears:
e = d['a'].value_counts()
Then select the values whose count is greater than 1, and return the rows whose first column is one of those values:
d[d['a'].isin(e[e>1].index)]
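Put together as a runnable sketch on the question's example data, the two steps look roughly like this. The names d, a, b and e come from the answer above; the groupby/transform variant at the end is an alternative added here for comparison and is not part of the original answer.

```python
import pandas as pd

# Example data from the question; names d, a, b as assumed in the answer above.
d = pd.DataFrame({
    "a": [1, 1, 2, 3, 2, 4, 4],
    "b": ["A", "B", "A", "D", "C", "E", "E"],
})

# Step 1: count how often each distinct value of 'a' appears.
e = d["a"].value_counts()

# Step 2: keep rows whose 'a' value appears more than once.
repeated = e[e > 1].index
kept = d[d["a"].isin(repeated)]

# Alternative (not from the original answer): attach each row's group size
# directly and filter on it.
sizes = d.groupby("a")["a"].transform("size")
kept_alt = d[sizes > 1]

print(kept)
```

Both variants preserve the original row order and index, so the result matches the expected output in the question.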