Multiple sets of duplicate records from a pandas dataframe

How can I get all the sets of duplicated records (based on one column) from a DataFrame?

I have a DataFrame like this:

flight_id | from_location | to_location | schedule
1         | Vancouver     | Toronto     | 3-Jan
2         | Amsterdam     | Tokyo       | 15-Feb
4         | Fairbanks     | Glasgow     | 12-Jan
9         | Halmstad      | Athens      | 21-Jan
3         | Brisbane      | Lisbon      | 4-Feb
4         | Johannesburg  | Venice      | 23-Jan
9         | LosAngeles    | Perth       | 3-Mar

Here flight_id is the column to check for duplicates. There are two sets of duplicates: flight_id 4 and flight_id 9.

For this example, the output should be [(2, 5), (3, 6)], i.e. a list of tuples of the duplicated rows' index values (using the default 0-based row index).
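To reproduce the example, the table can be rebuilt as a DataFrame. This is a minimal sketch assuming the default RangeIndex (0 to 6), which is what the expected index values imply. It also shows the difference between duplicated() with its default keep='first', which flags only the later repeats, and keep=False, which flags every row in a duplicated set:

```python
import pandas as pd

# Sample data from the question; the default RangeIndex 0..6 is assumed.
df = pd.DataFrame({
    "flight_id": [1, 2, 4, 9, 3, 4, 9],
    "from_location": ["Vancouver", "Amsterdam", "Fairbanks", "Halmstad",
                      "Brisbane", "Johannesburg", "LosAngeles"],
    "to_location": ["Toronto", "Tokyo", "Glasgow", "Athens",
                    "Lisbon", "Venice", "Perth"],
    "schedule": ["3-Jan", "15-Feb", "12-Jan", "21-Jan",
                 "4-Feb", "23-Jan", "3-Mar"],
})

# keep='first' (the default) flags only the repeats after the first occurrence.
first_only = df.loc[df["flight_id"].duplicated()].index.tolist()
print(first_only)  # [5, 6]

# keep=False flags every row whose flight_id occurs more than once.
all_dups = df.loc[df["flight_id"].duplicated(keep=False)].index.tolist()
print(all_dups)  # [2, 3, 5, 6]
```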


asked Mar 23 '18 19:03

Kingz

1 Answer

Is this what you need? Combine duplicated(keep=False) with groupby:

df.loc[df['flight_id'].duplicated(keep=False)].reset_index().groupby('flight_id')['index'].apply(tuple)
Out[510]: 
flight_id
4    (2, 5)
9    (3, 6)
Name: index, dtype: object
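The same chain, split into named steps so each intermediate result can be inspected. A sketch on the sample data, keeping only flight_id since the other columns do not affect the result. Note that reset_index() names the new column 'index' only because the row index itself has no name; with a named index, the column takes that name instead.

```python
import pandas as pd

# Only flight_id matters here; default RangeIndex 0..6 as in the question.
df = pd.DataFrame({"flight_id": [1, 2, 4, 9, 3, 4, 9]})

# 1. Boolean mask: True for every row whose flight_id appears more than once.
mask = df["flight_id"].duplicated(keep=False)

# 2. Keep only those rows (index labels 2, 3, 5, 6).
dups = df.loc[mask]

# 3. Move the row labels into an ordinary column named 'index'.
dups = dups.reset_index()

# 4. Group by flight_id and collect each group's original labels into a tuple.
result = dups.groupby("flight_id")["index"].apply(tuple)
print(result.tolist())
```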

Add tolist() at the end to get a plain list:

df.loc[df['flight_id'].duplicated(keep=False)].reset_index().groupby('flight_id')['index'].apply(tuple).tolist()
Out[511]: [(2, 5), (3, 6)]

Another solution, just for fun, using value_counts:

s = df['flight_id'].value_counts()
list(map(lambda x: tuple(df[df['flight_id'] == x].index.tolist()), s[s.gt(1)].index))
Out[519]: [(2, 5), (3, 6)]
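One more variant, not part of the original answer: GroupBy.groups maps each flight_id to an Index of the row labels in that group, so keeping the groups with more than one label gives the same tuples without building a duplicated() mask first. A sketch on the same sample data:

```python
import pandas as pd

# Only flight_id matters here; default RangeIndex 0..6 as in the question.
df = pd.DataFrame({"flight_id": [1, 2, 4, 9, 3, 4, 9]})

# .groups: dict-like mapping of each flight_id to an Index of its row labels.
groups = df.groupby("flight_id").groups

# Keep only flight_ids that occur more than once; Index.tolist() gives plain ints.
result = [tuple(labels.tolist()) for labels in groups.values() if len(labels) > 1]
print(result)  # [(2, 5), (3, 6)]
```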


answered Oct 22 '22 13:10

BENY


Tags: python, pandas, dataframe, group-by, pandas-groupby