I am new to programming. I have a pandas data frame with two string columns.
The data frame looks like this:
Col-1        Col-2
Update       have a account
Account      account summary
AccountDTH   Cancel
Balance      Balance Summary
Credit Card  Update credit card
I need to check the similarity of each Col-2 element against every element of Col-1. That means I have to compare "have a account" with all the elements of Col-1, then find the top 3 most similar ones. Suppose the similarity scores are: Account(85), AccountDTH(80), Balance(60), Update(45), Credit Card(35).
The expected output is:
Col-2           Output
have a account  Account(85),AccountDTH(80),Balance(60)
You can use a Python library like fuzzywuzzy here, which has support for this type of task:
from fuzzywuzzy import process

# For each Col-2 string, keep the 3 best-scoring matches among the Col-1 values
df.assign(Output=[process.extract(i, df['Col-1'], limit=3) for i in df['Col-2']])
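Using process.extract, we get string similarity scores for each Col-2 value and keep the top 3 matches (or fewer, if fewer than 3 exist). Each tuple holds the matched Col-1 value, its score, and its index in Col-1.
The output of the above code: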
   Col-1        Col-2               Output
0  Update       have a account      [(Account, 90, 1), (AccountDTH, 64, 2), (Update, 40, 0)]
1  Account      account summary     [(Account, 90, 1), (AccountDTH, 63, 2), (Credit Card, 38, 4)]
2  AccountDTH   Cancel              [(Balance, 62, 3), (Credit Card, 43, 4), (Update, 33, 0)]
3  Balance      Balance Summary     [(Balance, 90, 3), (Credit Card, 38, 4), (Update, 30, 0)]
4  Credit Card  Update credit card  [(Update, 90, 0), (Credit Card, 90, 4), (AccountDTH, 27, 2)]
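If you want the Output column as a plain string like the one in the question, e.g. Account(85),AccountDTH(80),Balance(60), a small helper along these lines might do it. This is only a sketch: format_matches is a name I made up, and it assumes each tuple starts with (match, score), as in the output above.

```python
def format_matches(matches):
    """Turn [(name, score, ...), ...] into 'name(score),name(score),...'."""
    # Only the first two fields are used, so a trailing index field is ignored.
    return ",".join(f"{m[0]}({m[1]})" for m in matches)


# First row of the output above, copied by hand for illustration
row = [("Account", 90, 1), ("AccountDTH", 64, 2), ("Update", 40, 0)]
formatted = format_matches(row)
print(formatted)  # Account(90),AccountDTH(64),Update(40)
```

On the data frame itself, something like df['Output'].apply(format_matches) should apply it to every row once the Output column has been assigned.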
To speed this comparison up (by default it falls back to Python's difflib SequenceMatcher), I would recommend installing python-Levenshtein.
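To see roughly what happens under the hood, here is a stdlib-only sketch of the same top-3 idea using difflib.SequenceMatcher directly. The function names similarity and top_matches are hypothetical, and the scores will not match fuzzywuzzy's, since its default scorer combines several ratios rather than using a single plain one.

```python
from difflib import SequenceMatcher

col1 = ["Update", "Account", "AccountDTH", "Balance", "Credit Card"]


def similarity(a, b):
    """Return a 0-100 similarity score, ignoring case."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def top_matches(query, choices, limit=3):
    """Score query against every choice and keep the best `limit` of them."""
    scored = [(choice, similarity(query, choice)) for choice in choices]
    # Highest score first; the sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


result = top_matches("have a account", col1)
print(result)
```

This is fine for a handful of rows, but it compares every Col-2 value with every Col-1 value in pure Python, so for large frames the library route above is likely the better choice.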