I want to compare a one-column data frame against a multi-column data frame and return the header of the column with the highest match percentage.
I could not find a matching function for this in pandas. First data frame, first column:
cars
----
swift
maruti
wagonor
hyundai
jeep
First data frame, second column:
bikes
-----
RE
Ninja
Bajaj
pulsar
One-column data frame:
words
---------
swift
RE
maruti
waganor
hyundai
jeep
bajaj
Desired output :
100% match header - cars
Try the isin method of the pandas DataFrame. Assuming df is your first data frame and words is a list:
In[1]: (df.isin(words).sum()/df.shape[0])*100
Out[1]:
cars 100.0
bikes 20.0
dtype: float64
You may need to lowercase the strings in both df and the words list to avoid case mismatches (for example, Bajaj vs. bajaj).
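As a rough sketch of the lowercasing step combined with picking the winning header, something like the following could work. The sample data is retyped from the question, the names df_lower, words_lower, match_pct and best_header are made up for illustration, and it assumes the shorter bikes column is padded with missing values. Unlike the snippet above, it divides by the number of non-missing entries in each column rather than by df.shape[0].

```python
import pandas as pd

# Sample data retyped from the question; bikes is padded with None (assumption)
df = pd.DataFrame({
    "cars": ["swift", "maruti", "wagonor", "hyundai", "jeep"],
    "bikes": ["RE", "Ninja", "Bajaj", "pulsar", None],
})
words = ["swift", "RE", "maruti", "waganor", "hyundai", "jeep", "bajaj"]

# Lowercase both sides so that "Bajaj" and "bajaj" compare equal
df_lower = df.apply(lambda col: col.str.lower())
words_lower = [w.lower() for w in words]

# Share of non-missing entries in each column that appear in the word list
match_pct = df_lower.isin(words_lower).sum() / df_lower.notna().sum() * 100

# Header of the column with the highest match percentage
best_header = match_pct.idxmax()

print(match_pct)
print(best_header)
```

With the data exactly as typed in the question, wagonor and waganor differ by one letter, so exact matching will not count that entry as a match.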
You can first get the columns into lists:
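dfCarsList = df['cars'].dropna().tolist()
dfWordsList = words_df['words'].tolist()  # words_df: the one-column data frame (name assumed)
dfBikesList = df['bikes'].dropna().tolist()  # the column header is lowercase 'bikes'
And then iterate over the lists for comparison: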
numberCars = sum(any(m in L for m in dfCarsList) for L in dfWordsList)
numberBikes = sum(any(m in L for m in dfBikesList) for L in dfWordsList)
You can then use the column with the higher count for your output.
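A self-contained sketch of this list-based idea might look like the following. It is not the exact code above: it tests whole-word, case-insensitive equality through a set instead of substring containment, and the names words_df, count_matches, word_set and counts are hypothetical. The sample data is retyped from the question, with the bikes column assumed to be padded with a missing value.

```python
import pandas as pd

# Sample data retyped from the question (None padding is an assumption)
df = pd.DataFrame({
    "cars": ["swift", "maruti", "wagonor", "hyundai", "jeep"],
    "bikes": ["RE", "Ninja", "Bajaj", "pulsar", None],
})
# Hypothetical name for the one-column data frame
words_df = pd.DataFrame(
    {"words": ["swift", "RE", "maruti", "waganor", "hyundai", "jeep", "bajaj"]}
)


def count_matches(column, word_set):
    """Count non-missing entries of column found in word_set, ignoring case."""
    return sum(str(value).lower() in word_set for value in column.dropna())


# Lowercased set of words for fast exact lookups
word_set = {w.lower() for w in words_df["words"]}

# Number of matches per column header
counts = {name: count_matches(df[name], word_set) for name in df.columns}

# Header with the most matches
best_header = max(counts, key=counts.get)

print(counts)
print(best_header)
```

Counting raw matches favours longer columns, so dividing each count by the column's number of non-missing entries may be preferable when the columns differ in length.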