 

Create a new column in a DataFrame using fuzzywuzzy

I have a pandas DataFrame, and I am using the fuzzywuzzy package in Python to match the values in its first column against those in its second column.

I have defined a function that should produce an output with the first column, the second column and the partial ratio score, but it is not working.

Could you please help?

import csv
import sys
import os
import numpy as np
import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

def match(driver):
    driver["score"]=driver.apply(lambda row: fuzz.partial_ratio(driver[driver.columns[0]], driver[driver.columns[1]]), axis=1)
    print(driver)
    return(driver)

Regards

-Abacus

Abacus asked Mar 21 '16 18:03


1 Answer

Inside the function you pass to apply with axis=1, you receive a Series representing the current row. Your code effectively ignores that Series and instead calls partial_ratio with the two entire DataFrame columns (driver[col]) on every call.
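To see what the function actually receives, here is a small pandas-only check (a sketch; the toy frame with columns 'one' and 'two' and the variable names are just for illustration). Each call gets one row as a Series indexed by the column labels, so row['one'] is a single string rather than a whole column.

```python
import pandas as pd

df = pd.DataFrame({'one': ['fuzz', 'wuzz'], 'two': ['fizz', 'woo']})

# With axis=1, apply calls the function once per row and passes that row as a Series.
kinds = df.apply(lambda row: type(row).__name__, axis=1)
# The row Series is indexed by the column labels.
labels = df.apply(lambda row: ",".join(row.index), axis=1)
# So row['one'] is one scalar value from that row, not the whole column.
firsts = df.apply(lambda row: row['one'], axis=1)

print(kinds.tolist())   # ['Series', 'Series']
print(labels.tolist())  # ['one,two', 'one,two']
print(firsts.tolist())  # ['fuzz', 'wuzz']
```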

A minor change to your code, indexing into the row Series instead of the DataFrame, should give you what you want.

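import pandas as pd
from fuzzywuzzy import fuzz

d = pd.DataFrame({'one': ['fuzz', 'wuzz'], 'two': ['fizz', 'woo']})

# axis=1: s is the current row, so s['one'] and s['two'] are single strings
d.apply(lambda s: fuzz.partial_ratio(s['one'], s['two']), axis=1)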

0    75
1    33
dtype: int64

(Interestingly, the partial_ratio function will accept a Series as input, but only because it converts it internally into a string, so it ends up comparing the printed form of the whole column rather than individual values. :)
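If you want the score stored as a new column, as the question's match function intends, a minimal sketch could look like this. It assumes the two columns to compare are the first and second columns of driver and picks them by position with row.iloc, mirroring driver.columns[0] and driver.columns[1]. The scorer parameter is an addition of this sketch (not part of fuzzywuzzy) so the function can be tried without fuzzywuzzy installed; pass fuzz.partial_ratio to get the behaviour asked about.

```python
import pandas as pd


def match(driver, scorer):
    """Return a copy of driver with a 'score' column comparing its first two columns.

    scorer is any function taking two values and returning a number,
    e.g. fuzzywuzzy's fuzz.partial_ratio (an assumption of this sketch).
    """
    out = driver.copy()  # leave the caller's DataFrame untouched
    # axis=1: each call receives one row as a Series; .iloc[0] and .iloc[1]
    # are that row's values in the first and second columns.
    out["score"] = out.apply(lambda row: scorer(row.iloc[0], row.iloc[1]), axis=1)
    return out
```

With fuzzywuzzy installed, calling match(driver, fuzz.partial_ratio) after from fuzzywuzzy import fuzz should give one score per row. Working on a copy is a design choice of this sketch; drop the .copy() if you prefer to add the column in place, as the original function did.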

meloncholy answered Nov 14 '22 18:11