
Dataframe apply doesn't accept axis argument

I have two dataframes: data and rules.

>>>data                            >>>rules
   vendor                             rule
0  googel                           0 google
1  google                           1 dell
2  googly                           2 macbook

I am trying to add two new columns into the data dataframe after computing the Levenshtein similarity between each vendor and rule. So my dataframe should ideally contain columns looking like this:

>>>data
  vendor   rule    similarity
0 googel   google    0.8

So far I am trying to perform an apply function that will return me this structure, but the dataframe apply is not accepting the axis argument.

>>> for index,r in rules.iterrows():
...     data[['rule','similarity']]=data['vendor'].apply(lambda row:[r[0],ratio(row[0],r[0])],axis=1)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/mnnr/test/env/test-1.0/runtime/lib/python3.4/site-packages/pandas/core/series.py", line 2220, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/src/inference.pyx", line 1088, in pandas.lib.map_infer (pandas/lib.c:62658)
  File "/home/mnnr/test/env/test-1.0/runtime/lib/python3.4/site-packages/pandas/core/series.py", line 2209, in <lambda>
    f = lambda x: func(x, *args, **kwds)
TypeError: <lambda>() got an unexpected keyword argument 'axis'

Could someone please help me figure out what I am doing wrong? Every change I make just creates new errors. Thank you.

sleepophile asked Aug 25 '17 09:08



1 Answer

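You're calling the Series version of apply (data['vendor'] is a Series), which has no axis parameter. Series.apply passes any extra keyword arguments straight through to your function, so the lambda receives axis=1, hence the error.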
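To see the forwarding in isolation, here is a minimal sketch on a toy Series (the vendor values are copied from the question; the variable names are only illustrative). It reproduces the TypeError and then shows that a function which does accept an axis keyword simply receives the value:

```python
import pandas as pd

vendors = pd.Series(["googel", "google", "googly"], name="vendor")

# Series.apply has no axis parameter, so axis=1 is forwarded to the function,
# which does not accept it.
try:
    vendors.apply(lambda v: v.upper(), axis=1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# A function that accepts the extra keyword receives it unchanged.
forwarded = vendors.apply(lambda v, axis: axis, axis=1)
print(forwarded.tolist())
```

This matches the `f = lambda x: func(x, *args, **kwds)` line in the traceback: the keyword ends up in the call to the lambda rather than being consumed by apply.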

If you did:

data[['rule','similarity']]=data[['vendor']].apply(lambda row:[r[0],ratio(row[0],r[0])],axis=1)

then data[['vendor']] (note the double brackets) is a single-column DataFrame, and DataFrame.apply does accept axis=1, so this would work

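Or just remove the axis arg. Each element passed to the lambda is then the vendor string itself, so use it directly rather than indexing it with [0] (which would take only its first character):

data[['rule','similarity']]=data['vendor'].apply(lambda v:[r[0],ratio(v,r[0])])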
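A hedged sketch of the DataFrame route for a single rule follows. Two things here are assumptions and not from the question: the ratio helper is a stand-in built on the standard library's difflib (the question presumably uses a Levenshtein package, so the scores may differ), and result_type="expand" is used to turn the returned two-item list into two columns, which needs a reasonably recent pandas.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a, b):
    # Stand-in for the question's ratio(); assumption: difflib, not a Levenshtein package.
    return SequenceMatcher(None, a, b).ratio()


data = pd.DataFrame({"vendor": ["googel", "google", "googly"]})
rule = "google"  # a single rule, fixed here for illustration

# With axis=1 each row arrives as a Series; result_type="expand" spreads the
# returned list over two columns.
pairs = data[["vendor"]].apply(
    lambda row: [rule, ratio(row["vendor"], rule)],
    axis=1,
    result_type="expand",
)
pairs.columns = ["rule", "similarity"]

result = data.join(pairs)
print(result)
```

Accessing the value by label (row["vendor"]) avoids relying on positional lookups such as row[0], which newer pandas versions deprecate for label-indexed Series.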

update

Looking at what you're doing, you need to calculate the Levenshtein ratio for each rule against every vendor.

You can do this by:

data['vendor'].apply(lambda row: rules['rule'].apply(lambda x: ratio(x, row)))

I think this should calculate the ratio for each vendor against every rule, giving one row per vendor and one column per rule.
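Putting it together, a sketch of how the nested apply could be turned into the rule and similarity columns from the question. It assumes that the best-scoring rule per vendor is what is wanted, and it again uses a difflib-based ratio as a stand-in for the Levenshtein one, so the exact scores are not those of the question.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a, b):
    # Stand-in for the question's ratio(); assumption: difflib, not a Levenshtein package.
    return SequenceMatcher(None, a, b).ratio()


data = pd.DataFrame({"vendor": ["googel", "google", "googly"]})
rules = pd.DataFrame({"rule": ["google", "dell", "macbook"]})

# One row per vendor, one column per rule (columns carry the index of rules).
scores = data["vendor"].apply(
    lambda vendor: rules["rule"].apply(lambda rule: ratio(rule, vendor))
)

# Assumption: keep only the best-scoring rule for each vendor.
best = scores.idxmax(axis=1)
data["rule"] = rules.loc[best, "rule"].to_numpy()
data["similarity"] = scores.max(axis=1)
print(data)
```

If every vendor/rule pair is needed instead of just the best one, the scores frame already holds them and can be reshaped as required.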

EdChum answered Sep 22 '22 04:09
