I am trying to compare two lists of strings for similarity and inspect the results in a pandas DataFrame, so I use one list as the index and the other as the columns. I then want to compute the "Levenshtein similarity" between them (a function that compares the similarity between two words).
I am trying to do that using applymap on every cell, comparing the cell's index label to its column label. How could I do that? Or are there simpler alternatives?
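import pandas as pd
import Levenshtein

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)

def lev(x):
    # does not work: applymap passes only the cell value, so x has no index or column
    return Levenshtein.distance(x.index, x.column)
matrix.applymap(lev)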
So far I have resorted to the following, but I find it clumsy and slow:
matrix = pd.DataFrame(data=[action for _ in things], index=things, columns=action)
for i, values in matrix.iterrows():
    for col, value in values.items():
        # .ix has been removed from pandas; .at sets a single cell by label
        matrix.at[i, col] = Levenshtein.distance(i, value)
apply() is used to apply a function along an axis of the DataFrame or on values of Series. applymap() is used to apply a function to a DataFrame elementwise. map() is used to substitute each value in a Series with another value.
applymap() calls a Python function that accepts a single scalar and returns a single scalar on every element of a DataFrame, so the function never sees the element's index or column label.
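To make the distinction concrete, here is a small sketch. The frame `df` and its labels are made up for illustration and are not from the question. Since `applymap` was renamed to `DataFrame.map` in pandas 2.1, the snippet picks whichever of the two is available.

```python
import pandas as pd

# Hypothetical toy frame, not from the question
df = pd.DataFrame([[1, 2], [3, 4]], index=["r1", "r2"], columns=["c1", "c2"])

# apply: the function receives a whole column as a Series, so its labels are available
col_labels = df.apply(lambda col: col.name)

# applymap (DataFrame.map in pandas >= 2.1): the function receives one scalar at a time
elementwise = getattr(df, "map", None) or df.applymap
times_ten = elementwise(lambda v: v * 10)

# Series.map: substitute each value in a Series using a mapping
words = df["c1"].map({1: "one", 3: "three"})
```

Only the function passed to `apply` can reach `.name` and `.index`, which is why the answers here build on `apply` rather than `applymap`.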
You can do that with a "nested apply" as follows:
import pandas as pd
from Levenshtein import distance as LD  # assumption: LD stands for Levenshtein.distance

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)
matrix.apply(lambda x: pd.DataFrame(x).apply(lambda y: LD(x.name, y.name), axis=1))
Output:
          walking  caring  biking  eating
car             6       3       6       5
bike            6       5       3       5
sidewalk        7       8       7       8
eatery          6       5       6       3
The call pd.DataFrame(x) is needed here because x is a Series object, and Series.apply behaves like applymap: it passes only the values and does not carry index or columns information.
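As a possibly simpler alternative to the nested apply, the whole matrix can be built in one pass with a nested list comprehension. This is only a sketch: the pure-Python `levenshtein` helper below is a hypothetical stand-in for `Levenshtein.distance`, included so the snippet runs without the third-party package.

```python
import pandas as pd

def levenshtein(a, b):
    """Edit distance by row-wise dynamic programming (stand-in for Levenshtein.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']

# One inner list per row label, one distance per column label
matrix = pd.DataFrame([[levenshtein(t, a) for a in action] for t in things],
                      index=things, columns=action)
```

With the real package installed, swapping `levenshtein` for `Levenshtein.distance` should give the same table.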
I think you can use apply on the DataFrame, and use .name to access each column's label:
def lev(x):
    # replace with your function
    return x.index + x.name

a = matrix.apply(lev)
print(a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
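For the actual distance, the placeholder function could look like the sketch below: each column arrives as a Series, `x.name` is the column label and `x.index` holds the row labels. The `levenshtein` helper is a hypothetical pure-Python stand-in for `Levenshtein.distance`, used here only to keep the example self-contained.

```python
import pandas as pd

def levenshtein(a, b):
    """Edit distance by row-wise dynamic programming (stand-in for Levenshtein.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)

def lev(x):
    # x is one column: compare every row label in x.index with the column label x.name
    return pd.Series([levenshtein(row, x.name) for row in x.index], index=x.index)

a = matrix.apply(lev)
```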
EDIT:
If you need some arithmetic operation, use broadcasting:
# index values vary down the rows, column values across the columns
a = pd.DataFrame(matrix.index.values[:, None] + matrix.columns.values,
                 index=matrix.index,
                 columns=matrix.columns)
print(a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
Or:
# same as above, with np.newaxis spelled out instead of None
a = pd.DataFrame(matrix.index.values[:, np.newaxis] + matrix.columns.values,
                 index=matrix.index,
                 columns=matrix.columns)
print(a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
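Broadcasting only works directly for operations NumPy can apply elementwise, such as `+`. For an arbitrary Python function like an edit distance, one option is to wrap it with `np.vectorize` and broadcast a column vector of row labels against a row vector of column labels. This is a sketch, not a speed-up: `np.vectorize` is essentially a convenience loop. The `levenshtein` helper is again a hypothetical stand-in for `Levenshtein.distance`.

```python
import numpy as np
import pandas as pd

def levenshtein(a, b):
    """Edit distance by row-wise dynamic programming (stand-in for Levenshtein.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']

# object dtype keeps the strings as plain Python str objects
rows = np.array(things, dtype=object)[:, None]   # shape (4, 1)
cols = np.array(action, dtype=object)[None, :]   # shape (1, 4)

# Broadcasting pairs every row label with every column label
dist = np.vectorize(levenshtein, otypes=[int])(rows, cols)
a = pd.DataFrame(dist, index=things, columns=action)
```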