I am trying to compare two lists of strings for similarity and get the results in a pandas DataFrame for inspection, so I use one list as the index and the other as the columns. I then want to compute the "Levenshtein similarity" for each pair (a function that measures how similar two words are).
I am trying to do that with applymap on every cell, comparing the cell's index label to its column label. How could I do that? Or are there simpler alternatives?
things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index = things, columns = action)
def lev(x):
    x = Levenshtein.distance(x.index, x.column)  
matrix.applymap(lev)
So far I have resorted to the following, but I find it clumsy and slow:
matrix = pd.DataFrame(data = [action for i in things], index = things, columns = action)
for i, values in matrix.iterrows():
    for j, value in enumerate(values):
        # .ix was removed in pandas 1.0; select by row label and column label instead
        matrix.loc[i, matrix.columns[j]] = Levenshtein.distance(i, value)
apply() applies a function along an axis of a DataFrame, or to the values of a Series. applymap() applies a function to a DataFrame elementwise. map() substitutes each value in a Series with another value.
The function passed to applymap() must accept a single scalar and return a single scalar; it is called once for every element of the DataFrame and sees only the cell's value, not its index or column label. Since pandas 2.1, applymap() is deprecated in favor of DataFrame.map().
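As a rough illustration of the difference between the three methods, here is a minimal sketch on a toy DataFrame (the frame `df` and its values are made up for the example). It picks `DataFrame.map` when available and falls back to `applymap` on older pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# Element-wise: the function receives each scalar, not its row/column labels.
# DataFrame.map replaced applymap in pandas 2.1; fall back on older versions.
elementwise = getattr(df, "map", None) or df.applymap
squared = elementwise(lambda v: v * v)

# Axis-wise: the function receives a whole column as a Series,
# so labels are reachable through .name and .index.
col_names = df.apply(lambda col: col.name)

# Series.map substitutes each value with another one.
mapped = df["a"].map({1: "one", 2: "two"})
```

Only the axis-wise `apply` gives the function access to labels, which is why the answers below build on it rather than on `applymap`.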
To get the number of rows and columns, use len(df.axes[0]) and len(df.axes[1]) respectively; df.shape returns both as a tuple.
You can do that by "nested apply" as follows:
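import pandas as pd
from Levenshtein import distance as LD  # assumption: LD is Levenshtein.distance from the question
things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)
# outer apply: x is a column (x.name = column label); inner apply: y is a row (y.name = index label)
matrix.apply(lambda x: pd.DataFrame(x).apply(lambda y: LD(x.name, y.name), axis=1))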
Output:
          walking  caring  biking  eating
car             6       3       6       5
bike            6       5       3       5
sidewalk        7       8       7       8
eatery          6       5       6       3
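The call to pd.DataFrame(x) is needed because x is a Series, and Series.apply behaves like applymap: the function only receives each value, with no index or column information. Wrapping x in a one-column DataFrame and applying along axis=1 passes each row as a Series whose .name is the index label.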
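To see that difference in isolation, here is a small sketch with a made-up column (`col` and its values are hypothetical, chosen only for the demonstration):

```python
import pandas as pd

col = pd.Series([10, 20], index=["car", "bike"], name="walking")

# Series.apply hands the function bare values, so no labels are available.
got_series = col.apply(lambda v: isinstance(v, pd.Series))

# Wrapping the column in a one-column DataFrame and applying along axis=1
# hands the function a row Series whose .name is the row label.
row_labels = pd.DataFrame(col).apply(lambda row: row.name, axis=1)
```

Here `got_series` is False for every entry, while `row_labels` holds the index labels themselves.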
I think you can use apply on the DataFrame. Inside the function, x is a column, so x.index gives the row labels and x.name gives the column label:
def lev(x):
    # replace this with your own function
    return x.index + x.name
a = matrix.apply(lev)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
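The same pattern should carry over to a function that compares two strings one pair at a time. Below is a sketch under that assumption; `shared_letters` is a hypothetical stand-in so the example runs without extra packages, and `Levenshtein.distance` could be dropped in its place.

```python
import pandas as pd

def shared_letters(a, b):
    # Hypothetical stand-in for Levenshtein.distance:
    # any function taking two strings fits here.
    return len(set(a) & set(b))

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)

# Each column arrives as a Series: col.name is the column label,
# col.index holds the row labels.
scores = matrix.apply(
    lambda col: pd.Series(
        [shared_letters(row, col.name) for row in col.index],
        index=col.index,
    )
)
```

Returning a Series with the same index from each column call keeps the result a DataFrame with the original row and column labels.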
EDIT:
If you need an arithmetic operation, use broadcasting. The index values go along the rows (add a new axis to them) and the column values along the columns, so each cell combines its own index and column label:
a = pd.DataFrame(matrix.index.values[:, None] + matrix.columns.values,
                 index=matrix.index,
                 columns=matrix.columns)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
Or, equivalently, with np.newaxis:
import numpy as np
a = pd.DataFrame(matrix.index.values[:, np.newaxis] + matrix.columns.values,
                 index=matrix.index,
                 columns=matrix.columns)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
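Broadcasting only helps when the operation is something NumPy can apply elementwise, such as `+`. For a pairwise function like an edit distance, one simple alternative is to build the DataFrame directly from a nested list comprehension. The sketch below uses a plain dynamic-programming `edit_distance` helper (a hypothetical name, written here as a stand-in so the example is self-contained); `Levenshtein.distance` from the question should be usable in its place.

```python
import pandas as pd

def edit_distance(a, b):
    """Plain dynamic-programming Levenshtein distance (stand-in helper)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']

# One row per entry of `things`, one column per entry of `action`.
matrix = pd.DataFrame(
    [[edit_distance(t, a) for a in action] for t in things],
    index=things,
    columns=action,
)
```

This avoids both the empty placeholder DataFrame and the cell-by-cell assignment loop from the question.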