Pandas - retrieve row and column name for each element during applymap

Tags:

python

pandas

I am trying to compare two lists of strings for similarity and inspect the results in a pandas DataFrame, so I use one list as the index and the other as the columns. I then want to compute the "Levenshtein similarity" (a function that measures how similar two words are) between each index label and each column label.

I am trying to do that with applymap on every cell, comparing the cell's index label to its column label. How could I do that? Or are there simpler alternatives?

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index = things, columns = action)

def lev(x):
    x = Levenshtein.distance(x.index, x.column)  
matrix.applymap(lev)

So far I have resorted to the following, but I find it clumsy and slow:

matrix = pd.DataFrame(data = [action for i in things], index = things, columns = action)
for i, values in matrix.iterrows():
    for j, value in enumerate(values):
        # .ix was removed in pandas 1.0; i is a row label, j a column position
        matrix.loc[i, matrix.columns[j]] = Levenshtein.distance(i, value)


asked Apr 27 '17 10:04

jim jarnac

2 Answers

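You can do that with a "nested apply", as follows:

import pandas as pd
from Levenshtein import distance as LD  # assumption: LD is the Levenshtein distance function

things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)
# x is one column (x.name is its label); y is one row of it (y.name is the row label)
matrix.apply(lambda x: pd.DataFrame(x).apply(lambda y: LD(x.name, y.name), axis=1))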

Output:

          walking  caring  biking  eating
car             6       3       6       5
bike            6       5       3       5
sidewalk        7       8       7       8
eatery          6       5       6       3

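The call pd.DataFrame(x) is needed because x is a Series, and Series.apply, like applymap, passes only the cell values to the function, without any index or columns information. Wrapping x in a one-column DataFrame and applying with axis=1 yields each row as a Series y whose name is the row label.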
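As a simpler alternative to the nested apply, the matrix can also be built directly from the two lists with a list comprehension. This is only a sketch: edit_distance below is a hypothetical plain-Python stand-in for Levenshtein.distance, written out so the example runs without the extra package.

```python
import pandas as pd


def edit_distance(a, b):
    """Plain-Python Levenshtein distance (stand-in for Levenshtein.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']

# One row per entry in `things`, one column per entry in `action`.
matrix = pd.DataFrame(
    [[edit_distance(t, a) for a in action] for t in things],
    index=things,
    columns=action,
)
print(matrix)
```

Because the labels come straight from the lists, there is no need to recover the row and column names inside a callback.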

answered Nov 07 '22 01:11

chaonan99

I think you can use apply on the DataFrame. Inside the function, x.name gives the column label and x.index gives the row labels:

def lev(x):
    # replace with your own function
    return x.index + x.name
a = matrix.apply(lev)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
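To make the placeholder concrete, here is a sketch of the same column-wise apply with a real distance plugged in. The helper edit_distance is a hypothetical plain-Python stand-in for Levenshtein.distance, not part of the original answer.

```python
import pandas as pd


def edit_distance(a, b):
    """Plain-Python Levenshtein distance (stand-in for Levenshtein.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']
matrix = pd.DataFrame(index=things, columns=action)


def lev(col):
    # col.name is the column label; col.index holds the row labels.
    return pd.Series(
        [edit_distance(row, col.name) for row in col.index],
        index=col.index,
    )


a = matrix.apply(lev)  # called once per column
print(a)
```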

EDIT:

If you need an arithmetic operation, use broadcasting. Put the new axis on the index values so that rows follow the index and columns follow the columns:

a = pd.DataFrame(matrix.index.values[:, None] + matrix.columns.values,
                 index=matrix.index,
                 columns=matrix.columns)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating

Or, equivalently, with np.newaxis:

a = pd.DataFrame(matrix.index.values[:, np.newaxis] + matrix.columns.values,
                 index=matrix.index,
                 columns=matrix.columns)
print (a)
                  walking          caring          biking          eating
car            carwalking       carcaring       carbiking       careating
bike          bikewalking      bikecaring      bikebiking      bikeeating
sidewalk  sidewalkwalking  sidewalkcaring  sidewalkbiking  sidewalkeating
eatery      eaterywalking    eaterycaring    eaterybiking    eateryeating
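Broadcasting can also drive an arbitrary two-argument function rather than just +. The sketch below assumes np.vectorize is acceptable here; it is a convenience wrapper around a Python loop, so it tidies the code more than it speeds it up. edit_distance is again a hypothetical stand-in for Levenshtein.distance.

```python
import numpy as np
import pandas as pd


def edit_distance(a, b):
    """Plain-Python Levenshtein distance (stand-in for Levenshtein.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


things = ['car', 'bike', 'sidewalk', 'eatery']
action = ['walking', 'caring', 'biking', 'eating']

rows = np.array(things, dtype=object)[:, None]  # shape (4, 1): varies down the rows
cols = np.array(action, dtype=object)           # shape (4,): varies across the columns

# Broadcasting pairs every row label with every column label.
distances = np.vectorize(edit_distance)(rows, cols)

a = pd.DataFrame(distances, index=things, columns=action)
print(a)
```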

answered Nov 07 '22 01:11

jezrael
