Run a function for each element in two lists in Pandas Dataframe Columns

Attempt 2: Changing the get_top_matches function to loop with for val in value_list.split(): produced the output below. It grabs the first word and compares it to the first word in each sublist in col2, repeated 5 times (not sure why 5 times):

[
  [0    [(myalyk, 0.73)]
   1    [(myalyk, 0.73)]
   2    [(myalyk, 0.73)]
   3    [(myalyk, 0.73)]
   4    [(myalyk, 0.73)]
   dtype: object],
  [0    [(myliu, 0.79)]
   1    [(myliu, 0.79)]
   2    [(myliu, 0.79)]
   3    [(myliu, 0.79)]
   4    [(myliu, 0.79)]
   dtype: object],
  [0    [(myllc, 0.97)]
   1    [(myllc, 0.97)]
   2    [(myllc, 0.97)]
   3    [(myllc, 0.97)]
   4    [(myllc, 0.97)]
   dtype: object],
  [0    [(myloc, 0.88)]
   1    [(myloc, 0.88)]
   2    [(myloc, 0.88)]
   3    [(myloc, 0.88)]
   4    [(myloc, 0.88)]
   dtype: object]
]

Just need the function to run on each word in the sublists.

Attempt 3: Removing the attempt 2 code from the get_top_matches function and changing the attempt 1 list comprehension to the code below grabbed only the first word in the first 3 sublists in col2. I need to compare each word in the col1 list against each word in the col2 sublists:

[[[df.agg(lambda x: get_top_matches(u, v), axis=1) for u in x]
    for v in zip(*y)]
        for x, y in zip(df['col1'], df['col2'])
]

Results of attempt 3:

[[0    [(myllc, 0.97), (myloc, 0.88), (myliu, 0.79), ...
  1    [(myllc, 0.97), (myloc, 0.88), (myliu, 0.79), ...
  2    [(myllc, 0.97), (myloc, 0.88), (myliu, 0.79), ...
  3    [(myllc, 0.97), (myloc, 0.88), (myliu, 0.79), ...
  4    [(myllc, 0.97), (myloc, 0.88), (myliu, 0.79), ...
  dtype: object]]
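
A likely reason every result repeats 5 times (an assumption, since the full DataFrame is not shown): df.agg(..., axis=1) calls the lambda once per DataFrame row, and the lambda ignores its row argument, so the same value comes back for each of the 5 rows. A minimal sketch, where the frame and fake_match are hypothetical stand-ins for the asker's data and get_top_matches:

```python
import pandas as pd

# Hypothetical 5-row frame standing in for the asker's data.
df = pd.DataFrame({"col1": [["a"]] * 5, "col2": [[["b"]]] * 5})

def fake_match(u, v):
    # Stand-in for get_top_matches; returns a fixed-shape result.
    return [(v, 0.5)]

# The lambda never uses `row`, so every row produces the same value.
out = df.agg(lambda row: fake_match("a", "b"), axis=1)
print(len(out))  # 5, one entry per DataFrame row
```

If that is what is happening, calling the function directly inside the comprehension, without df.agg, would avoid the repetition.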

Expectation (in this example, row 1 has 4 sublists and row 2 has 2 sublists): the function runs each word in column 1 against each word in each sublist in column 2 and puts the results in a sublist in a new column.

[[['myalyk',.97], ['oleksandr',.54], ['nychyporovych',.3], ['pp',0]], [['myliu',.88], ['srl',.43]], [['myllc',1.0]], [['myloc',1.0], ['manag',.45], ['IT',.1], ['ag',0]]], 
[[['ltd',.34], ['yuriapharm',.76]], [['yuriypra',.65], ['law',.54], ['offic',.45], ['pc',.34]]],
...
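
The nested shape above could be sketched roughly like this for a single row. best_score and score_row are hypothetical names, difflib's ratio is only a stand-in for whatever get_top_matches computes, and taking the best score across the col1 words is an assumption about the intended comparison:

```python
from difflib import SequenceMatcher

def best_score(word, candidates):
    # Hypothetical scorer: highest similarity ratio between `word` and any candidate.
    return max((SequenceMatcher(None, word, c).ratio() for c in candidates), default=0.0)

def score_row(col1_words, col2_sublists):
    # Keep col2's nesting: one output sublist per input sublist.
    return [[[w, round(best_score(w, col1_words), 2)] for w in sub]
            for sub in col2_sublists]

# Made-up sample row: a flat col1 list and a col2 list of sublists.
row_col1 = ["myllc", "ltd"]
row_col2 = [["myllc"], ["myloc", "ag"]]
nested = score_row(row_col1, row_col2)
print(nested)
```

The inner comprehension walks the words of one sublist and the outer one walks the sublists, so the output keeps one sublist per col2 sublist instead of flattening them.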

asked Sep 10 '20 19:09

max

1 Answer

This works:

import pandas as pd

# Generate DataFrame (data and get_top_matches come from the question)
df = pd.DataFrame(data, columns=['col1', 'col2'])

# Clean data (strip trailing commas from some words)
df['col1'] = df['col1'].map(lambda lst: [x.rstrip(',') for x in lst])

# 1. List comprehension technique
# zip provides pairs of col1, col2 rows
result = [[get_top_matches(u, [v]) for u in x for w in y for v in w] for x, y in zip(df['col1'], df['col2'])]

# 2. DataFrame apply technique
def func(x, y):
    return [get_top_matches(u, [v]) for u in x for w in y for v in w]

df['func_scores'] = df.apply(lambda row: func(row['col1'], row['col2']), axis=1)

# Verify the two methods give equal results
print(df['func_scores'].equals(pd.Series(result)))  # True

print(df['func_scores'].to_string(index=False))
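
For anyone who wants to try this without the original data, here is a self-contained sketch. The sample rows are made up, and get_top_matches is a hypothetical stand-in built on difflib, not the function from the question:

```python
import pandas as pd
from difflib import SequenceMatcher

def get_top_matches(word, choices):
    # Hypothetical stand-in; the real function is defined in the question.
    return [(c, round(SequenceMatcher(None, word, c).ratio(), 2)) for c in choices]

# Made-up sample rows: col1 holds flat lists, col2 holds lists of sublists.
df = pd.DataFrame({
    "col1": [["myllc,", "ltd"], ["yuria"]],
    "col2": [[["myllc"], ["myloc", "ag"]], [["ltd", "yuriapharm"]]],
})
df["col1"] = df["col1"].map(lambda lst: [x.rstrip(",") for x in lst])

def func(x, y):
    # One result per (col1 word, col2 word) pair, flattened across sublists.
    return [get_top_matches(u, [v]) for u in x for w in y for v in w]

result = [func(x, y) for x, y in zip(df["col1"], df["col2"])]
df["func_scores"] = df.apply(lambda row: func(row["col1"], row["col2"]), axis=1)

print(df["func_scores"].tolist() == result)
print([len(r) for r in result])  # [6, 2]
```

Note that the comprehension flattens the col2 sublists, so each row holds (words in col1) x (words in col2) results rather than one sublist per col2 sublist.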

Thanks to all who helped.

answered Sep 28 '22 01:09

max

Tags:

python

pandas
