First time posting so apologies in advance if my formatting is off.
Here's my issue:
I've created a Pandas dataframe which contains multiple rows of text:
d = {'keywords' :['cheap shoes', 'luxury shoes', 'cheap hiking shoes']}
keywords = pd.DataFrame(d,columns=['keywords'])
In [7]: keywords
Out[7]:
        keywords
0  cheap shoes
1  luxury shoes
2  cheap hiking shoes
Now I have a dictionary that contains the following keys / values:
labels = {'cheap' : 'budget', 'luxury' : 'expensive', 'hiking' : 'sport'}
What I would like to do is find out whether a key in the dictionary exists in each row of the dataframe and, if so, return the corresponding value.
I was able to somewhat get there using the following:
for k,v in labels.items():
   keywords['Labels'] = np.where(keywords['keywords'].str.contains(k),v,'No Match')
However, the output misses the first two keys and only catches the last key, "hiking":
    keywords            Labels
0   cheap shoes         No Match
1   luxury shoes        No Match
2   cheap hiking shoes  sport
Additionally, I'd like to know if there's a way to catch multiple matches from the dictionary and join their values with " | ", so the ideal output would look like this:
    keywords            Labels
0   cheap shoes         budget
1   luxury shoes        expensive
2   cheap hiking shoes  budget | sport
Any help or guidance is much appreciated.
Cheers
It's certainly possible. Here is one way.
import pandas as pd
d = {'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes', 'nothing']}
labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}
df = pd.DataFrame(d)
def matcher(k):
    # Collect every dictionary key that occurs as a substring of the row text
    x = (i for i in labels if i in k)
    # Map each matched key to its label; multiple labels are joined with ' | '
    return ' | '.join(map(labels.get, x))
df['values'] = df['keywords'].map(matcher)
#              keywords          values
# 0         cheap shoes          budget
# 1        luxury shoes       expensive
# 2  cheap hiking shoes  budget | sport
# 3             nothing                
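As for why the loop in the question only catches "hiking": each iteration reassigns the whole Labels column, so the result of the last key overwrites the earlier ones. The sketch below reproduces that behaviour and then accumulates matches per row instead. The helper name collect is just illustrative, and it keeps the 'No Match' fallback from the question as an assumption about what you want for rows with no hits.

```python
import numpy as np
import pandas as pd

keywords = pd.DataFrame({'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes']})
labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

# Each pass rebuilds the entire 'Labels' column from scratch,
# so only the result for the last key ('hiking') is left at the end.
for k, v in labels.items():
    keywords['Labels'] = np.where(keywords['keywords'].str.contains(k), v, 'No Match')
overwritten = keywords['Labels'].tolist()

# One possible fix: gather all matching labels for a row before assigning.
def collect(text):
    hits = [v for k, v in labels.items() if k in text]
    return ' | '.join(hits) if hits else 'No Match'

keywords['Labels'] = keywords['keywords'].map(collect)
fixed = keywords['Labels'].tolist()
```

Note that this orders the labels by dictionary order, not by where the words appear in the text.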
You can use "|".join(labels.keys()) to build a regex pattern and pass it to re.findall().
import pandas as pd
import re
d = {'keywords' :['cheap shoes', 'luxury shoes', 'cheap hiking shoes']}
keywords = pd.DataFrame(d,columns=['keywords'])
labels = {'cheap' : 'budget', 'luxury' : 'expensive', 'hiking' : 'sport'}
pattern = "|".join(labels.keys())
def f(s):
    # re.findall returns the matched keys in order of appearance; map each one to its label
    return " | ".join(labels[word] for word in re.findall(pattern, s))
keywords.keywords.map(f)
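A possible variation on the regex approach, as a sketch: if the keys might contain regex metacharacters, or you only want whole-word matches, you can escape the keys and add word boundaries, then let pandas do the matching with Series.str.findall. The whole-word requirement is my assumption, not something stated in the question.

```python
import re
import pandas as pd

keywords = pd.DataFrame({'keywords': ['cheap shoes', 'luxury shoes', 'cheap hiking shoes']})
labels = {'cheap': 'budget', 'luxury': 'expensive', 'hiking': 'sport'}

# re.escape protects keys that contain regex metacharacters;
# \b restricts matches to whole words (an assumption about the desired behaviour).
pattern = r'\b(?:' + '|'.join(map(re.escape, labels)) + r')\b'

# str.findall returns a list of matched keys for each row, in order of appearance.
found = keywords['keywords'].str.findall(pattern)
keywords['Labels'] = found.map(lambda words: ' | '.join(labels[w] for w in words))
```

Rows with no match end up with an empty string here, and a key that occurs twice in a row yields its label twice.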