Given that I have a string like:
'velvet evening purse bags'
how can I get all word pairs of this? In other words, all 2-word combinations of this:
'velvet evening'
'velvet purse'
'velvet bags'
'evening purse'
'evening bags'
'purse bags'
I know Python's nltk package can give the bigrams, but I'm looking for something beyond that functionality. Or do I have to write my own custom function in Python?
You can use itertools.combinations for this:
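from itertools import combinations
from nltk import word_tokenize

s = 'velvet evening purse bags'
words = word_tokenize(s)  # split the string into word tokens
# combinations(words, 2) yields each pair once, in order of appearance
pairs = [' '.join(comb) for comb in combinations(words, 2)]
print(pairs)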
Output:
['velvet evening', 'velvet purse', 'velvet bags', 'evening purse', 'evening bags', 'purse bags']
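If you would rather not depend on nltk (word_tokenize also needs the tokenizer data to be downloaded first), a plain whitespace split is probably enough for a simple string like this one. The following is only a minimal sketch under that assumption; the helper name word_pairs is made up for illustration, and punctuation would stay attached to the words:

```python
from itertools import combinations

def word_pairs(text):
    """Return every 2-word combination of text, in order of appearance."""
    # Assumption: words are separated by whitespace only.
    words = text.split()
    return [' '.join(pair) for pair in combinations(words, 2)]

print(word_pairs('velvet evening purse bags'))
```

Note that combinations treats items by position, so a word that appears twice in the string will show up in repeated pairs. For n words you get n * (n - 1) / 2 pairs.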