Get all pairs of right-branching words from a sentence

Given that I have a string like:

 'velvet evening purse bags'

How can I get all word pairs from it? In other words, I want every 2-word combination that keeps the original word order:

'velvet evening'
'velvet purse'
'velvet bags'
'evening purse'
'evening bags'
'purse bags'
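These pairs are the 2-element combinations of the words, so their number follows from the binomial coefficient. For a sentence of n words there are n(n-1)/2 order-preserving pairs, and the four words here give the six listed:

$$\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{4}{2} = \frac{4 \cdot 3}{2} = 6$$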

I know Python's nltk package can produce bigrams, but bigrams only pair adjacent words and I need every pair. Is there a ready-made way to do this, or do I have to write my own custom function in Python?

Asked by bryan.blackbee on Jan 25 '23 at 22:01
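To make the difference between bigrams and "all pairs" concrete, here is a minimal plain-Python sketch. The function names `adjacent_pairs` and `all_ordered_pairs` are hypothetical, chosen only for illustration. `adjacent_pairs` mimics what bigrams give (neighbouring words only), while `all_ordered_pairs` is the hand-written nested-loop version of what the question asks for: every pair (i, j) with i < j.

```python
def adjacent_pairs(words):
    """Bigram-style pairs: only neighbouring words (hypothetical helper)."""
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def all_ordered_pairs(words):
    """Every pair words[i], words[j] with i < j, keeping sentence order (hypothetical helper)."""
    pairs = []
    for i in range(len(words)):
        # Start j after i so each pair appears once and reads left to right.
        for j in range(i + 1, len(words)):
            pairs.append(f"{words[i]} {words[j]}")
    return pairs


words = 'velvet evening purse bags'.split()
print(adjacent_pairs(words))     # ['velvet evening', 'evening purse', 'purse bags']
print(all_ordered_pairs(words))  # all six pairs listed in the question
```

The nested loop is exactly what `itertools.combinations(words, 2)` does for you, so writing it by hand is only needed if you want to avoid the import.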


1 Answer

You can use itertools.combinations for this:

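from itertools import combinations
from nltk import word_tokenize  # needs NLTK's Punkt tokenizer data; if missing, nltk raises a LookupError naming the resource to download

s = 'velvet evening purse bags'

words = word_tokenize(s)

# combinations() preserves input order, so each pair reads left to right as in the sentence
pairs = [' '.join(comb) for comb in combinations(words, 2)]

print(pairs)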

Output:

['velvet evening', 'velvet purse', 'velvet bags', 'evening purse', 'evening bags', 'purse bags']
Answered by MrGeek on Jan 28 '23 at 14:01
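If the input is plain whitespace-separated words with no punctuation to split off, the nltk tokenizer (and its data download) is not strictly needed. The sketch below assumes `str.split()` is an adequate tokenizer for your text; `word_pairs` is a hypothetical helper name. It yields pairs lazily, which can help for long sentences because the number of pairs grows quadratically.

```python
from itertools import combinations
from math import comb


def word_pairs(sentence):
    """Yield every 2-word combination in sentence order (hypothetical helper).

    Assumes whitespace tokenization is enough; punctuation stays attached to words.
    """
    words = sentence.split()
    for first, second in combinations(words, 2):
        yield f"{first} {second}"


s = 'velvet evening purse bags'
pairs = list(word_pairs(s))
print(pairs)
# n words give n*(n-1)/2 pairs: 4 words -> 6 pairs
assert len(pairs) == comb(len(s.split()), 2)
```

Note that `combinations` works on positions, not values, so a word that appears twice in the sentence will be paired with itself.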