If I have a list of letters, such as:word = ['W','I','N','E']
and need to get every possible sequence of substrings, of length 3 or less, e.g.:W I N E, WI N E, WI NE, W IN E, WIN E
etc.
What is the most efficient way to go about this?
Right now, I have:
word = ['W','I','N','E']
for idx,phon in enumerate(word):
phon_seq = ""
for p_len in range(3):
if idx-p_len >= 0:
phon_seq = " ".join(word[idx-(p_len):idx+1])
print(phon_seq)
This just gives me the below, rather than the sub-sequences:
W
I
W I
N
I N
W I N
E
N E
I N E
I just can't figure out how to create every possible sequence.
Try this recursive algorithm:
def segment(word):
def sub(w):
if len(w) == 0:
yield []
for i in xrange(1, min(4, len(w) + 1)):
for s in sub(w[i:]):
yield [''.join(w[:i])] + s
return list(sub(word))
# And if you want a list of strings:
def str_segment(word):
return [' '.join(w) for w in segment(word)]
Output:
>>> segment(word)
[['W', 'I', 'N', 'E'], ['W', 'I', 'NE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'N', 'E'], ['WI', 'NE'], ['WIN', 'E']]
>>> str_segment(word)
['W I N E', 'W I NE', 'W IN E', 'W INE', 'WI N E', 'WI NE', 'WIN E']
As there can either be a space or not in each of three positions (after W, after I and after N), you can think of this as similar to bits being 1 or 0 in a binary representation of a number ranging from 1 to 2^3 - 1.
input_word = "WINE"
for variation_number in xrange(1, 2 ** (len(input_word) - 1)):
output = ''
for position, letter in enumerate(input_word):
output += letter
if variation_number >> position & 1:
output += ' '
print output
Edit: To include only variations with sequences of 3 characters or less (in the general case where input_word
may be longer than 4 characters), we can exclude cases where the binary representation contains 3 zeroes in a row. (We also start the range from a higher number in order to exclude the cases which would have 000 at the beginning.)
for variation_number in xrange(2 ** (len(input_word) - 4), 2 ** (len(input_word) - 1)):
if not '000' in bin(variation_number):
output = ''
for position, letter in enumerate(input_word):
output += letter
if variation_number >> position & 1:
output += ' '
print output
My implementation for this problem.
#!/usr/bin/env python
# this is a problem of fitting partitions in the word
# we'll use itertools to generate these partitions
import itertools
word = 'WINE'
# this loop generates all possible partitions COUNTS (up to word length)
for partitions_count in range(1, len(word)+1):
# this loop generates all possible combinations based on count
for partitions in itertools.combinations(range(1, len(word)), r=partitions_count):
# because of the way python splits words, we only care about the
# difference *between* partitions, and not their distance from the
# word's beginning
diffs = list(partitions)
for i in xrange(len(partitions)-1):
diffs[i+1] -= partitions[i]
# first, the whole word is up for taking by partitions
splits = [word]
# partition the word's remainder (what was not already "taken")
# with each partition
for p in diffs:
remainder = splits.pop()
splits.append(remainder[:p])
splits.append(remainder[p:])
# print the result
print splits
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With