For each integer n from 1 to M, I need to make a list of all n-grams of a string, starting from the head of the string, and then return a tuple of these M lists.
def letter_n_gram_tuple(s, M):
    s = list(s)
    output = []
    for i in range(0, M+1):
        output.append(s[i:])
    return(tuple(output))
For letter_n_gram_tuple("abcd", 3), the output should be:
(['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])
However, my output is:
(['a', 'b', 'c', 'd'], ['b', 'c', 'd'], ['c', 'd'], ['d'])
Should I use string slicing and then save the slices into the list?
You can use nested for loops: the outer loop iterates over the n-gram length, and the inner loop slices the string.
def letter_n_gram_tuple(s, M):
    output = []
    for i in range(1, M + 1):
        gram = []
        for j in range(0, len(s)-i+1):
            gram.append(s[j:j+i])
        output.append(gram)
    return tuple(output)
Or do it in one line with a list comprehension:
output = [[s[j:j+i] for j in range(0, len(s)-i+1)] for i in range(1, M + 1)]
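As a sketch, the comprehension can serve as the whole function body. The name `letter_n_gram_tuple_lc` is made up here only to keep it distinct from the loop version above.

```python
def letter_n_gram_tuple_lc(s, M):
    # Outer loop: one list per n-gram length i = 1..M.
    # Inner comprehension: every start index j that leaves room for i characters.
    return tuple(
        [s[j:j + i] for j in range(len(s) - i + 1)]
        for i in range(1, M + 1)
    )


print(letter_n_gram_tuple_lc("abcd", 3))
# (['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])
```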
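Or use windowed from more_itertools:
import more_itertools
# windowed yields tuples of characters, so join each window back into a string.
# Assumes M <= len(s); for a longer window, windowed pads with None and join would fail.
output = [[''.join(w) for w in more_itertools.windowed(s, i)] for i in range(1, M + 1)]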
Test and output:
print(letter_n_gram_tuple("abcd", 3))
(['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])
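If you would rather not install a third-party package, the same sliding-window idea can be sketched with the built-in zip. This is only one possible way to do it, and the name `letter_n_gram_tuple_zip` is hypothetical.

```python
def letter_n_gram_tuple_zip(s, M):
    output = []
    for n in range(1, M + 1):
        # s[0:], s[1:], ..., s[n-1:] are the string shifted by 0..n-1 places.
        shifted = [s[k:] for k in range(n)]
        # zip stops at the shortest input, so only full-length windows survive.
        output.append(["".join(chars) for chars in zip(*shifted)])
    return tuple(output)


print(letter_n_gram_tuple_zip("abcd", 3))
# (['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])
```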
You need one more for loop to iterate over the letters of the string:
def letter_n_gram_tuple(s, M):
    output = []
    for i in range(0, M):
        vals = [s[j:j+i+1] for j in range(len(s)) if len(s[j:j+i+1]) == i+1]
        output.append(vals)
    return tuple(output)
print(letter_n_gram_tuple("abcd", 3))
Output:
(['a', 'b', 'c', 'd'], ['ab', 'bc', 'cd'], ['abc', 'bcd'])
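One case worth checking is M larger than len(s). The length filter then rejects every slice for the oversized n, so the result contains empty lists rather than raising an error. The block below repeats the function from this answer so that it runs on its own.

```python
def letter_n_gram_tuple(s, M):
    output = []
    for i in range(0, M):
        # Keep only slices that are exactly i + 1 characters long.
        vals = [s[j:j+i+1] for j in range(len(s)) if len(s[j:j+i+1]) == i+1]
        output.append(vals)
    return tuple(output)


print(letter_n_gram_tuple("ab", 3))
# (['a', 'b'], ['ab'], [])
```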