I need to split a string into a list of two-word pairs, where the last word of each pair is repeated as the first word of the next pair. Here is what I tried, based on examples I found in other questions:
line = """Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."""
def split_line(in_line):
    line_sp = in_line.split(" ")
    line_two = [" ".join(line_sp[i:i + 2]) for i in range(0, len(line_sp), 2)]
    return line_two
print(split_line(line))
This results in:
['Lorem ipsum', 'dolor sit', 'amet, consectetur', 'adipiscing elit,', 'sed do', 'eiusmod tempor', 'incididunt ut', 'labore et', 'dolore magna', 'aliqua.']
But what I actually need is this:
['Lorem ipsum', 'ipsum dolor', 'dolor sit', 'sit amet', 'amet, consectetur', 'consectetur adipiscing', ...]
How can I make it work? Thanks!
To convert a string into a list of words, you just need to split it on whitespace. You can use the split() method of str. With no arguments it splits on whitespace, i.e., it breaks the string at runs of whitespace characters.
The split() method accepts two optional arguments. The first, sep, specifies the separator to split on. If it is not provided, the string is split on any whitespace.
The second, maxsplit, limits the number of splits. split() returns a list of the words in the string.
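To make the difference between the default and an explicit separator concrete, here is a small sketch. The sample string and the variable names are made up for illustration; only str.split() itself comes from the text above.

```python
# Illustrative input: note the double space and the tab character.
text = "Lorem  ipsum dolor\tsit"

# No arguments: split on runs of any whitespace, no empty strings.
default_split = text.split()
print(default_split)   # ['Lorem', 'ipsum', 'dolor', 'sit']

# Explicit " ": every single space is a split point, so the double
# space yields an empty string and the tab is not treated as a separator.
space_split = text.split(" ")
print(space_split)     # ['Lorem', '', 'ipsum', 'dolor\tsit']

# maxsplit=1: stop after the first split, keep the rest as one piece.
limited_split = text.split(maxsplit=1)
print(limited_split)   # ['Lorem', 'ipsum dolor\tsit']
```

For the question's input the two forms give the same result, because the words are separated by single spaces only.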
You can use zip on two slices of the word list, one without the last word and one without the first:
words = line.split()
print(list(map(' '.join, zip(words[:-1], words[1:]))))
This outputs:
['Lorem ipsum', 'ipsum dolor', 'dolor sit', 'sit amet,', 'amet, consectetur', 'consectetur adipiscing', 'adipiscing elit,', 'elit, sed', 'sed do', 'do eiusmod', 'eiusmod tempor', 'tempor incididunt', 'incididunt ut', 'ut labore', 'labore et', 'et dolore', 'dolore magna', 'magna aliqua.']
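If you want this as a reusable function, something like the following might do. It is only a sketch of the same zip idea; the name word_pairs is hypothetical, and it assumes that a string with fewer than two words should give an empty list.

```python
def word_pairs(text):
    """Return overlapping two-word strings from text (hypothetical helper)."""
    words = text.split()
    # zip stops at the shorter input, so words[1:] alone sets the length
    # and slicing the first argument with [:-1] is not strictly needed.
    return [" ".join(pair) for pair in zip(words, words[1:])]


pairs = word_pairs("Lorem ipsum dolor sit amet,")
print(pairs)                 # ['Lorem ipsum', 'ipsum dolor', 'dolor sit', 'sit amet,']
print(word_pairs("Lorem"))   # []
```

Punctuation stays attached to the words, as in the output above. Stripping it would be a separate step.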
A simple for loop also works:
l = line.split(' ')
result = []
for i in range(len(l) - 1):
    result.append(l[i] + ' ' + l[i+1])
print(result)
# ['Lorem ipsum', 'ipsum dolor', 'dolor sit', 'sit amet,', 'amet, consectetur', 'consectetur adipiscing', 'adipiscing elit,', 'elit, sed', 'sed do', 'do eiusmod', 'eiusmod tempor', 'tempor incididunt', 'incididunt ut', 'ut labore', 'labore et', 'et dolore', 'dolore magna', 'magna aliqua.']