Input:
"The boy is running on the train"
Expected output:
["The boy", "boy is", "is running", "running on", "on the", "the train"]
What is the easiest way to achieve this in Python?
line="The boy is running on the train"
words=line.split()
k=[words[index]+' '+words[index+1] for index in xrange(len(words)-1)]
print k
Output:
['The boy', 'boy is', 'is running', 'running on', 'on the', 'the train']
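The snippet above is Python 2 (`xrange` and the `print` statement). For readers on Python 3, a minimal sketch of the same index-based approach might look like the following; the function name `bigrams` is only an illustrative choice, not part of the original answer.

```python
def bigrams(line):
    """Return each pair of adjacent words in `line`, joined by a space."""
    words = line.split()  # split() with no argument splits on any run of whitespace
    # range() replaces Python 2's xrange(); stop one short of the last word
    return [words[i] + ' ' + words[i + 1] for i in range(len(words) - 1)]


print(bigrams("The boy is running on the train"))
# ['The boy', 'boy is', 'is running', 'running on', 'on the', 'the train']
```

A line with fewer than two words simply yields an empty list.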
Split on whitespace, then rejoin each pair of adjacent words:
words = inputstr.split()
secondwords = iter(words)
next(secondwords)
output = [' '.join((first, second))
          for first, second in zip(words, secondwords)]
Demo:
>>> inputstr = "The boy is running on the train"
>>> words = inputstr.split()
>>> secondwords = iter(words)
>>> next(secondwords) # output is ignored
'The'
>>> [' '.join((first, second)) for first, second in zip(words, secondwords)]
['The boy', 'boy is', 'is running', 'running on', 'on the', 'the train']
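The trick in this answer is to pair a sequence with a second iterator over the same sequence that has been advanced by one item. As a rough sketch, that idea can be wrapped in a reusable helper with `itertools.tee`; the names `pairwise` and `bigrams` are illustrative, and the `None` default passed to `next()` is an addition so that empty input does not raise `StopIteration`.

```python
from itertools import tee


def pairwise(iterable):
    """Yield (item, next_item) for each adjacent pair in `iterable`."""
    first, second = tee(iterable)  # two independent iterators over the same data
    next(second, None)             # advance one of them; the default avoids an error on empty input
    return zip(first, second)      # zip stops as soon as the shorter iterator runs out


def bigrams(line):
    return [' '.join(pair) for pair in pairwise(line.split())]


print(bigrams("The boy is running on the train"))
# ['The boy', 'boy is', 'is running', 'running on', 'on the', 'the train']
```

On Python 3.10 and later, the standard library ships `itertools.pairwise`, which can replace the helper.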
import re
s = "The boy is running on the train"
print map(' '.join,re.findall('([^ \t]+)[ \t]+(?=([^ \t]+))',s))
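The regular expression works because the second word is captured inside a lookahead, so it is not consumed and can begin the next match. A commented Python 3 sketch of the same pattern, with `PAIR` and `bigrams` as illustrative names, might read:

```python
import re

# Group 1 consumes a word and the spaces or tabs that follow it.
# Group 2 sits inside a lookahead, so the next word is captured without being
# consumed and is still available to start the following match.
PAIR = re.compile(r'([^ \t]+)[ \t]+(?=([^ \t]+))')


def bigrams(text):
    # findall() returns one (first, second) tuple per match
    return [' '.join(pair) for pair in PAIR.findall(text)]


print(bigrams("The boy is running on the train"))
# ['The boy', 'boy is', 'is running', 'running on', 'on the', 'the train']
```

In Python 3, `map()` returns an iterator rather than a list, which is why this sketch uses a list comprehension instead.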
Koustav Ghosal's solution (the index-based list comprehension) is the fastest in the benchmark below:
import re
from time import clock
from itertools import izip
from collections import defaultdict
s = "The boy is running on the train"
z = 200
p = '%-9.6f %6.1f%% %s'
rgx = re.compile('([^ \t]+)[ \t]+(?=([^ \t]+))')
R = defaultdict(list)
for rep in xrange(3000):
    t0 = clock()
    for i in xrange(z):
        map(' '.join,re.findall('([^ \t]+)[ \t]+(?=([^ \t]+))',s))
    te1 = clock()-t0
    R['e1'].append(te1)
    t0 = clock()
    for i in xrange(z):
        map(' '.join,rgx.findall(s))
    te2 = clock()-t0
    R['e2'].append(te2)
    t0 = clock()
    for i in xrange(z):
        words = s.split()
        secondwords = iter(words)
        next(secondwords)
        [' '.join((first, second))
         for first, second in zip(words, secondwords)]
    tM1 = clock()-t0
    R['M1'].append(tM1)
    t0 = clock()
    for i in xrange(z):
        words = s.split()
        secondwords = iter(words)
        next(secondwords)
        [' '.join((first, second))
         for first, second in izip(words, secondwords)]
    tM2 = clock()-t0
    R['M2'].append(tM2)
    t0 = clock()
    for i in xrange(z):
        words = s.split()
        secondwords = iter(words)
        next(secondwords)
        [' '.join(x)
         for x in izip(words, secondwords)]
    tM3 = clock()-t0
    R['M3'].append(tM3)
    t0 = clock()
    for i in xrange(z):
        words=s.split()
        [words[c]+' '+words[c+1] for c in range(len(words)-1)]
    tK1 = clock() - t0
    R['K1'].append(tK1)
    t0 = clock()
    for i in xrange(z):
        words=s.split()
        [words[c]+' '+words[c+1] for c in xrange(len(words)-1)]
    tK2 = clock() - t0
    R['K2'].append(tK2)
tmax = min(R['e1'])
for k,s in (('e1','eyquem with re.findall(pat,string)'),
            ('e2','eyquem with compiled_regex.findall(string)'),
            ('M1','Martijn Pieters'),
            ('M2','Martijn Pieters with izip'),
            ('M3','Martijn Pieters with izip and direct join'),
            ('K1','Koustav Ghosal'),
            ('K2','Koustav Ghosal with xrange')):
    t = min(R[k])
    print p % (t,t/tmax*100,s)
Results with Python 2.7 (best time in seconds for 200 calls, taken over 3000 repetitions, followed by that time as a percentage of the re.findall(pat,string) time):
0.007127   100.0% eyquem with re.findall(pat,string)
0.004045    56.8% eyquem with compiled_regex.findall(string)
0.003887    54.5% Martijn Pieters
0.002522    35.4% Martijn Pieters with izip
0.002152    30.2% Martijn Pieters with izip and direct join
0.002030    28.5% Koustav Ghosal
0.001856    26.0% Koustav Ghosal with xrange
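The script above relies on `time.clock`, `xrange` and `izip`, none of which exist in current Python 3 releases (`time.clock` was removed in 3.8). A minimal sketch of how a similar comparison could be run with the standard `timeit` module is shown here. It covers only two of the approaches, the function names `by_index` and `by_iterator` are illustrative, and the repeat count of 50 is an arbitrary choice; timings will differ from the Python 2.7 figures above.

```python
import timeit

s = "The boy is running on the train"


def by_index(line):
    # Index-based list comprehension (Koustav Ghosal's approach)
    words = line.split()
    return [words[i] + ' ' + words[i + 1] for i in range(len(words) - 1)]


def by_iterator(line):
    # Zip the word list with an iterator advanced by one (Martijn Pieters' approach)
    words = line.split()
    secondwords = iter(words)
    next(secondwords, None)
    return [' '.join(pair) for pair in zip(words, secondwords)]


for func in (by_index, by_iterator):
    # repeat() returns one total per run of 200 calls; the minimum is the least noisy figure
    best = min(timeit.repeat(lambda: func(s), number=200, repeat=50))
    print('%-12s %.6f s per 200 calls' % (func.__name__, best))
```

Taking the minimum over several repeats mirrors the `min(R[k])` step in the original script.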