I have a Python script that, for various reasons, has a variable holding a fairly large string, say 10 MB long. This string contains multiple lines.
What is the fastest way to remove the first and last lines of this string? Because the string is so large, speed is the priority. The result should be a slightly smaller string, sans the first and last lines.
'\n'.join(string_variable.split('\n')[1:-1])
is the easiest way to do this, but it's extremely slow because the split() function copies the object in memory, and the join() copies it again.
Example string:
*** START OF DATA ***
data
data
data
*** END OF DATA ***
Extra credit: Have this program not choke if there is no data in between; this is optional, since for my case there shouldn't be a string with no data in between.
First split at '\n' once, then check whether the string at the last index of the result contains '\n'. If it does, call str.rsplit at '\n' once and take the item at index 0; otherwise return an empty string:
>>> def solve(s):
...     s = s.split('\n', 1)[-1]
...     if s.find('\n') == -1:
...         return ''
...     return s.rsplit('\n', 1)[0]
...
>>> s = '''*** START OF DATA ***
data
data
data
*** END OF DATA ***'''
>>> solve(s)
'data\ndata\ndata'
>>> s = '''*** START OF DATA ***
*** END OF DATA ***'''
>>> solve(s)
''
>>> s = '\n'.join(['a'*100]*10**5)
>>> %timeit solve(s)
100 loops, best of 3: 4.49 ms per loop
Or don't split at all: find the index of '\n' from either end and slice the string:
>>> def solve_fast(s):
...     ind1 = s.find('\n')
...     ind2 = s.rfind('\n')
...     return s[ind1+1:ind2]
...
>>> s = '''*** START OF DATA ***
data
data
data
*** END OF DATA ***'''
>>> solve_fast(s)
'data\ndata\ndata'
>>> s = '''*** START OF DATA ***
*** END OF DATA ***'''
>>> solve_fast(s)
''
>>> s = '\n'.join(['a'*100]*10**5)
>>> %timeit solve_fast(s)
100 loops, best of 3: 2.65 ms per loop
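One edge case the sessions above do not exercise: if the string contains no newline at all, find and rfind both return -1, so solve_fast evaluates s[0:-1] and drops the last character instead of returning an empty string. The following is a minimal sketch of a guarded variant; the name solve_safe is hypothetical and not part of the original answer.

```python
def solve_safe(s):
    # Hypothetical guarded variant of solve_fast (not from the original answer).
    first = s.find('\n')
    last = s.rfind('\n')
    # No newline at all, or only one newline: nothing lies between
    # the first and last lines, so return an empty string.
    if first == -1 or first == last:
        return ''
    # Slice from just after the first newline up to the last newline.
    return s[first + 1:last]
```

The extra comparison is constant-time, so it should not change the overall cost, which is still dominated by the two searches and the final slice.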
Consider a string s that is something like this:
s = "line1\nline2\nline3\nline4\nline5"
The following code...
s[s.find('\n')+1:s.rfind('\n')]
...produces the output:
'line2\nline3\nline4'
This is therefore the shortest code that removes the first and last lines of a string. As far as I know, the .find and .rfind methods do nothing beyond searching for the given substring, so the only copy made is the final slice. Try timing it!
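For anyone who wants to check the speed outside IPython, here is a rough sketch using the standard-library timeit module. The function names strip_split_join and strip_find_slice are made up for this illustration, the test input mirrors the one used in the %timeit sessions elsewhere in this thread, and the timings will vary by machine and Python version.

```python
import timeit

# Test input: 10**5 lines of 100 'a' characters each (roughly 10 MB).
s = '\n'.join(['a' * 100] * 10**5)

def strip_split_join(text):
    # Approach from the question: split into lines, drop both ends, rejoin.
    return '\n'.join(text.split('\n')[1:-1])

def strip_find_slice(text):
    # Approach from this answer: locate the two newlines and slice once.
    return text[text.find('\n') + 1:text.rfind('\n')]

# Sanity check: both approaches agree on this input.
assert strip_split_join(s) == strip_find_slice(s)

runs = 20
for fn in (strip_split_join, strip_find_slice):
    seconds = timeit.timeit(lambda: fn(s), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1000:.2f} ms per call")
```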