Python 3.6
I'd like to remove a list of strings from a string. Here is my first poor attempt:
string = 'this is a test string'
items_to_remove = ['this', 'is', 'a', 'string']
result = list(filter(lambda x: x not in items_to_remove, string.split(' ')))
print(result)
output:
['test']
But this doesn't work if the input string isn't nicely spaced. I feel there must be a built-in solution. There must be a better way!
I've had a look at this discussion on Stack Overflow, which asks exactly the same question as mine...
So as not to waste my efforts, I timed all the solutions. I believe the easiest, fastest and most Pythonic is the simple for loop, which was not the conclusion in the other post...
result = string
for i in items_to_remove:
    result = result.replace(i, '')
Test Code:
import timeit
t1 = timeit.timeit('''
string = 'this is a test string'
items_to_remove = ['this', 'is', 'a', 'string']
result = list(filter(lambda x: x not in items_to_remove, string.split(' ')))
''', number=1000000)
print(t1)
t2 = timeit.timeit('''
string = 'this is a test string'
items_to_remove = ['this', 'is', 'a', 'string']
def sub(m):
    return '' if m.group() in items_to_remove else m.group()
result = re.sub(r'\w+', sub, string)
''',setup= 'import re', number=1000000)
print(t2)
t3 = timeit.timeit('''
string = 'this is a test string'
items_to_remove = ['this', 'is', 'a', 'string']
result = re.sub(r'|'.join(items_to_remove), '', string)
''',setup= 'import re', number=1000000)
print(t3)
t4 = timeit.timeit('''
string = 'this is a test string'
items_to_remove = ['this', 'is', 'a', 'string']
result = string
for i in items_to_remove:
    result = result.replace(i,'')
''', number=1000000)
print(t4)
outputs:
1.9832003884248448
4.408749988641971
2.124719851741177
1.085117268194475
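One caveat with the timings above: the replace loop and the joined regex remove substrings, not whole words, and they leave the surrounding spaces behind. If whole-word removal is what you are after, something like the sketch below might be closer. The helper name remove_words and the sample sentence are made up for illustration, and it assumes the items are plain alphanumeric words, so that \b behaves as a word boundary on both sides.

```python
import re

def remove_words(text, words):
    """Remove each listed word only where it appears as a whole word."""
    # re.escape guards against regex metacharacters in the items.
    pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'
    stripped = re.sub(pattern, '', text)
    # Collapse the runs of whitespace left behind by the removals.
    return ' '.join(stripped.split())

items_to_remove = ['this', 'is', 'a', 'string']

# The plain replace loop also eats matches inside other words.
naive = 'a task is at hand'
for item in items_to_remove:
    naive = naive.replace(item, '')

print(repr(naive))                                         # ' tsk  t hnd'
print(remove_words('a task is at hand', items_to_remove))  # task at hand
print(remove_words('this is a test string', items_to_remove))  # test
```

This is not one of the timed variants, so the numbers above say nothing about its speed.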
To remove multiple characters from a string, you can call str.replace() once for each character you want to drop. The str class provides replace(old, new), which returns a copy of the string with every occurrence of the substring old replaced by new. Note that replace() takes a single substring to search for, not a list of them.
In Python you can use the replace() and translate() methods to specify which characters you want to remove from the string and get back a new, modified string. It is important to remember that the original string will not be altered, because strings are immutable.
To remove multiple characters with replace(): start with a variable holding the original string, put the characters to be removed in one string, then use a for loop to iterate over those characters and call replace() on the previous result each time.
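As a rough illustration of the two approaches just described, here is a minimal sketch. The sample text and the set of characters to remove are invented for the example.

```python
text = 'h-e_l.l,o'
chars_to_remove = '-_.,'

# Approach 1: call replace() once per character, feeding each result forward.
result = text
for ch in chars_to_remove:
    result = result.replace(ch, '')

# Approach 2: build a translation table whose third argument lists
# the characters to delete, then apply it in a single pass.
table = str.maketrans('', '', chars_to_remove)
translated = text.translate(table)

print(result)      # hello
print(translated)  # hello
print(text)        # h-e_l.l,o (the original string is unchanged)
```

translate() only works per character, so for removing whole words you would still need replace() or a regex.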
You can use string.split() if you aren't confident of your string spacing.
string.split() and string.split(' ') work a little differently:
In [128]: 'this     is   a test'.split()
Out[128]: ['this', 'is', 'a', 'test']
In [129]: 'this     is   a test'.split(' ')
Out[129]: ['this', '', '', '', '', 'is', '', '', 'a', 'test']
The former splits your string without any redundant empty strings.
If you want a little more security, or if your strings could contain tabs and newlines, there's another solution with regex:
In [131]: re.split(r'\s+', 'this is \t a\ntest')
Out[131]: ['this', 'is', 'a', 'test']
Lastly, I would suggest converting your lookup list into a lookup set for efficient lookup in your filter:
In [135]: list(filter(lambda x: x not in {'is', 'this', 'a', 'string'}, string.split()))
Out[135]: ['test']
While on the topic of performance, a list comp is a bit faster than a filter, although less concise:
In [136]: [x for x in string.split() if x not in {'is', 'this', 'a', 'string'}]
Out[136]: ['test']
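If you want to check the filter versus list comprehension claim on your own machine, a timing sketch along these lines should do. The function names are made up for the example, the set is built once up front, and the numbers will vary by machine and Python version, so no output is shown.

```python
import timeit

string = 'this is a test string'
to_remove = {'this', 'is', 'a', 'string'}  # set built once, outside the timed code

def with_filter():
    return list(filter(lambda x: x not in to_remove, string.split()))

def with_comprehension():
    return [x for x in string.split() if x not in to_remove]

# timeit accepts a callable; number is kept modest so this runs quickly.
t_filter = timeit.timeit(with_filter, number=100000)
t_comp = timeit.timeit(with_comprehension, number=100000)
print(t_filter, t_comp)
```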