Python best way to remove multiple strings from string

Tags: python, string

Python 3.6

I'd like to remove a list of strings from a string. Here is my first poor attempt:

string = 'this is a test string'
items_to_remove = ['this', 'is', 'a', 'string']
result = list(filter(lambda x: x not in items_to_remove, string.split(' ')))
print(result)

output:

['test']

But this doesn't work if the string isn't nicely spaced. I feel there must be a built-in solution. There must be a better way!

I've had a look at this discussion on Stack Overflow, which asks exactly the same question as mine.

So as not to waste my efforts, I timed all the solutions. I believe the easiest, fastest, and most Pythonic is the simple for loop, which was not the conclusion in the other post.

result = string
for i in items_to_remove:
    result = result.replace(i,'')
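
One caveat with this loop is worth sketching: str.replace removes substrings rather than whole words, and it leaves the surrounding spaces in place. The helper name remove_substrings and the second input string are hypothetical, chosen only to illustrate the behaviour.

```python
items_to_remove = ['this', 'is', 'a', 'string']

def remove_substrings(text, items):
    """Remove every occurrence of each item from text, as in the loop above."""
    result = text
    for item in items:
        result = result.replace(item, '')
    return result

# The spaces that surrounded the removed words are left behind.
spaced = remove_substrings('this is a test string', items_to_remove)
print(repr(spaced))

# Hypothetical input: 'a' and 'is' also vanish from inside other words.
mangled = remove_substrings('that is a cat', ['a', 'is'])
print(repr(mangled))

# Collapsing the leftover whitespace afterwards.
cleaned = ' '.join(spaced.split())
print(cleaned)
```

If only whole words should go, the loop is probably not the right tool, however fast it is.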

Test Code:

import timeit

t1 = timeit.timeit('''
string = 'this is a test string'
items_to_remove = ['this', 'is', 'a', 'string']
result = list(filter(lambda x: x not in items_to_remove, string.split(' ')))
''', number=1000000)
print(t1)

t2 = timeit.timeit('''
string = 'this is a test string'
items_to_remove = ['this', 'is', 'a', 'string']
def sub(m):
    return '' if m.group() in items_to_remove else m.group()

result = re.sub(r'\w+', sub, string)
''', setup='import re', number=1000000)
print(t2)

t3 = timeit.timeit('''
string = 'this is a test string'
items_to_remove = ['this', 'is', 'a', 'string']
result = re.sub(r'|'.join(items_to_remove), '', string)
''', setup='import re', number=1000000)
print(t3)

t4 = timeit.timeit('''
string = 'this is a test string'
items_to_remove = ['this', 'is', 'a', 'string']
result = string
for i in items_to_remove:
    result = result.replace(i,'')
''', number=1000000)
print(t4)

outputs (seconds per 1,000,000 runs, in the order t1 to t4):

1.9832003884248448
4.408749988641971
2.124719851741177
1.085117268194475
James Schinner asked Jul 22 '17 13:07


1 Answer

You can use string.split() with no arguments if you can't rely on consistent spacing in your string.

string.split() and string.split(' ') work a little differently:

In [128]: 'this     is   a test'.split()
Out[128]: ['this', 'is', 'a', 'test']

In [129]: 'this     is   a test'.split(' ')
Out[129]: ['this', '', '', '', '', 'is', '', '', 'a', 'test']

The former splits on any run of whitespace, including tabs and newlines, and returns no empty strings.

If you need finer control over what counts as a separator, there's another solution with regex:

In [131]: re.split(r'\s+', 'this     is \t  a\ntest')
Out[131]: ['this', 'is', 'a', 'test']
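
If the goal is a cleaned string rather than a list, one possible regex variant removes whole words in place using word boundaries. This is a minimal sketch, not something from the question or its timings. The function name remove_words is hypothetical, and re.escape is added on the assumption that the words may contain regex metacharacters.

```python
import re

def remove_words(text, words):
    """Delete whole-word occurrences of words from text, then tidy whitespace."""
    # re.escape guards against regex metacharacters in the words;
    # \b restricts each match to a whole word.
    pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'
    stripped = re.sub(pattern, '', text)
    # Collapse the whitespace left behind by the removed words.
    return ' '.join(stripped.split())

first = remove_words('this     is \t  a\ntest string', ['this', 'is', 'a', 'string'])
second = remove_words('that is a cat', ['a', 'is'])
print(first)
print(second)
```

Because of the \b anchors, the 'a' inside 'that' and 'cat' is left alone, which plain str.replace would not do.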

Lastly, I would suggest converting your lookup list into a set, so that each membership test in your filter takes constant time on average:

In [135]: list(filter(lambda x: x not in {'is', 'this', 'a', 'string'}, string.split()))
Out[135]: ['test']

While on the topic of performance, a list comprehension is a bit faster than filter with a lambda, although less concise:

In [136]: [x for x in string.split() if x not in {'is', 'this', 'a', 'string'}]
Out[136]: ['test']
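
Putting these pieces together, a small helper might look like the sketch below. The name filter_words is hypothetical. It assumes the words are separated by whitespace and carry no attached punctuation, so 'string.' would not match 'string'.

```python
def filter_words(text, words_to_remove):
    """Return text with the given whole words dropped, joined by single spaces."""
    # Build the set once so each membership test is O(1) on average.
    lookup = set(words_to_remove)
    return ' '.join(word for word in text.split() if word not in lookup)

result = filter_words('this     is   a test string', ['this', 'is', 'a', 'string'])
print(result)
```

Joining the surviving words gives back a string instead of a list, which is what the original question asked for.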
cs95 answered Nov 15 '22 00:11