Most efficient way to remove multiple substrings from string?

Tags: performance, python, string

What's the most efficient method to remove a list of substrings from a string?

I'd like a cleaner, quicker way to do the following:

words = 'word1 word2 word3 word4, word5'
replace_list = ['word1', 'word3', 'word5']

def remove_multiple_strings(cur_string, replace_list):
    for cur_word in replace_list:
        cur_string = cur_string.replace(cur_word, '')
    return cur_string

remove_multiple_strings(words, replace_list)


asked Jun 02 '15 20:06

Boa

1 Answer

Regex:

>>> import re
>>> re.sub(r'|'.join(map(re.escape, replace_list)), '', words)
' word2  word4, '
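One caveat worth knowing about this one-liner: Python's re tries the alternatives of a|b|... from left to right and takes the first one that matches, not the longest. If one substring is a prefix of another (say 'word1' and 'word10'), the shorter one can win and leave a stray '0' behind. A minimal sketch that sorts the substrings longest-first before joining them (remove_substrings_regex is a hypothetical helper name, not part of any library):

```python
import re

def remove_substrings_regex(text, substrings):
    # Hypothetical helper: sort longest-first so 'word10' is tried before 'word1',
    # because re alternation picks the first alternative that matches, not the longest.
    ordered = sorted(substrings, key=len, reverse=True)
    # Escape each substring so characters like '.' or '*' are matched literally
    pattern = '|'.join(map(re.escape, ordered))
    return re.sub(pattern, '', text)

print(repr(remove_substrings_regex('word1 word10 word2', ['word1', 'word10'])))
# '  word2'   (the unsorted pattern 'word1|word10' would give ' 0 word2')
```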

The regex one-liner is definitely shorter, but it is actually slower than your str.replace version. A benchmark with 10,000 random 10-character words, 1,000 of which are removed:

>>> import hashlib, random
>>> words = ' '.join([hashlib.sha1(str(random.random()).encode()).hexdigest()[:10] for _ in range(10000)])
>>> replace_list = words.split()[:1000]
>>> random.shuffle(replace_list)
>>> %timeit remove_multiple_strings(words, replace_list)
10 loops, best of 3: 49.4 ms per loop
>>> %timeit re.sub(r'|'.join(map(re.escape, replace_list)), '', words)
1 loops, best of 3: 623 ms per loop

Gosh! About 12.6x slower.
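The session above relies on IPython's %timeit magic. A rough plain-Python 3 sketch of the same comparison using the standard timeit module is below; the function name remove_with_regex and the repeat counts are illustrative choices, and the timings it prints will vary with your machine and Python version.

```python
import hashlib
import random
import re
import timeit

def remove_multiple_strings(cur_string, replace_list):
    # The question's approach: one full str.replace pass per substring
    for cur_word in replace_list:
        cur_string = cur_string.replace(cur_word, '')
    return cur_string

def remove_with_regex(cur_string, replace_list):
    # Hypothetical name for the regex one-liner: a single re.sub over an alternation
    return re.sub('|'.join(map(re.escape, replace_list)), '', cur_string)

# 10,000 pseudo-random 10-character hex "words", mirroring the session above
words = ' '.join(hashlib.sha1(str(random.random()).encode()).hexdigest()[:10]
                 for _ in range(10000))
replace_list = words.split()[:1000]
random.shuffle(replace_list)

if __name__ == '__main__':
    for func in (remove_multiple_strings, remove_with_regex):
        # Best of 3 single runs, reported in milliseconds
        seconds = min(timeit.repeat(lambda: func(words, replace_list), number=1, repeat=3))
        print(f'{func.__name__}: {seconds * 1000:.1f} ms')
```

Because every generated word is exactly 10 hex characters and words are separated by spaces, both functions remove exactly the same tokens here, so the comparison is like for like.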

But can we improve it? Yes.

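As we are only concerned with whole words, we can match each word in the words string with \w+ and check it against a set built from replace_list (yes, an actual set: set(replace_list)), so each lookup is O(1) on average instead of a scan over a list. Note that this only removes whole words: unlike str.replace, it will not strip 'word1' out of 'word10'.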

>>> s = set(replace_list)
>>> def sub(m):
...     # Drop the matched word if it is in s, otherwise keep it unchanged
...     return '' if m.group() in s else m.group()
...
>>> %timeit re.sub(r'\w+', sub, words)
100 loops, best of 3: 7.8 ms per loop

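For even larger strings and word lists, the str.replace loop and my first regex solution both take time proportional to len(words) × len(replace_list), which is effectively quadratic when both grow. The set-based solution makes a single pass over the string with constant-time lookups, so it should run in roughly linear time.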
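Wrapped up as a reusable function, the set-based approach might look like the sketch below (remove_words is a hypothetical name). The second usage line illustrates the whole-word behaviour: 'word10' survives even though 'word1' is in the list.

```python
import re

def remove_words(text, words_to_remove):
    # Hypothetical helper: build the set once so each membership test is O(1) on average
    banned = set(words_to_remove)

    def sub(m):
        # Drop the matched \w+ token if it is banned, otherwise keep it unchanged
        word = m.group()
        return '' if word in banned else word

    # One left-to-right scan: \w+ matches each maximal run of word characters
    return re.sub(r'\w+', sub, text)

words = 'word1 word2 word3 word4, word5'
print(repr(remove_words(words, ['word1', 'word3', 'word5'])))  # ' word2  word4, '
print(repr(remove_words('word10', ['word1'])))                 # 'word10'
```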


answered Oct 21 '22 11:10

Ashwini Chaudhary
