Python re.split() vs split()


In my quest for optimization, I discovered that the built-in split() method is about 40% faster than the re.split() equivalent.

A dummy benchmark (easily copy-pasteable):

import re
import random
import time

def random_string(length):
    # Build a string of `length` characters drawn at random from "ABC"
    letters = "ABC"
    return "".join(random.choice(letters) for _ in range(length))

r = random_string(2000000)
pattern = re.compile(r"A")

start = time.time()
pattern.split(r)
print("with re.split:", time.time() - start)

start = time.time()
r.split("A")
print("with built-in split:", time.time() - start)
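A single time.time() measurement like the one above is easily skewed by noise. The sketch below is an assumption-laden alternative, not the asker's method. It targets Python 3 and uses timeit.repeat, keeping the best of several runs. It also checks first that both calls return the same list. The string length, `seed` parameter and repeat counts are arbitrary illustrative choices, and absolute numbers will vary by machine and Python version.

```python
import random
import re
import timeit

def random_string(length, letters="ABC", seed=0):
    # Seeded generator so repeated runs time the same input
    rng = random.Random(seed)
    return "".join(rng.choice(letters) for _ in range(length))

text = random_string(100_000)
pattern = re.compile("A")

# Timing only makes sense if both approaches produce identical results
assert pattern.split(text) == text.split("A")

n = 10
# Take the minimum of several repeats: it is the least noise-affected figure
t_re = min(timeit.repeat(lambda: pattern.split(text), number=n, repeat=5))
t_str = min(timeit.repeat(lambda: text.split("A"), number=n, repeat=5))

print(f"re.split:  {t_re / n * 1e6:.1f} us per call")
print(f"str.split: {t_str / n * 1e6:.1f} us per call")
print(f"ratio re/str: {t_re / t_str:.2f}")
```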

Why this difference?


asked Sep 21 '11 14:09

hymloth

2 Answers

re.split() is expected to be slower, since running the regular-expression engine incurs overhead that a plain substring search avoids.

Of course if you are splitting on a constant string, there is no point in using re.split().
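To illustrate the point about constant delimiters, here is a small Python 3 sketch. For a fixed, non-empty delimiter, str.split() returns the same list as re.split() on the escaped delimiter, so the regex buys nothing. The one trap is that re.split() treats its first argument as a pattern. Metacharacters such as "." must therefore go through re.escape().

```python
import re

s = "a.b..c"
# Escaped literal delimiter: re.split agrees with str.split
assert s.split(".") == re.split(re.escape("."), s)

# Unescaped, "." is a wildcard that matches every character,
# so every piece between matches is empty
print(re.split(".", "a.b.c"))   # ['', '', '', '', '', '']
print("a.b.c".split("."))       # ['a', 'b', 'c']
```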


answered Sep 18 '22 11:09

NullUserException

When in doubt, check the source code. You can see that Python's s.split() is optimized for whitespace and inlined. But s.split() only handles fixed delimiters.
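The whitespace special case is the no-argument form s.split(). It treats any run of whitespace as a single separator and ignores leading and trailing whitespace. An explicit delimiter such as " " behaves differently. A short Python 3 comparison follows; the regex line is only a rough equivalent, since str.split() and \s may disagree on a few exotic Unicode whitespace characters.

```python
import re

s = "  one  two\tthree\n"

# No argument: whitespace runs collapse, no empty fields at the ends
print(s.split())                    # ['one', 'two', 'three']

# Explicit single space: each space separates, tab/newline are kept
print(s.split(" "))                 # ['', '', 'one', '', 'two\tthree\n']

# Rough regex equivalent of the no-argument form
print(re.split(r"\s+", s.strip()))  # ['one', 'two', 'three']
```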

In exchange for that speed cost, re.split(), which splits on a regular expression, is far more flexible:

>>> re.split(r':+', "One:two::t h r e e:::fourth field")
['One', 'two', 't h r e e', 'fourth field']
>>> "One:two::t h r e e:::fourth field".split(':')
['One', 'two', '', 't h r e e', '', '', 'fourth field']
# would require an additional step to filter out the empty fields...
>>> re.split(r'[:\d]+', "One:two:2:t h r e e:3::fourth field")
['One', 'two', 't h r e e', 'fourth field']
# try that without a regex split in an understandable way...
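For comparison, the non-regex versions of those two splits might look like the sketch below (Python 3). The first adds the filtering step mentioned in the comment. The second first maps each ASCII digit to ":" with str.translate, then splits and filters. One caveat is that filtering also drops the empty strings re.split() returns when the input starts or ends with a delimiter. For example, re.split(':+', ':a:') gives ['', 'a', ''].

```python
import re

s1 = "One:two::t h r e e:::fourth field"
# str.split plus a filter to discard the empty fields
fields1 = [f for f in s1.split(":") if f]

s2 = "One:two:2:t h r e e:3::fourth field"
# Map every ASCII digit to ':' so a plain split can handle both
to_colon = str.maketrans("0123456789", ":" * 10)
fields2 = [f for f in s2.translate(to_colon).split(":") if f]

assert fields1 == re.split(r":+", s1)
assert fields2 == re.split(r"[:\d]+", s2)
print(fields2)  # ['One', 'two', 't h r e e', 'fourth field']
```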

That s.split() is only 40% faster (equivalently, that it saves only about 29% of re.split()'s running time) is what should be amazing.
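The 40% and 29% figures describe the same gap measured from opposite ends. The arithmetic below assumes "40% faster" means str.split() runs 1.4 times as fast as re.split().

```python
# Assumption: t_re / t_str = 1.4 (str.split is "40% faster")
speedup = 1.4
extra_time_re = speedup - 1        # re.split takes this much MORE time
time_saved_str = 1 - 1 / speedup   # str.split takes this much LESS time

print(f"re.split takes {extra_time_re:.0%} more time")    # 40%
print(f"str.split takes {time_saved_str:.0%} less time")  # 29%
```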

answered Sep 21 '22 11:09

the wolf
