I recently used shlex.split() to split a command into the argument list for subprocess.Popen(). I recall that, a long time ago, I also used re.split() to split a string on a specific delimiter. Can someone point out the essential difference between them? Which scenarios is each function best suited to?
Running a regular expression means running a state machine over each character. Splitting on a constant string means just searching for that string, which is a much less complicated procedure. @eyquem That does search without use of a state machine.
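To make the comment above concrete, here is a minimal sketch comparing the two approaches on a constant delimiter. The sample strings are made up for illustration; both calls give the same result here, and a pattern only becomes necessary when the delimiter varies.

```python
import re

line = "alpha,beta,gamma"

# Constant delimiter: str.split() only has to search for the substring.
by_str = line.split(",")

# Same result through the regex engine, which compiles the pattern first.
by_re = re.split(",", line)

# A pattern earns its keep only when the delimiter is not a fixed string.
mixed = re.split(r"[,;]", "alpha,beta;gamma")

print(by_str)
print(by_re)
print(mixed)
```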
The shlex module defines the following functions: shlex.split(s, comments=False, posix=True): Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string).
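As a rough illustration of the comments parameter described above, the sketch below splits the same made-up command string twice. With the default comments=False the # is treated as an ordinary word; with comments=True everything from the # onward is dropped.

```python
import shlex

# Made-up command string for illustration only.
command = "ls -l 'my dir' # list files"

# Default: comment parsing is disabled, so '#' and the words after it
# come back as ordinary tokens.
default_tokens = shlex.split(command)

# comments=True: the rest of the line after '#' is discarded.
comment_aware = shlex.split(command, comments=True)

print(default_tokens)
print(comment_aware)
```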
The re.split() function splits the given string at each occurrence of a particular character or pattern. It returns a list of the substrings found between the matches; if the pattern never matches, the list contains the original string unchanged.
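A short sketch of how re.split() behaves, using made-up input strings. It shows a character-class delimiter, the effect of a capturing group (the delimiters are kept in the result), and the maxsplit argument.

```python
import re

# Split on a comma, semicolon, or whitespace, plus any trailing whitespace.
parts = re.split(r"[,;\s]\s*", "one, two;three  four")

# A capturing group in the pattern keeps the delimiters in the result.
with_delims = re.split(r"([,;])", "a,b;c")

# maxsplit limits the number of splits; the remainder stays in one piece.
limited = re.split(r",", "a,b,c", maxsplit=1)

# No match at all: the original string comes back as the only element.
unchanged = re.split(r",", "no delimiters here")

print(parts)
print(with_delims)
print(limited)
print(unchanged)
```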
shlex.quote() protects a string from the shell's parsing, but it does not protect it from the argument parser of the command you're calling; some additional tool-specific escaping needs to be done manually, especially if your string starts with a dash (-).
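The sketch below illustrates both halves of that statement with made-up strings. Note that the `--` end-of-options marker used at the end is a common convention that many, but not all, command-line tools honor; treat it as an assumption about the tool being called, not something shlex does for you.

```python
import shlex

# A hostile-looking filename (made up for illustration).
filename = "my file; rm -rf ~"

# quote() wraps it so the shell sees a single, inert word.
quoted = shlex.quote(filename)

# Round trip: splitting the quoted form gives the original string back.
round_trip = shlex.split("cat " + quoted)

# A leading dash contains no shell-special characters, so quote() returns
# it unchanged; the called program may still parse it as an option.
dash = shlex.quote("-rf")

# Assumption: the target tool treats "--" as the end of its options.
guarded = "rm -- " + shlex.quote("-rf")

print(quoted)
print(round_trip)
print(dash)
print(guarded)
```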
shlex.split() is designed to work like the shell's word-splitting mechanism, which means it respects quoting and escaping.
>>> shlex.split("this is 'my string' that --has=arguments -or=something")
['this', 'is', 'my string', 'that', '--has=arguments', '-or=something']
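For the use case in the question (building an argument list for subprocess), a minimal sketch might look like the following. It assumes a POSIX-style environment, and it runs the current Python interpreter via sys.executable only so that the example does not depend on any external program being installed.

```python
import shlex
import subprocess
import sys

# Build a shell-style command string; the quoted part must stay one argument.
command = shlex.quote(sys.executable) + " -c 'print(\"hello world\")'"

# shlex.split() turns it into the list form that subprocess expects,
# so no shell is needed (shell=False is the default).
args = shlex.split(command)

result = subprocess.run(args, capture_output=True, text=True, check=True)

print(args)
print(result.stdout)
```

Passing a list like this avoids shell=True, so the shell never gets a chance to reinterpret the arguments.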
re.split() will just split on whatever pattern you define.
>>> re.split(r'\s', "this is 'my string' that --has=arguments -or=something")
['this', 'is', "'my", "string'", 'that', '--has=arguments', '-or=something']
Trying to define your own regex to work like shlex.split() is needlessly complicated, if it's even possible.
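To give a feel for why, here is one naive attempt. The pattern and the naive_split helper are hypothetical, written only for this illustration. It copes with a standalone quoted word (though it leaves the quotes in place) but breaks as soon as a quote starts in the middle of a word, which shlex.split() handles correctly.

```python
import re
import shlex

# Hypothetical attempt: a single-quoted chunk, or a run of non-whitespace.
NAIVE_PATTERN = re.compile(r"'[^']*'|\S+")

def naive_split(s):
    """Illustrative only; not a replacement for shlex.split()."""
    return NAIVE_PATTERN.findall(s)

simple = "this is 'my string'"
tricky = "--name='John Smith'"

# Works on the simple case, but the quote characters are kept.
naive_simple = naive_split(simple)

# The quote begins mid-word, so the pattern splits inside the quoted text.
naive_tricky = naive_split(tricky)

# shlex joins the quoted part onto the word and strips the quotes.
shell_tricky = shlex.split(tricky)

print(naive_simple)
print(naive_tricky)
print(shell_tricky)
```

Each fix to the pattern (double quotes, backslash escapes, adjacent quoted segments) adds another special case, which is the complexity shlex already takes care of.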
To really see the differences between the two, you can always Use the Source, Luke:
>>> re.__file__
'/usr/lib/python3.5/re.py'
>>> shlex.__file__
'/usr/lib/python3.5/shlex.py'
Open these files in your favorite editor and start poking around; you'll find that they operate quite differently.