I was looking at the responses to this earlier-asked question:
Split Strings with Multiple Delimiters?
For my variant of this problem, I wanted to split on everything that wasn't from a specific set of chars. Which led me to a solution I liked, until I found this apparent bug. Is this a bug or some quirk of Python I'm unfamiliar with?
>>> import re
>>> b = "Which_of'these-markers/does,it:choose to;split!on?"
>>> b1 = re.split("[^a-zA-Z0-9_'-/]+", b)
>>> b1
["Which_of'these-markers/does,it", 'choose', 'to', 'split', 'on', '']
I'm not understanding why it doesn't split on a comma (','), given that a comma is not in my exception list?
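The '-/ inside the character class creates a range from ' (code point 39) to / (code point 47), and that range includes the comma (code point 44). The characters it covers are '()*+,-./ so the comma counts as an allowed character and is never split on.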
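To see what the accidental range covers, here is a minimal sketch that enumerates the code points between ' and /. The variable name covered is only illustrative.

```python
# Every character in the range '-/ (code points 39 through 47)
covered = "".join(chr(cp) for cp in range(ord("'"), ord("/") + 1))

print(covered)         # '()*+,-./
print("," in covered)  # True: the comma is inside the range
```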

When you need to put a literal hyphen inside a character class in a Python re pattern, put it in one of these positions:
at the start: [-A-Z] (matches an uppercase ASCII letter or -)
at the end: [A-Z()-] (matches an uppercase ASCII letter, (, ) or -)
right after a complete range: [A-Z-+] (matches an uppercase ASCII letter, - or +)
You cannot put it after a shorthand class, right before a standalone symbol: [\w-+] raises a "bad character range" error. This is valid in .NET and some other regex flavors, but it is not valid in Python re.
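The placement rules can be checked with a short sketch like the one below. It assumes Python 3 (re.fullmatch is not available in 2.7), and the names literal_ok and shorthand_ok are made up for the illustration.

```python
import re

# Hyphen at the start, at the end, right after a complete range, or escaped:
# in each of these classes it is treated as a literal "-".
patterns = (r"[-A-Z]", r"[A-Z()-]", r"[A-Z-+]", r"[A-Z\-+]")
literal_ok = all(re.fullmatch(p, "-") for p in patterns)
print(literal_ok)

# A hyphen between a shorthand class and another symbol is rejected
# when the pattern is compiled.
try:
    re.compile(r"[\w-+]")
    shorthand_ok = True
except re.error as exc:
    shorthand_ok = False
    print("re.error:", exc)
```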
In your pattern, put the hyphen at the end of the character class, or escape it.
Use
re.split(r"[^a-zA-Z0-9_'/-]+", b)
In Python 2.7, where \w matches only [a-zA-Z0-9_] by default, you may even contract it to
re.split(r"[^\w'/-]+", b)
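As a quick check, the sketch below runs both patterns on the string from the question. It assumes Python 3, where \w on str patterns also matches non-ASCII word characters, so re.ASCII is passed to keep the shorthand equivalent to the explicit class. The names explicit, shorthand and tokens are illustrative.

```python
import re

b = "Which_of'these-markers/does,it:choose to;split!on?"

# Hyphen moved to the end of the class, so it is a literal "-"
explicit = re.split(r"[^a-zA-Z0-9_'/-]+", b)

# Same idea with \w; re.ASCII limits \w to [a-zA-Z0-9_] on Python 3
shorthand = re.split(r"[^\w'/-]+", b, flags=re.ASCII)

print(explicit)
# ["Which_of'these-markers/does", 'it', 'choose', 'to', 'split', 'on', '']

# The trailing '' comes from the final "?"; filter it out if unwanted
tokens = [t for t in explicit if t]
```

The comma now produces a split, so 'does' and 'it' end up in separate items.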