I am trying to split a string into two parts: alphanumeric chars and special chars. I also want to limit repeated occurrences of the special character.
b.sc.... = ['b.sc.','...'] (Preserve "." inside word & outside word just once)
really???? = ['really','????'] (split when any other special char encountered)
I went through a lot of SO questions before posting here. So far I have come up with this: re.findall(r"[\w+|\-.+\w]+|\W+", text)
How to proceed further?
You can use
[re.sub(r'([.-])+', r'\1', x) for x in re.findall(r'\w+(?:-+\w+)+|\w+(?:\.+\w+)*\.?|[^\w\s]+', text)]
Details
\w+(?:-+\w+)+ - one or more word chars followed by one or more occurrences of one or more - chars and one or more word chars
| - or
\w+(?:\.+\w+)*\.? - one or more word chars followed by zero or more occurrences of one or more . chars and one or more word chars, and then an optional dot
| - or
[^\w\s]+ - one or more chars that are neither word chars nor whitespace.

The re.sub(r'([.-])+', r'\1', x) part is a post-processing step that replaces each run of consecutive . or - chars with a single occurrence.
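
To see how the two steps fit together, here is a minimal runnable sketch of the same approach. The names TOKEN_RE, REPEAT_RE and split_tokens are made up for illustration and are not part of the original answer. Note that the post-processing step also collapses a standalone run of dots, so the trailing "...." in the first sample comes out as a single ".".

```python
import re

# Tokenizer: hyphenated word | dotted word with optional trailing dot | run of special chars
TOKEN_RE = re.compile(r'\w+(?:-+\w+)+|\w+(?:\.+\w+)*\.?|[^\w\s]+')
# Post-processing: a run of "." or "-" chars is reduced to one char
REPEAT_RE = re.compile(r'([.-])+')

def split_tokens(text):
    """Split text into word-like and special-char tokens, collapsing repeated . and -."""
    # The pattern has no capturing groups, so findall returns the full matches.
    tokens = TOKEN_RE.findall(text)
    return [REPEAT_RE.sub(r'\1', tok) for tok in tokens]

print(split_tokens("b.sc...."))     # ['b.sc.', '.']
print(split_tokens("really????"))   # ['really', '????']
print(split_tokens("well--known"))  # ['well-known']
```

If the trailing run of dots should be kept as it is, one option would be to apply the re.sub step only to tokens that start with a word char.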