
I have a regex that splits a string into a list of words, numbers and punctuation marks. How do I keep "a-z" and "0-9" as single list elements?

Tags: c#, regex

It looks like this:

string[] lines = Regex.Split(line, @"\s+|(?!^)(?=\p{P})|(?<=\p{P})(?!$)");

It splits "ASds22d. asd ,156" into "ASds22d" + "." + "asd" + "," + "156".

The problem is with strings like "a-z" and "0-9", or variations like "a-c" and "4-5". My regex splits "a-z 1-9" into "a" + "-" + "z" + "1" + "-" + "9", but I need just "a-z" + "1-9".

Can someone fix this regex?

D4C asked Oct 19 '22 16:10


1 Answer

\s+|(?!^|-)(?=\p{P})|(?<=\p{P})(?<!-)(?!$)

You can try something like this. It will not split on -. If you have any examples where a split on - is required, that case can be ORed in again.

See demo.

https://regex101.com/r/iS6jF6/3
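For reference, here is a minimal C# sketch of how this pattern could be dropped into the Regex.Split call from the question. The Tokenize helper and the filtering of empty strings are my own additions, not part of the original code. The filter is there because Regex.Split can return an empty entry when two split points are adjacent (for example between the space and the comma in " ,156").

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    // Pattern from the answer: the same three alternatives as the question's
    // regex, with "-" excluded from both zero-width split points.
    const string Pattern = @"\s+|(?!^|-)(?=\p{P})|(?<=\p{P})(?<!-)(?!$)";

    // Hypothetical helper (not in the original post): split and drop empty entries.
    public static string[] Tokenize(string line) =>
        Regex.Split(line, Pattern)
             .Where(s => s.Length > 0)
             .ToArray();

    static void Main()
    {
        // prints: a-z | 1-9
        Console.WriteLine(string.Join(" | ", Tokenize("a-z 1-9")));

        // prints: ASds22d | . | asd | , | 156
        Console.WriteLine(string.Join(" | ", Tokenize("ASds22d. asd ,156")));
    }
}
```

Keep in mind that the pattern never splits at a hyphen, so a word like "well-known" also stays a single element.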

vks answered Oct 27 '22 00:10