I want to find and separate words in a title that has no spaces.
Before:
ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)"Test"'Test'[Test]
After:
This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'
I'm looking for a regular expression rule that can do the following.
I thought I'd identify each word if it starts with an uppercase letter.
But also preserve all uppercase words as not to space them into A L L U P P E R C A S E
.
Additional rules:
Hello2019World
Hello 2019 World
T.E.S.T.
[Test] (Test) "Test" 'Test'
Hello-World
C#
https://rextester.com/GAZJS38767
// Title without spaces string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'"; // Detect where to space words string[] split = Regex.Split(title, "(?<!^)(?=(?<![.\\-'\"([{])[A-Z][\\d+]?)"); // Trim each word of extra spaces before joining split = (from e in split select e.Trim()).ToArray(); // Join into new title string newtitle = string.Join(" ", split); // Display Console.WriteLine(newtitle);
Regular expression
I'm having trouble with spacing before the numbers, brackets, parentheses, and quotes.
https://regex101.com/r/9IIYGX/1
(?<!^)(?=(?<![.\-'"([{])(?<![A-Z])[A-Z][\d+?]?) (?<!^) // Negative look behind (?= // Positive look ahead (?<![.\-'"([{]) // Ignore if starts with punctuation (?<![A-Z]) // Ignore if starts with double Uppercase letter [A-Z] // Space after each Uppercase letter [\d+]? // Space after number )
Thanks for all your combined effort in answers. Here's a Regex example. I'm applying this to file names and have exclude special characters \/:*?"<>|
.
https://rextester.com/FYEVE73725
https://regex101.com/r/xi8L4z/1
Use the list() class to split a string into a list of strings. Use a list comprehension to split a string into a list of integers.
Method: In Python, we can use the function split() to split a string and join() to join a string. the split() method in Python split a string into a list of strings after breaking the given string by the specified separator.
The split() method splits a string into a list. You can specify the separator, default separator is any whitespace. Note: When maxsplit is specified, the list will contain the specified number of elements plus one.
Here is a regex which seems to work well, at least for your sample input:
(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)
This patten says to make a split on a boundary of one of the following conditions:
string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'"; string[] split = Regex.Split(title, "(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\\W)(?=\\W)"); split = (from e in split select e.Trim()).ToArray(); string newtitle = string.Join(" ", split); This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'
Note: You might also want to add this assertion to the regex alternation:
(?<=\W)(?=\w)|(?<=\w)(?=\W)
We got away with this here, because this boundary condition never happened. But you might need it with other inputs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With