
Separate title string with no spaces into words

Tags:

c#

regex

I want to find and separate words in a title that has no spaces.

Before:

ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]"Test"'Test'

After:

This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'


I'm looking for a regular expression rule that can do the following.

My idea is to treat each uppercase letter as the start of a new word.

But all-uppercase words must also be preserved, so that they are not spaced out into A L L U P P E R C A S E.

Additional rules:

  • Add a space where a letter touches a number: Hello2019World → Hello 2019 World
  • Do not space initials that contain periods, hyphens, or underscores: T.E.S.T.
  • Do not add spaces inside brackets, parentheses, or quotes: [Test] (Test) "Test" 'Test'
  • Preserve hyphens: Hello-World

C#

https://rextester.com/GAZJS38767

// Title without spaces
string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";

// Detect where to space words
string[] split = Regex.Split(title, "(?<!^)(?=(?<![.\\-'\"([{])[A-Z][\\d+]?)");

// Trim each word of extra spaces before joining
split = (from e in split
         select e.Trim()).ToArray();

// Join into new title
string newtitle = string.Join(" ", split);

// Display
Console.WriteLine(newtitle);

Regular expression

I'm having trouble with spacing before the numbers, brackets, parentheses, and quotes.

https://regex101.com/r/9IIYGX/1

(?<!^)(?=(?<![.\-'"([{])(?<![A-Z])[A-Z][\d+?]?)

(?<!^)            // Negative look behind
(?=               // Positive look ahead
  (?<![.\-'"([{]) // Ignore if starts with punctuation
  (?<![A-Z])      // Ignore if starts with double Uppercase letter
  [A-Z]           // Space after each Uppercase letter
  [\d+]?          // Space after number
)

Solution

Thanks for all your combined effort in the answers. Here's a Regex example. I'm applying this to file names and have excluded the special characters \/:*?"<>|.
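The linked solution is not reproduced on this page, so the following is only a minimal sketch of one way to handle file names. It assumes the listed characters are simply removed before spacing, and it reuses the split pattern from the answer further down. The names FileNameSpacer and Clean are hypothetical and do not come from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameSpacer
{
    // The characters listed in the post: \ / : * ? " < > |
    private static readonly Regex Invalid = new Regex(@"[\\/:*?""<>|]");

    // Split points: lower|Upper, digit|letter, letter|digit, and two
    // adjacent non-word characters (pattern taken from the answer).
    private static readonly Regex Boundary = new Regex(
        @"(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)");

    // Assumption: "exclude" means the characters are dropped from the name.
    public static string Clean(string name)
    {
        string stripped = Invalid.Replace(name, "");
        return string.Join(" ", Boundary.Split(stripped));
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(FileNameSpacer.Clean("Hello:World2019?"));
        Console.WriteLine(FileNameSpacer.Clean("My/File*Name<2019>"));
    }
}
```

Stripping first keeps the split pattern unchanged. If the special characters should instead be replaced by a space or an underscore, only the second argument of Replace needs to change.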

https://rextester.com/FYEVE73725

https://regex101.com/r/xi8L4z/1

Matt McManis asked Mar 11 '19 05:03



1 Answer

Here is a regex which seems to work well, at least for your sample input:

(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W) 

This pattern says to split at any boundary where one of the following conditions holds:

  • what precedes is a lowercase letter and what follows is an uppercase letter
  • what precedes is a digit and what follows is a letter (or vice-versa)
  • what precedes and what follows are both non-word characters (e.g. quote, parenthesis, etc.)


string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";
string[] split = Regex.Split(title, "(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\\W)(?=\\W)");

split = (from e in split select e.Trim()).ToArray();
string newtitle = string.Join(" ", split);

This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'
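For anyone who wants to run this outside an online sandbox, here is the same pattern wrapped in a small self-contained program. This is a sketch only. The class name TitleSpacer and the method name AddSpaces are hypothetical helpers added for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TitleSpacer
{
    // Zero-width split points, taken from the pattern above.
    private static readonly Regex Boundary = new Regex(
        @"(?<=[a-z])(?=[A-Z])" +        // lowercase followed by uppercase
        @"|(?<=[0-9])(?=[A-Za-z])" +    // digit followed by letter
        @"|(?<=[A-Za-z])(?=[0-9])" +    // letter followed by digit
        @"|(?<=\W)(?=\W)");             // two adjacent non-word characters

    // AddSpaces is a hypothetical helper name, not part of the answer.
    public static string AddSpaces(string title)
    {
        var words = Boundary.Split(title).Select(w => w.Trim());
        return string.Join(" ", words);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(TitleSpacer.AddSpaces("Hello2019World"));
        Console.WriteLine(TitleSpacer.AddSpaces(
            "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'"));
    }
}
```

Building the Regex once as a static field avoids re-parsing the pattern on every call, which can matter when many titles are processed in a loop.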

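Note: Depending on your inputs, you might also want to add these assertions to the regex alternation:

(?<=\W)(?=\w)|(?<=\w)(?=\W)

They are not needed for the sample input. Be aware that they also split at every hyphen, period, and bracket that touches a letter or digit (HELLO-WORLD would become HELLO - WORLD), so add them only if your other inputs call for it.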
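To see what the two extra assertions change, the sketch below runs the same input with and without them. BoundaryDemo and Space are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // The four alternatives from the answer.
    private const string Basic =
        @"(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)";

    // The optional word/non-word boundaries from the note.
    private const string Extra = @"|(?<=\W)(?=\w)|(?<=\w)(?=\W)";

    public static string Space(string title, bool withExtra)
    {
        string pattern = withExtra ? Basic + Extra : Basic;
        return string.Join(" ", Regex.Split(title, pattern));
    }

    public static void Main()
    {
        string input = "HELLO-WORLD(Test)";
        Console.WriteLine(Space(input, false)); // HELLO-WORLD(Test)
        Console.WriteLine(Space(input, true));  // HELLO - WORLD ( Test )
    }
}
```

With the extra alternatives, punctuation is separated from the adjacent word on both sides. That suits inputs such as Hello,World but conflicts with the question's rules about preserving hyphens and bracketed words.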

Tim Biegeleisen answered Oct 07 '22 19:10