
C# - Remove spaces in HTML source in between markups?

I am currently working on a program that allows me to enter HTML source code into a RichTextBox control and removes the spaces from in between markups. The only problem is, I am not sure how I can differentiate between the spaces BETWEEN the markups and the spaces INSIDE the markups. Obviously, removing the spaces inside the markups would be bad. Any ideas as to how I can tell the difference?

Example: (before white space is removed)

<p>blahblahblah</p>                  <p>blahblahblah</p>

Example: (after white space is removed)

<p>blahblahblah</p><p>blahblahblah</p>
asked Nov 07 '09 02:11 by user


1 Answer

The solution in the link that Rasik posted here works for your case too:

Regex.Replace(html, @"\s*(<[^>]+>)\s*", "$1", RegexOptions.Singleline);

The regular expression matches each tag together with any whitespace around it and replaces the whole match with just the tag.
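To see what that pattern does in practice, here is a minimal sketch that runs it on the example from the question. The class and method names (TagWhitespaceDemo, StripAroundTags) are made up for illustration; only the Regex.Replace call comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagWhitespaceDemo
{
    // Hypothetical helper wrapping the pattern above:
    // removes whitespace on both sides of every tag.
    public static string StripAroundTags(string html)
    {
        return Regex.Replace(html, @"\s*(<[^>]+>)\s*", "$1", RegexOptions.Singleline);
    }

    public static void Main()
    {
        // The example from the question.
        string input = "<p>blahblahblah</p>                  <p>blahblahblah</p>";
        Console.WriteLine(StripAroundTags(input));
        // prints: <p>blahblahblah</p><p>blahblahblah</p>

        // Whitespace that touches a tag is removed even when text is on the other side.
        Console.WriteLine(StripAroundTags("<p>one <b>two</b> three</p>"));
        // prints: <p>one<b>two</b>three</p>
    }
}
```

The second call shows the catch: the space between a word and an inline tag is removed as well, which changes how the text renders.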

Edit: A better solution that works for Micheal's example:

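Regex.Replace(txtSource.Text,
            // .*? is lazy, so each element is matched on its own; a greedy .* would run to the
            // last closing tag and keep the whitespace between sibling elements
            @"\s*(?<capture><(?<markUp>\w+)>.*?<\/\k<markUp>>)\s*", "${capture}", RegexOptions.Singleline);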

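This regular expression matches an element from its opening tag to its closing tag, leaves everything inside it unchanged, and removes the whitespace outside it. There are other cases to consider too, such as tags without a closing tag, or opening tags with attributes, which `<\w+>` does not match.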
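For those other cases, one alternative (a different technique, not the regex in this answer) is to stop pairing tags and only delete whitespace runs that sit directly between a `>` and a `<`. A rough sketch, with BetweenTagsDemo and StripBetweenTags as made-up names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BetweenTagsDemo
{
    // Hypothetical helper: deletes whitespace only when it is directly
    // preceded by '>' and directly followed by '<'.
    public static string StripBetweenTags(string html)
    {
        return Regex.Replace(html, @"(?<=>)\s+(?=<)", "");
    }

    public static void Main()
    {
        string input = "<p>one <b>two</b> three</p>   <br>\n<p class=\"x\">four</p>";
        Console.WriteLine(StripBetweenTags(input));
        // prints: <p>one <b>two</b> three</p><br><p class="x">four</p>
    }
}
```

Because it never looks at tag names, this handles tags with attributes and tags without a closing tag, and it leaves spaces next to text alone. It will still remove a space between two adjacent inline elements (for example `</b> <i>`) and whitespace inside `<pre>`, so it is a sketch to adapt, not a general HTML minifier.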

answered Sep 22 '22 23:09 (4 revs, 2 users 91%)