Remove HTML tags from string including &nbsp in C#

How can I remove all the HTML tags, including &nbsp;, using regex in C#? My string looks like this:

  "<div>hello</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp;</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div>"


If you can't use an HTML-parser-based solution to filter out the tags, here's a simple regex for it:

string noHTML = Regex.Replace(inputHTML, @"<[^>]+>|&nbsp;", "").Trim();

Ideally, you should make a second pass with a regex that collapses runs of multiple whitespace characters into a single space:

string noHTMLNormalised = Regex.Replace(noHTML, @"\s{2,}", " ");
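
To make the effect of each pass visible, here is a minimal, self-contained sketch that runs both regexes on a short sample. The class name `TwoPassDemo`, the helper names, and the sample string are assumptions made for illustration; only the two patterns come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoPassDemo
{
    // Pass 1: drop anything that looks like a tag, plus the literal "&nbsp;" entity.
    public static string StripTags(string inputHTML) =>
        Regex.Replace(inputHTML, @"<[^>]+>|&nbsp;", "").Trim();

    // Pass 2: collapse each run of two or more whitespace characters into one space.
    public static string CollapseSpaces(string text) =>
        Regex.Replace(text, @"\s{2,}", " ");

    public static void Main()
    {
        string inputHTML = "<b>Hello</b> &nbsp; &nbsp; <i>world</i>";
        string noHTML = StripTags(inputHTML);
        string noHTMLNormalised = CollapseSpaces(noHTML);

        Console.WriteLine($"[{noHTML}]");            // [Hello   world]
        Console.WriteLine($"[{noHTMLNormalised}]");  // [Hello world]
    }
}
```

Note that tags are replaced with an empty string, so text from adjacent elements is joined without a separator: `<div>a</div><div>b</div>` becomes `ab`. If that matters for your input, replacing tags with a space and relying on the second pass to tidy up is one possible adjustment.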

I took @Ravi Thapliyal's code and made a method. It is simple and might not clean everything, but so far it does what I need it to do.

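// Requires: using System.Text.RegularExpressions;
public static string ScrubHtml(string value)
{
    // Remove tags and "&nbsp;" entities, then trim the ends.
    var step1 = Regex.Replace(value, @"<[^>]+>|&nbsp;", "").Trim();
    // Collapse runs of two or more whitespace characters into one space.
    var step2 = Regex.Replace(step1, @"\s{2,}", " ");
    return step2;
}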
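
One thing `ScrubHtml` does not clean is any entity other than `&nbsp;` (for example `&amp;`). As a rough sketch of one way to cover those, the variant below strips tags first and then decodes entities with `System.Net.WebUtility.HtmlDecode`. The class and method names (`HtmlScrubber`, `ScrubHtmlDecoded`) and the order of steps are assumptions for illustration, not part of either answer.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlScrubber
{
    // Hypothetical variant of ScrubHtml; the name and the steps are assumptions.
    public static string ScrubHtmlDecoded(string value)
    {
        // Remove anything that looks like a tag.
        var noTags = Regex.Replace(value, @"<[^>]+>", "");
        // Decode entities such as &amp; and &nbsp; (the latter becomes U+00A0).
        var decoded = WebUtility.HtmlDecode(noTags);
        // Turn non-breaking spaces into ordinary spaces.
        var plainSpaces = decoded.Replace('\u00A0', ' ');
        // Collapse whitespace runs and trim the ends.
        return Regex.Replace(plainSpaces, @"\s{2,}", " ").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(ScrubHtmlDecoded("<p>Tom &amp; Jerry&nbsp;&nbsp;rock</p>")); // Tom & Jerry rock
    }
}
```

Decoding happens after tag removal so that escaped markup in the text (such as `&lt;b&gt;`) is kept as literal characters rather than being stripped as a tag.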


Tags: html, string, c#, regex

rampuriyaaa


2 Answers

Ravi K Thapliyal

Don Rolling

