
Remove HTML tags from a string [closed]

Tags:

html

c#

How can I remove HTML tags from the following string?

<P style="MARGIN: 0cm 0cm 10pt" class=MsoNormal><SPAN style="LINE-HEIGHT: 115%;  FONT-FAMILY: 'Verdana','sans-serif'; COLOR: #333333; FONT-SIZE: 9pt">In an  email sent just three days before the Deepwater Horizon exploded, the onshore  <SPAN style="mso-bidi-font-weight: bold"><b>BP</b></SPAN> manager in charge of  the drilling rig warned his supervisor that last-minute procedural changes were  creating "chaos". April emails were given to government investigators by <SPAN  style="mso-bidi-font-weight: bold"><b>BP</b></SPAN> and reviewed by The Wall  Street Journal and are the most direct evidence yet that workers on the rig  were unhappy with the numerous changes, and had voiced their concerns to <SPAN  style="mso-bidi-font-weight: bold"><b>BP</b></SPAN>’s operations managers in  Houston. This raises further questions about whether <SPAN  style="mso-bidi-font-weight: bold"><b>BP</b></SPAN> managers properly  considered the consequences of changes they ordered on the rig, an issue  investigators say contributed to the disaster.</SPAN></p><br/>   

I'm writing this string to Aspose.PDF, but the HTML tags show up in the PDF. How can I remove them?

asked Feb 02 '11 18:02 by jvm



2 Answers

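Warning: This does not work for all cases (for example, a '>' inside a quoted attribute value, or a tag that spans several lines, because '.' does not match a newline by default) and should not be used to process untrusted user input.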

    using System.Text.RegularExpressions;
    ...
    // Lazy match: from each '<' up to the nearest '>'.
    const string HTML_TAG_PATTERN = "<.*?>";

    static string StripHTML(string inputString)
    {
        // Delete every substring that looks like a tag.
        return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty);
    }
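A minimal sketch of a somewhat sturdier variant of the same regex approach, assuming only the standard .NET libraries. The class name HtmlTextSketch and the method StripHtml are hypothetical. It compiles the pattern with RegexOptions.Singleline so tags that wrap across lines are matched, decodes entities such as &amp; with WebUtility.HtmlDecode after the tags are gone, and collapses the leftover runs of whitespace. It still shares the limitations described in the warning.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative, not taken from the answer.
public static class HtmlTextSketch
{
    // Same lazy "<...>" pattern as the answer, but with Singleline so that
    // '.' also matches '\n' and multi-line tags like <SPAN\n style=...> are removed.
    private static readonly Regex TagPattern =
        new Regex("<.*?>", RegexOptions.Singleline | RegexOptions.Compiled);

    // Matches any run of whitespace (spaces, tabs, newlines).
    private static readonly Regex Whitespace =
        new Regex(@"\s+", RegexOptions.Compiled);

    public static string StripHtml(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // 1. Delete everything that looks like a tag.
        string text = TagPattern.Replace(html, string.Empty);

        // 2. Decode entities only after stripping, so an encoded "&lt;b&gt;"
        //    becomes literal text "<b>" instead of being removed as a tag.
        text = WebUtility.HtmlDecode(text);

        // 3. Collapse whitespace runs to a single space and trim the ends.
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

Because tags are replaced with an empty string (as in the answer), adjacent block elements such as "<p>a</p><p>b</p>" run together as "ab". Replacing tags with " " instead avoids that, at the cost of splitting text like "<b>BP</b>’s" into "BP ’s".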
answered Oct 03 '22 21:10 by capdragon


You should use the HTML Agility Pack:

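    HtmlDocument doc = ...
    // In HtmlAgilityPack the root node is DocumentNode
    // (DocumentElement belongs to System.Xml.XmlDocument).
    string text = doc.DocumentNode.InnerText;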
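A fuller sketch of the HTML Agility Pack approach, assuming the HtmlAgilityPack NuGet package is referenced. The class AgilityPackSketch and the method ExtractText are hypothetical. It parses the string with LoadHtml, reads DocumentNode.InnerText, and passes the result through HtmlEntity.DeEntitize, because InnerText returns the raw text nodes and may leave entities such as &amp; encoded.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper class; the names are illustrative.
public static class AgilityPackSketch
{
    public static string ExtractText(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        var doc = new HtmlDocument();

        // Parse the fragment; the parser tolerates markup such as <P ...>...</p>
        // where the opening and closing tags differ in case.
        doc.LoadHtml(html);

        // InnerText concatenates all text nodes under the root.
        string raw = doc.DocumentNode.InnerText;

        // Convert entities like &amp; or &#8217; back into characters.
        return HtmlEntity.DeEntitize(raw);
    }
}
```

InnerText joins text nodes without inserting separators, so block elements that sit next to each other in the markup run together, much as with the regex approach.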
answered Oct 03 '22 22:10 by SLaks