I need to store user-entered text in my database together with its HTML and CSS formatting.
The scenario:
The user copies text from MS Word into a RadEditor, and I store that text in the database with its formatting. When I later retrieve the data for a report or a label, some tags appear wrapped around the text.
I use a regular expression to remove all the formatting, but it only works some of the time.
private static Regex oClearHtmlScript = new Regex(@"<(.|\n)*?>", RegexOptions.Compiled);
public static string RemoveAllHTMLTags(string sHtml)
{
    // Check for null before calling Replace, otherwise a null input throws.
    if (string.IsNullOrEmpty(sHtml))
        return string.Empty;
    // Decode the common entities by hand, then strip anything that looks like a tag.
    sHtml = sHtml.Replace("&nbsp;", string.Empty);
    sHtml = sHtml.Replace("&gt;", ">");
    sHtml = sHtml.Replace("&lt;", "<");
    sHtml = sHtml.Replace("&amp;", "&");
    return oClearHtmlScript.Replace(sHtml, string.Empty);
}
How can I remove all the formatting, using HtmlAgilityPack or any other dependable approach, so that only plain text remains?
Note: the data type of this field in the database is Lvarchar.
This should strip all HTML tags from a string, provided no tag spans a line break (by default `.` does not match a newline):
sHtml = Regex.Replace(sHtml, "<.*?>", "");
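For text pasted from MS Word, a slightly more defensive version of the one-liner above might look like the sketch below. It is only an illustration under a few assumptions: the class name HtmlStripper and the method name StripTags are made up for this example, and a regular expression is not a real HTML parser, so unusual markup can still slip through. It removes script/style blocks and comments (Word tends to emit both), lets tags span several lines via RegexOptions.Singleline, and decodes entities only after the tags are gone, so that text such as "&lt;b&gt;" is not mistaken for a tag.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class HtmlStripper
{
    // Whole <script>/<style> elements and comments, content included.
    private static readonly Regex Blocks = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>|<!--.*?-->",
        RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Any remaining tag; Singleline lets '.' match across line breaks.
    private static readonly Regex Tags = new Regex(
        @"<.*?>", RegexOptions.Singleline | RegexOptions.Compiled);

    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        string text = Blocks.Replace(html, string.Empty);
        text = Tags.Replace(text, string.Empty);

        // Decode entities last, then turn non-breaking spaces into plain spaces.
        return WebUtility.HtmlDecode(text).Replace('\u00A0', ' ').Trim();
    }
}
```

Calling HtmlStripper.StripTags(sHtml) in place of the Regex.Replace line would be the intended usage. Decoding the entities before stripping, as the code in the question does, is one plausible reason it fails only some of the time.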
HtmlAgilityPack makes working with HTML easy:
// requires: using HtmlAgilityPack;
HtmlDocument mainDoc = new HtmlDocument();
string htmlString = "<html><body><h1>Test</h1> more text</body></html>";
mainDoc.LoadHtml(htmlString);
// InnerText returns the document's text content without the tags
string cleanText = mainDoc.DocumentNode.InnerText;
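Content pasted from Word usually carries style blocks, comments and entities such as &nbsp; as well, so the InnerText approach may need two extra steps. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name HtmlText and the method name ToPlainText are hypothetical names chosen for this example. It removes script, style and comment nodes before reading InnerText, then decodes entities with HtmlEntity.DeEntitize.

```csharp
using HtmlAgilityPack;

// Hypothetical helper for illustration; only the HtmlAgilityPack calls are library API.
public static class HtmlText
{
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style|//comment()");
        if (unwanted != null)
        {
            foreach (var node in unwanted)
                node.Remove();
        }

        // InnerText leaves entities such as &nbsp; or &amp; encoded; DeEntitize decodes them.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }
}
```

Running the text through something like HtmlText.ToPlainText before showing it in the report or label, while keeping the original HTML in the database, leaves the formatted version available for the editor.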