Render or convert HTML to 'formatted' text (.NET)

I'm importing some data from another test/bug tracking tool into TFS, and I would like to convert its description, which is in simple HTML, to a plain string in which the 'layout' of the HTML is preserved.

For example:

<body>
  <ol>
    <li>Log on with user Acme &amp; Co.</li>
    <li>Navigate to the details tab</li>
    <li>Check the official name</li>
  </ol>
  <br>
  <br>
  Expected Result:<br>
  official name is filled in<br>
  <br>
  Actual Result:<br>
  The &amp;-sign is not shown correctly<br>
  See attachment.
</body>

This would become plain text, with newlines inserted and HTML entities decoded:

1. Log on with user Acme & Co.
2. Navigate to the details tab
3. Check the official name

Expected Result:
official name is filled in

Actual Result:
The &-sign is not shown correctly
See attachment

I can currently replace some tags with newlines using a regex and strip the rest, but decoding the HTML entities and handling tags like <ol> and <ul> feels like re-inventing something that already exists (a browser?). So I was wondering whether someone has done this before me. I couldn't find anything using Google.

Rudi asked Dec 10 '08 16:12


1 Answer

Rather than using a regex, you could try loading it into the HTML Agility Pack. If it were XHTML, then an XSLT transformation might also be a good option.
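To make the HTML Agility Pack suggestion concrete, here is a minimal sketch rather than a complete converter. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HtmlToText and its methods are made up for illustration. It walks the parsed node tree, decodes entities in text nodes, turns <br> into a newline, and prefixes <li> items with a number or a bullet. Other block-level tags such as <p> or <div> are not given their own line breaks here, so real-world input would likely need more cases.

```csharp
using System.Text;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

// Hypothetical helper, not part of the HTML Agility Pack itself.
public static class HtmlToText
{
    public static string Convert(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var sb = new StringBuilder();
        Walk(doc.DocumentNode, sb);

        // Drop spaces around line breaks, then allow at most one blank line in a row.
        string text = Regex.Replace(sb.ToString(), @" *\n *", "\n");
        text = Regex.Replace(text, @"\n{3,}", "\n\n");
        return text.Trim();
    }

    private static void Walk(HtmlNode node, StringBuilder sb)
    {
        foreach (HtmlNode child in node.ChildNodes)
        {
            if (child.NodeType == HtmlNodeType.Text)
            {
                // Decode entities such as &amp; and collapse source whitespace.
                string decoded = HtmlEntity.DeEntitize(child.InnerText);
                sb.Append(Regex.Replace(decoded, @"\s+", " "));
            }
            else if (child.NodeType == HtmlNodeType.Element)
            {
                WalkElement(child, sb);
            }
            // Comments and other node types are ignored.
        }
    }

    private static void WalkElement(HtmlNode el, StringBuilder sb)
    {
        switch (el.Name)
        {
            case "br":
                sb.Append('\n');
                break;
            case "ol":
            case "ul":
                int index = 1;
                foreach (HtmlNode li in el.Elements("li"))
                {
                    // Numbered prefix for <ol>, bullet for <ul>.
                    sb.Append(el.Name == "ol" ? $"{index++}. " : "* ");
                    Walk(li, sb);
                    sb.Append('\n');
                }
                break;
            case "script":
            case "style":
                break; // never show script or style contents
            default:
                Walk(el, sb); // unknown tag: keep its text, drop the tag
                break;
        }
    }
}
```

Calling HtmlToText.Convert(description) on the sample from the question should give text close to the desired output: numbered steps, one blank line where the two <br> tags were, and &amp; decoded to &.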

Marc Gravell answered Oct 02 '22 03:10