I'm importing some data from another test/bug tracking tool into TFS, and I would like to convert its description, which is in simple HTML, to a plain string in which the 'layout' of the HTML is preserved.
For example:
<body>
  <ol>
    <li>Log on with user Acme &amp; Co.</li>
    <li>Navigate to the details tab</li>
    <li>Check the official name</li>
  </ol>
  <br>
  <br>
  Expected Result:<br>
  official name is filled in<br>
  <br>
  Actual Result:<br>
  The &amp;-sign is not shown correctly<br>
  See attachment.
</body>
Would become plain text with newlines inserted and HTML-entities translated like:
1. Log on with user Acme & Co.
2. Navigate to the details tab
3. Check the official name

Expected Result:
official name is filled in

Actual Result:
The &-sign is not shown correctly
See attachment
I can currently replace some tags with newlines using a regex and strip the rest, but handling the HTML entities and tags like <ol> and <ul> feels like re-inventing something that already exists (a browser?). So I was wondering whether someone has done this before me. I couldn't find anything on Google.
Rather than using a regex, you could try loading it into the HTML Agility Pack. If it were XHTML, an XSLT transformation might be a good option.
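To illustrate the parse-instead-of-regex idea, here is a minimal sketch. It does not use the HTML Agility Pack; it swaps in Python's standard-library html.parser purely to show the approach, and the names HtmlToText and html_to_text are hypothetical. It assumes only the simple tags from the question matter (<ol>, <ul>, <li>, <br>), and it lets the parser decode entities such as &amp;.

```python
from html.parser import HTMLParser


class HtmlToText(HTMLParser):
    """Hypothetical helper: numbers <ol> items, turns <br> into newlines."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities like &amp;
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.lists = []  # stack: an int counter per <ol>, None per <ul>

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")
        elif tag == "ol":
            self.lists.append(0)
        elif tag == "ul":
            self.lists.append(None)
        elif tag == "li" and self.lists:
            if self.lists[-1] is None:
                self.parts.append("* ")
            else:
                self.lists[-1] += 1
                self.parts.append(f"{self.lists[-1]}. ")

    def handle_endtag(self, tag):
        if tag == "li":
            self.parts.append("\n")
        elif tag in ("ol", "ul") and self.lists:
            self.lists.pop()

    def handle_data(self, data):
        # Source-level line breaks and indentation carry no meaning in HTML,
        # so replace them with spaces; runs are collapsed in html_to_text.
        self.parts.append(data.replace("\n", " ").replace("\t", " "))


def html_to_text(html):
    parser = HtmlToText()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs within each output line and trim the ends.
    lines = [" ".join(line.split()) for line in "".join(parser.parts).split("\n")]
    return "\n".join(lines).strip()


if __name__ == "__main__":
    print(html_to_text("<ol><li>Acme &amp; Co.</li><li>Details</li></ol>Done<br>OK"))
```

The same structure (walk the nodes, keep a counter per list, emit a newline per <br>) should carry over to a DOM-based library, though the exact calls there will differ. Tags not listed above are simply dropped while their text is kept.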