I was able to use this question as a starting point for parsing an "mht" file, but the "3D" in the anchor tags (e.g. <a href=3D"[my anchor]">[anchor text]</a>) breaks all the internal links and embedded images. I can have the parser replace "=3D" with just "=" (e.g. <a href="[my anchor]">[anchor text]</a>), and that appears to work fine, but I want to understand the purpose of that "meta markup".
Why does exporting from ".docx" to ".mht" add "3D" to the right-hand side of most (if not all) of the HTML attributes? Is there a better way to handle them, or a better regex to use when replacing them?
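The =3D is a result of quoted-printable encoding, the MIME transfer encoding used for the HTML parts of an ".mht" file. In quoted-printable, "=" is the escape character: "=XX" stands for the byte with hex value XX, so a literal "=" (hex 3D) has to be written as "=3D". A "=" at the very end of a line is a soft line break and should be removed together with the line ending. Replacing only "=3D" therefore misses other escaped bytes and the soft line breaks; decode the whole part instead. It shouldn't be too hard to find a Java library for decoding quoted-printable data.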
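To illustrate what such a decoder does, here is a minimal hand-rolled sketch that uses only the Java standard library. The class name `QuotedPrintableSketch` and its `decode` method are made up for this example and are not part of any library. It assumes the encoded input is plain ASCII (as quoted-printable text should be) and passes malformed escapes through unchanged. For real use, an existing implementation such as Apache Commons Codec's `QuotedPrintableCodec` or Jakarta Mail's `MimeUtility.decode` is probably the safer choice.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

public final class QuotedPrintableSketch {

    private QuotedPrintableSketch() {
    }

    /**
     * Decodes quoted-printable text (hypothetical helper, for illustration only).
     * "=XX" becomes the byte 0xXX; "=" followed by a line ending is a soft
     * line break and is dropped. The decoded bytes are read using charset.
     */
    public static String decode(String input, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = input.length();
        int i = 0;
        while (i < n) {
            char c = input.charAt(i);
            if (c != '=') {
                // Assumption: the encoded input is ASCII, so one char is one byte.
                out.write(c);
                i++;
                continue;
            }
            // Soft line break: "=" directly before CRLF or a bare LF.
            if (i + 2 < n && input.charAt(i + 1) == '\r' && input.charAt(i + 2) == '\n') {
                i += 3;
                continue;
            }
            if (i + 1 < n && input.charAt(i + 1) == '\n') {
                i += 2;
                continue;
            }
            // Escaped byte: "=" followed by two hex digits, e.g. "=3D" for "=".
            if (i + 2 < n) {
                int hi = Character.digit(input.charAt(i + 1), 16);
                int lo = Character.digit(input.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 3;
                    continue;
                }
            }
            // Malformed escape: keep the "=" as it is.
            out.write('=');
            i++;
        }
        return new String(out.toByteArray(), charset);
    }
}
```

Calling `QuotedPrintableSketch.decode("<a href=3D\"#top\">link</a>", StandardCharsets.UTF_8)` gives back `<a href="#top">link</a>`. The charset should be the one declared in the part's Content-Type header; UTF-8 here is only an example.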