
How can I get PreMailer.Net to not change the encoding of non-ascii characters?

I've also posted this problem as a GitHub issue on the official repo.

I am using PreMailer.Net to inline CSS into HTML documents. However, when I call MoveCssInline, it HTML-encodes characters such as '&'. For example:

<a href="http://www.website.com/page?param1=a&param2=b"></a>

Is changed to:

<a href="http://www.website.com/page?param1=a&amp;param2=b"></a>

I thought this behavior would be limited to URLs and href values, but further testing showed that it also encodes text/innerHTML content, which is perfectly valid HTML without encoding. For example:

<p>&</p>

This is valid HTML and should not be encoded, but PreMailer.Net will change this to:

<p>&amp;</p>
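
Both cases can be reproduced with a short program along the lines of the sketch below. It assumes the PreMailer.Net NuGet package is referenced; the sample document and the `Program`/`Inline` names are made up for illustration, and the exact output may vary between versions.

```csharp
using System;

class Program
{
    // Hypothetical helper: runs the inliner and returns the resulting markup.
    public static string Inline(string html)
    {
        // MoveCssInline returns a result object; Html holds the processed document.
        var result = PreMailer.Net.PreMailer.MoveCssInline(html);
        return result.Html;
    }

    static void Main()
    {
        // Sample input (an assumption): one link with a raw '&' in its href
        // and one paragraph containing a bare '&'.
        string html =
            "<html><head><style>a { color: red; }</style></head><body>" +
            "<a href=\"http://www.website.com/page?param1=a&param2=b\">link</a>" +
            "<p>&</p>" +
            "</body></html>";

        // Print the inlined markup so the ampersand handling can be inspected.
        Console.WriteLine(Inline(html));
    }
}
```

Comparing the printed markup with the input should show whether the `&` in the href and in the paragraph came back as `&amp;`.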

Does anyone have a fix or workaround for this? I do not have control over the HTML documents and am not allowed to change the URLs or content other than inlining the CSS.

Captain Stack asked Feb 22 '20 04:02


1 Answer

Depending on your needs, and purely as a guide, these are the lines to look at:

        case Symbols.Ampersand: temp.Append("&amp;"); break;
        case Symbols.NoBreakSpace: temp.Append("&nbsp;"); break;
        case Symbols.GreaterThan: temp.Append("&gt;"); break;
        case Symbols.LessThan: temp.Append("&lt;"); break;

Update:

These lines come from lines 132-139 of a PreMailer.Net dependency called AngleSharp, which is an HTML parser.
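
To show what those cases do in context, here is a self-contained sketch of an escaping loop built around the same four replacements. It is not AngleSharp's actual source: the `EscapeSketch` class, the `EscapeText` method name, and the character literals standing in for the `Symbols` constants are assumptions made for illustration.

```csharp
using System;
using System.Text;

static class EscapeSketch
{
    // Hypothetical stand-in for the parser's text-escaping routine.
    public static string EscapeText(string content)
    {
        var temp = new StringBuilder(content.Length);

        foreach (char c in content)
        {
            switch (c)
            {
                case '&': temp.Append("&amp;"); break;       // ampersand
                case '\u00A0': temp.Append("&nbsp;"); break; // no-break space
                case '>': temp.Append("&gt;"); break;        // greater-than
                case '<': temp.Append("&lt;"); break;        // less-than
                default: temp.Append(c); break;              // everything else unchanged
            }
        }

        return temp.ToString();
    }

    static void Main()
    {
        // prints "param1=a&amp;param2=b"
        Console.WriteLine(EscapeText("param1=a&param2=b"));
    }
}
```

Because the replacement is applied unconditionally to every character, a loop like this has no way to leave an individual `&` untouched, which matches the behavior described in the question.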

As far as I can tell, the encoding is currently mandatory in AngleSharp, so it cannot be turned off with any setting in either AngleSharp or PreMailer.Net.

According to a closed issue, this is by design and in accordance with the HTML spec. However, I believe there is still a bug, as it should only encode attribute values, not innerHTML content. I also don't think this is acceptable behavior for a CSS inliner, which should not be validating or sanitizing HTML, and I don't think the parser should be making changes that the client did not ask for.

Mech answered Nov 11 '22 20:11