
PHP, HTML, Email - Equals sign in email is being converted to different characters

Tags:

html

php

email

The Setup...

I am sending an email via PHP, in the usual way...

Code Block 0

mail('', $subject, $message, $headers);

... with the following content setup:

Code Block 1

$boundary = uniqid('np');
$message = '';

$subject = 'Email Subject';

$headers = "MIME-Version: 1.0\r\n";
$headers .= "From: SURL <[email protected]>\r\n";
$headers .= "To: ".$email."\r\n";

$headers .= "Content-Type: multipart/alternative;boundary=" . $boundary . "\r\n";

// Plain Text
$message .= "Content-Type: text/plain; charset=utf-8\r\n";
$message .= "Content-Transfer-Encoding: 7bit\r\n\r\n";

$message .= 'Hi, you handsome SOB!!
             \n\n
             We have ...
             /n/n                
             And ...
             \n\n
             http://someurl.com/.../.../?a=' . $var1 . '&b=' . $var2;

$message .= "\r\n\r\n--" . $boundary . "\r\n";

// HTML
$message .= "Content-Type: text/html; charset=utf-8\r\n";
$message .= "Content-Transfer-Encoding: quoted-printable\r\n\r\n";

$message .= '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

            <html xmlns="http://www.w3.org/1999/xhtml">

            <head>

            <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

            <title>Some Title</title>

            <style type="text/css">
              ... Lots of styles...
            </style>

            </head>
            <body>

            <div class="message_container">
              <div class="message_logo"></div>

              <div class="message_leading_line">
                Hi, you handsome SOB!!
              </div> <!-- end message_leading_line -->

              <div class="message_top_content">
                We have ...
                <br/>
                And ... 
                <a href="http://someurl.com/.../.../?a=' . $var1 . '&b=' . $var2 . '" >visit this link</a>.
              </div> <!-- end message_top_content -->

              <div class="message_bottom_content">
                If link doesn\'t work, copy and paste...
                <pre>http://someurl.com/.../.../?a=' . $var1 . '&b=' . $var2 . '</pre>
              </div> <!-- end message_bottom_content -->

            </div> <!-- end message_container -->

            </body>
            </html>';

$message .= "\r\n\r\n--" . $boundary . "--";

And this all works fine. The email sends without a hitch. Viewing the email, it is properly styled, and everything is where it should be - including all dynamically-added content.

Behavior Thus Far...

I've tried many test cases so far, and all have sent exactly as intended. Styling and all.

One such successful case had:

Code Blocks 3-1, 3-2

$var1 = '4pD9051LsVtQu96pLBH41019v28T0o4Z2I3U6urs';
$var2 = 'verPkBE415i447V6R9o';

... so that the link in the email was:

http://someurl.com/.../.../?a=4pD9051LsVtQu96pLBH41019v28T0o4Z2I3U6urs&b=verPkBE415i447V6R9o

The email sent properly, the links displayed properly, the links worked (clicking and copy/paste). Perfect!

The Issue...

For the most recent test:

Code Blocks 4-1, 4-2

$var1 = '73jbzUN90j27ME5N6W4jh24o992V91m3R632Hlu0';
$var2 = 'avr1owgJAAB3h4l1brw';

... so that the link was:

http://someurl.com/.../.../?a=73jbzUN90j27ME5N6W4jh24o992V91m3R632Hlu0&b=avr1owgJAAB3h4l1brw

This time, the email sent... properly? It was sent with no obvious issues. At first glance, everything was as it should be. All the styling was as it should have been, all the content was where it needed to be.

Note: The Gmail version displayed was the latter version, with all of the markup removed and presented as plain text. Still the wrong URL was presented.

But, upon closer inspection, the link displayed in the email client (Apple Mail and Gmail, both separate accounts) was:

http://someurl.com/.../.../?asjbzUN90j27ME5N6W4jh24o992V91m3R632Hlu0&b=ver1owgJAAB3h4l1brw

The difference is immediately following the question mark. What should be ?a=73jb is instead ?asjb. Clicking on the link doesn't work, nor does copying and pasting it. For obvious reasons - duh!

The Quirk...?

The odd thing is, if I view the same email as source (View -> Message -> Raw Source), the links are exactly as they should be...

Raw Source 1

To: 
Subject: Email Subject
X-PHP-Originating-Script: 2181:email.php
MIME-Version: 1.0
From:  SURL <[email protected]>
To: [email protected]
Content-Type: multipart/alternative;boundary=np57849efa2a13b
X-Identified-User: {:box895.bluehost.com:...:...e.com} {sentby:program running on server}

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Hi, you handsome SOB!!
\n\n
We have ...
/n/n                 
And ...
\n\n
http://someurl.com/.../.../?a=73jbzUN90j27ME5N6W4jh24o992V91m3R632Hlu0&b=avr1owgJAAB3h4l1brw

--np57849efa2a13b
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<title>Some Title</title>

<style type="text/css">
... Lots of styles...
</style>

</head>
<body>

  <div class="message_container">
    <div class="message_logo"></div>

    <div class="message_leading_line">
      Hi, you handsome SOB!!
    </div> <!-- end message_leading_line -->

    <div class="message_top_content">
      We have ...
      <br/>
      And ... 
      <a href="http://someurl.com/.../.../?a=73jbzUN90j27ME5N6W4jh24o992V91m3R632Hlu0&b=avr1owgJAAB3h4l1brw" >visit this link</a>.
    </div> <!-- end message_top_content -->

    <div class="message_bottom_content">
      If link doesn\'t work, copy and paste...
      <pre>http://someurl.com/.../.../?a=73jbzUN90j27ME5N6W4jh24o992V91m3R632Hlu0&b=avr1owgJAAB3h4l1brw</pre>
    </div> <!-- end message_bottom_content -->

  </div> <!-- end message_container -->

</body>
</html>

--np57849efa2a13b--

So what's going on? The data sent appears to be correct, but it is being displayed all cattywampus in very special cases. Why does =73 convert to s (or, maybe a=73 to as...)?

I've checked ASCII tables and HTML codes. In either case, the only thing associated with 73 is the capital letter "I".

EDIT 1

Wouldn't you know that I'd stumble upon This HTML URL Encoding Reference immediately after posting this question.

On there, you'll find %73 associated with s.

But, I'm still not quite sure how =73... is becoming %73... and then s.... Certainly seems something to do with the utf-8.

Surely I'm not the first fellow to send an email, there must be a way...

EDIT 2 - SOLUTION!!

Solution and explanation got too long, so posted it as an answer.

Birrel asked Feb 07 '23 14:02


1 Answer

The issue is coming from the line:

$message .= "Content-Transfer-Encoding: quoted-printable\r\n\r\n";

According to Good ol' Wiki:

QP works by using the equals sign "=" as an escape character.

There IS a way around this, if you want to keep the quoted-printable encoding:

... an ASCII equal sign (decimal value 61) must be represented by "=3D". All characters except printable ASCII characters or end of line characters must be encoded in this fashion.

So the line...

http://someurl.com/.../.../?a=3D73jbzUN90j27ME5N6W4jh24o992V91m3R632Hlu0&b=3Davr1owgJAAB3h4l1brw

... would have worked fine (notice the =3D).
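To see why only the second test case broke, it can help to run both query strings through a quoted-printable decoder. The sketch below uses PHP's quoted_printable_decode() as a stand-in for what a mail client does. Real clients may treat malformed sequences differently, so take it as an illustration rather than a description of Apple Mail or Gmail. The variable names are made up for this example.

```php
<?php
// Query strings from the question, exactly as they were placed in the raw body.
$workingRaw = '?a=4pD9051LsVtQu96pLBH41019v28T0o4Z2I3U6urs&b=verPkBE415i447V6R9o';
$failingRaw = '?a=73jbzUN90j27ME5N6W4jh24o992V91m3R632Hlu0&b=avr1owgJAAB3h4l1brw';
// The same failing string with every "=" escaped as "=3D".
$failingEscaped = '?a=3D73jbzUN90j27ME5N6W4jh24o992V91m3R632Hlu0&b=3Davr1owgJAAB3h4l1brw';

// "=4p" and "=ve" are not "=" followed by two hex digits, so PHP's decoder
// leaves them untouched.
$workingDecoded = quoted_printable_decode($workingRaw);

// "=73" IS "=" followed by two hex digits: it becomes chr(0x73), i.e. "s".
$failingDecoded = quoted_printable_decode($failingRaw);

// "=3D" decodes to a literal "=", restoring the intended query string.
$escapedDecoded = quoted_printable_decode($failingEscaped);

echo $workingDecoded, PHP_EOL;
echo $failingDecoded, PHP_EOL;
echo $escapedDecoded, PHP_EOL;
```

If a client decodes the same way, the first test case survived only by luck: neither of its values happens to start with two hexadecimal digits.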

But this isn't enough!

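If you are going to use quoted-printable, you need to set EVERY equals sign in the encoded body to =3D. This includes attributes, styles, scripts, and any other HTML. The MIME headers above the blank line (Content-Type, Content-Transfer-Encoding) are not part of the encoded body and stay as they are.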

For example, take a look at the first few lines of a "Forgot Your Password" email from TeamTreehouse, which also uses quoted-printable encoding:

... Headers and plain-text version above ...

Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.=
w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns=3D"http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8">
  <title>Treehouse</title>


  <style type=3D"text/css">
    ... some styles, no equals signs ...
  </style>
</head>

<body leftmargin=3D"0" marginwidth=3D"0" topmargin=3D"0" marginheight=3D"0"=
 offset=3D"0" style=3D"-webkit-text-size-adjust: none; background: #edeff0;=
 margin: 0; padding: 0; width: 100% !important" bgcolor=3D"#edeff0">
  <center>
    <table border=3D"0" cellpadding=3D"0" cellspacing=3D"0" height=3D"100%"=
 width=3D"100%" style=3D"background: #edeff0; color: #9ba6b0; font-family: =
Helvetica,sans-serif; font-size: 14px; height: 100% !important; margin: 0; =
padding: 0; width: 100% !important" bgcolor=3D"#edeff0">

... and so on.

This encoding type has the limitation of:

Lines of Quoted-Printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line...

Which means that the raw content can end up looking like:

...
2wtdG91WGhKTEVZVVpoWnZsamo4IiwidiI6MSwicCI6IntcInVcIjozMDA4Nzg2NixcInZcIjox=
LFwidXJsXCI6XCJodHRwOlxcXC9cXFwvdGVhbXRyZWVob3VzZS5jb21cXFwvXCIsXCJpZFwiOlw=
iN2VlMmFmYWZiNGYwNGY5MGE2Y2NjMGExZGQwODdiMWVcIixcInVybF9pZHNcIjpbXCJlMzdmNG=
JlNDQ5NzYyY2NjZDQ5MmZjNmUyZDgwMjFhMTUxODgyM2RkXCJdfSJ9">teamtreehouse.com</=
...

Not a big deal for the interpreter, if you have the appropriate encoding set, but less pretty and understandable to the naked eye.

BUT WHY USE quoted-printable!?

Continuing with our reading, it turns out that some SMTP (Simple Mail Transfer Protocol) servers have a line limit of 1000 characters. By forcing your lines to never exceed 76 characters (with the last character of a wrapped line being the soft-break =), you don't run the risk of your emails failing for that particular reason.
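A quick way to convince yourself of the line-length behaviour is to encode one very long line and measure the result. This is only a sketch: the $longLine content is invented for the example, and it relies on PHP's quoted_printable_encode() separating its wrapped lines with CRLF.

```php
<?php
// Hypothetical input: one very long line of HTML with no line breaks at all.
$longLine = str_repeat('<td style="margin: 0; padding: 0">cell</td>', 50);

$encoded = quoted_printable_encode($longLine);

// Split the encoded text on CRLF and find the longest resulting line.
$lines = explode("\r\n", $encoded);
$longest = max(array_map('strlen', $lines));

// Decoding removes the soft line breaks again, so nothing is lost.
$roundTrip = quoted_printable_decode($encoded);

echo 'raw length:   ', strlen($longLine), PHP_EOL;
echo 'longest line: ', $longest, PHP_EOL;
echo 'round trip:   ', ($roundTrip === $longLine ? 'ok' : 'broken'), PHP_EOL;
```

The raw line is well over 1000 characters, while every encoded line should stay within the 76-character limit quoted above.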

How do you do it!?

It turns out, PHP has a native function, just for this...

quoted_printable_encode(string);

^ Tested it with the exact email code from the question, and works just as advertised!
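For a small, concrete picture of what the function does to a link, here is a minimal sketch. The URL is a shortened, made-up version of the one in the question, and $rawHtml is just an illustrative name.

```php
<?php
// A fragment shaped like the link in the question (shortened for the example).
$rawHtml = '<a href="http://someurl.com/?a=73jb&b=avr1">visit this link</a>';

// Every "=" in the fragment is escaped as "=3D"; other printable ASCII is kept.
$encoded = quoted_printable_encode($rawHtml);

// Decoding (which the mail client does) gives the original fragment back.
$decoded = quoted_printable_decode($encoded);

echo $encoded, PHP_EOL;
echo $decoded, PHP_EOL;
```

Note that the = in href= is escaped as well, not just the ones in the query string, which matches the Treehouse sample above.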

To do the same (I know, it's probably obvious):

<?php 
    $var = '... some huge, long string...';

    $var2 = quoted_printable_encode($var);
?>

and then for a nice visual comparison...

<html>
<body>

<?php 

    echo '<pre>' . htmlspecialchars($var) . '</pre>'; 

    echo '<br><hr><br>';

    echo '<pre>' . htmlspecialchars($var2) . '</pre>'; 

?>

</body>
</html>

Note: For your email (like in the question), you only want to encode the part AFTER the Content-Transfer-Encoding: quoted-printable declaration...

$message .= "Content-Type: text/html; charset=utf-8\r\n";
$message .= "Content-Transfer-Encoding: quoted-printable\r\n\r\n";

$tmp = '<html>... long, formatted, beautiful...</html>';

$message .= quoted_printable_encode($tmp);

$message .= "\r\n\r\n--" . $boundary . "--";
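Putting the pieces together, the whole body could be assembled along these lines. This is a sketch under a few assumptions, not the exact code I tested: buildAlternativeBody() is a hypothetical helper, the URL is shortened, and the line-ending normalisation is my own addition. It also opens the first part with a boundary line, which multipart bodies are generally expected to have and which the code in the question leaves out.

```php
<?php
/**
 * Hypothetical helper (not from the original post): builds a
 * multipart/alternative body whose HTML part is quoted-printable encoded.
 */
function buildAlternativeBody(string $boundary, string $plainText, string $html): string
{
    $eol = "\r\n";

    // Assumption: normalise line endings to CRLF first, so the encoder keeps
    // them as real line breaks rather than escaping bare "\n" characters.
    $html = preg_replace('/\r\n|\r|\n/', $eol, $html);

    // Plain-text part, left unencoded as in the question.
    $body  = '--' . $boundary . $eol;
    $body .= 'Content-Type: text/plain; charset=utf-8' . $eol;
    $body .= 'Content-Transfer-Encoding: 7bit' . $eol . $eol;
    $body .= $plainText . $eol . $eol;

    // HTML part: only the content is encoded, the part headers stay as-is.
    $body .= '--' . $boundary . $eol;
    $body .= 'Content-Type: text/html; charset=utf-8' . $eol;
    $body .= 'Content-Transfer-Encoding: quoted-printable' . $eol . $eol;
    $body .= quoted_printable_encode($html) . $eol . $eol;

    // Closing delimiter.
    $body .= '--' . $boundary . '--';

    return $body;
}

$link = 'http://someurl.com/?a=73jb&b=avr1'; // shortened, illustrative URL
$body = buildAlternativeBody(
    uniqid('np'),
    "Copy and paste: $link",
    "<html>\n<body>\n<a href=\"$link\">visit this link</a>\n</body>\n</html>"
);

echo $body, PHP_EOL;
```

The returned string would take the place of $message in the mail() call, with the same boundary value used in the Content-Type header.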


Birrel answered Feb 16 '23 04:02