I need to create an app that extracts the VAT numbers our clients send us for verification. Their e-mails contain nothing else. The goal is to build extended statistics.
What I need is the mail body without any headers in front of the content I care about, which is just the VAT number.
This is my script that lists the 30 most recent e-mails:
<?php
if (!function_exists('imap_open')) { die('No function'); }
if ($mbox = imap_open(<confidential>)) {
$output = "";
$messageCount = imap_num_msg($mbox);
$x = 1;
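for ($i = 0; $i < min(30, $messageCount); $i++) { // stop early on small mailboxes
$message_id = ($messageCount - $i);
// imap_header() was an alias of imap_headerinfo() and was removed in PHP 8.
$fetch_message = imap_headerinfo($mbox, $message_id);
$mail_content = quoted_printable_decode(imap_fetchbody($mbox,$message_id, 1));
// iconv() returns the converted string, so the result has to be assigned.
$encoding = mb_detect_encoding($mail_content, mb_detect_order(), true);
if ($encoding !== false) {
$mail_content = iconv($encoding, "UTF-8", $mail_content);
}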
$output .= "<tr>
<td>".$x.".</td>
<td>
".$fetch_message->from[0]->mailbox."@".$fetch_message->from[0]->host."
</td>
<td>
".$fetch_message->date."
</td>
<td>
".$fetch_message->subject."
</td>
<td>
<textarea cols=\"40\">".htmlspecialchars($mail_content)."</textarea>
</td>
</tr>";
$x++;
}
$smarty->assign("enquiries", $output);
$smarty->display("module_mail");
imap_close($mbox);
} else {
print_r(imap_errors());
}
?>
I've worked with imap_fetchbody, imap_header and so on to retrieve the desired content, but it turns out that most e-mails have something else (such as headers) before the content, e.g.:
--=-Dbl2eWTUl0Km+Tj46Ww1
Content-Type: text/plain;
------=_NextPart_001_003A_01D14F7A.F25AB3D0
Content-Type: text/plain;
--=-ucRIRGamiKb0Ot1/AkNc
Content-Type: text/plain;
I need to get rid of everything that comes before the VAT number in the mail's message, but I don't know how. Some e-mails have these headers and some don't. Since we work with clients from all over Europe, this really confuses me and leaves me powerless.
Another problem is that some clients copy and paste VAT numbers from various websites, so the numbers often arrive with their original styling (bold, background, changed colour and so on). That might be the reason for my PS below.
I would appreciate any help that leads me towards solving this problem.
Thank you in advance.
PS. Just for the record: with imap_fetchbody($mbox,$message_id, 1) I need to use 1 to get the whole content. Changing 1 to anything else results in NO email content being displayed at all. Literally.
The parts of the e-mail that you regard as "noise" are simply part of the e-mail's format.
In a way, it is like reading the HTML source of a web page.
All those bits are MIME boundaries. They work a little like tags in HTML: a boundary line opens each part, and a final boundary line with a trailing "--" closes the whole multipart body.
Content-Type: multipart/alternative; boundary="=-Dbl2eWTUl0Km+Tj46Ww1" // defines the type of e-mail structure and the boundary string
--=-Dbl2eWTUl0Km+Tj46Ww1 // starts a section
Content-Type: text/plain; // defines the content type of the section
// your VAT number is presumably here
--=-Dbl2eWTUl0Km+Tj46Ww1-- // closing delimiter after the last section
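To make the boundary structure above concrete, here is a minimal sketch that splits a raw body on its boundary lines and returns the first text/plain section. The function name extractPlainTextPart is hypothetical, and the sketch assumes the first line starting with "--" is the boundary. It does not handle nested multiparts or transfer encodings, so treat it as an illustration rather than a complete MIME parser.

```php
<?php
/**
 * Return the content of the first text/plain section of a raw multipart
 * body, or the trimmed input when no boundary line is found.
 * Hypothetical helper: nested multiparts and transfer decoding are ignored.
 */
function extractPlainTextPart(string $rawBody): string
{
    $rawBody = str_replace("\r\n", "\n", $rawBody);

    // Assumption: the first line that starts with "--" is the boundary line.
    if (!preg_match('/^--(\S+)$/m', $rawBody, $m)) {
        return trim($rawBody); // no MIME structure, the body is the content
    }

    // Split on "--boundary" lines; the closing "--boundary--" matches too.
    $pattern = '/^--' . preg_quote($m[1], '/') . '(?:--)?[ \t]*$/m';
    $parts = preg_split($pattern, $rawBody);

    // Index 0 is whatever preceded the first boundary, so skip it.
    foreach (array_slice($parts, 1) as $part) {
        // Section headers and section content are separated by a blank line.
        $pieces = preg_split('/\n\s*\n/', ltrim($part, "\n"), 2);
        if (count($pieces) === 2 && stripos($pieces[0], 'text/plain') !== false) {
            return trim($pieces[1]);
        }
    }

    return '';
}
```

Messages that arrive without any boundary lines pass through unchanged apart from trimming, which matches the case where some e-mails have these headers and some do not.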
You have at least two options: write a custom parser yourself, or use a PECL extension called Mailparse.
A custom parser could start like this:
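// explode() takes the delimiter first, then the string.
$mail_lines = explode("\n", $mail_content);
foreach ($mail_lines as $key => $line) {
// skip most of the headers (heuristic: adjust or drop it if the content starts earlier)
if ($key < 5) {
continue;
}
// skip boundary lines; compare with === 0 because strpos() returns 0 for a match at the start
if (strpos($line, "--") === 0) {
continue;
}
// skip Content-* header lines
if (strpos($line, "Content") === 0) {
continue;
}
// skip blank lines
if (trim($line) === '') {
continue;
}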
////////////////////////////////////////////////////
// here you have to insert the logic for the parser
// and extend the guard clauses
////////////////////////////////////////////////////
}
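As one possible way to fill in the parser logic, the sketch below pulls VAT-like tokens out of the remaining text. It also strips leftover markup, since some clients paste styled numbers. The function name findVatCandidates and the pattern are assumptions: the regex only looks for a two-letter prefix followed by 2 to 12 letters or digits containing at least one digit, and it does not validate country-specific formats or numbers written with spaces.

```php
<?php
/**
 * Collect VAT-like tokens from a message body.
 * Assumption: a candidate is a two-letter prefix followed by 2 to 12
 * letters or digits, at least one of them a digit. Real formats differ
 * per country, so every hit still needs proper validation.
 */
function findVatCandidates(string $body): array
{
    // Remove markup left over from styled copy-paste, then decode entities.
    $text = html_entity_decode(strip_tags($body), ENT_QUOTES, 'UTF-8');

    // Upper-case first so that "pl123..." and "PL123..." are treated alike.
    $pattern = '/\b[A-Z]{2}(?=[0-9A-Z]{0,11}\d)[0-9A-Z]{2,12}\b/';
    preg_match_all($pattern, strtoupper($text), $matches);

    // Drop duplicates while keeping the order of first appearance.
    return array_values(array_unique($matches[0]));
}
```

Boundary strings and other header debris can also look like candidates, so this is best run on text that has already had the MIME lines removed.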
To use Mailparse, install it with: sudo pecl install mailparse
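// Assumes $mail_content holds the raw message, including its headers.
$mail = mailparse_msg_create();
mailparse_msg_parse($mail, $mail_content);
// List of part identifiers such as "1", "1.1", "1.2".
$struct = mailparse_msg_get_structure($mail);
foreach ($struct as $st) {
$section = mailparse_msg_get_part($mail, $st);
// Headers, content type, charset and byte offsets of this part.
$info = mailparse_msg_get_part_data($section);
print_r($info);
}
mailparse_msg_free($mail);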
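If you would rather stay with the imap extension, a third route is to ask the server for the message structure and fetch only the text/plain section. The sketch below walks an object shaped like the result of imap_fetchstructure() and returns the section number to pass to imap_fetchbody(). The helper names findPlainTextSection and decodeTransferEncoding are hypothetical, and embedded message/rfc822 parts are not handled. When section 1 is itself a multipart container, which would explain boundary lines showing up in its content, the plain text presumably sits at 1.1.

```php
<?php
/**
 * Walk a structure object (as returned by imap_fetchstructure()) and
 * return the section number and transfer encoding of the first
 * text/plain part, or null when there is none.
 * Hypothetical helper; embedded message/rfc822 parts are not handled.
 */
function findPlainTextSection(object $structure, string $prefix = ''): ?array
{
    if (empty($structure->parts)) {
        // Leaf part. type 0 is TYPETEXT; a single-part message is section "1".
        $isPlain = $structure->type === 0
            && strtoupper($structure->subtype ?? '') === 'PLAIN';
        if (!$isPlain) {
            return null;
        }
        return [
            'section'  => $prefix === '' ? '1' : $prefix,
            'encoding' => $structure->encoding,
        ];
    }

    // Multipart container: children are numbered 1, 2, ... below the prefix.
    foreach ($structure->parts as $index => $part) {
        $section = ($prefix === '' ? '' : $prefix . '.') . ($index + 1);
        $found = findPlainTextSection($part, $section);
        if ($found !== null) {
            return $found;
        }
    }

    return null;
}

/** Decode a fetched body according to the structure's encoding code. */
function decodeTransferEncoding(string $raw, int $encoding): string
{
    switch ($encoding) {
        case 3: // ENCBASE64
            return (string) base64_decode($raw);
        case 4: // ENCQUOTEDPRINTABLE
            return quoted_printable_decode($raw);
        default: // 7bit, 8bit, binary, other: leave as is
            return $raw;
    }
}

// Intended use with a live mailbox (not run here):
// $found = findPlainTextSection(imap_fetchstructure($mbox, $message_id));
// $body  = decodeTransferEncoding(
//     imap_fetchbody($mbox, $message_id, $found['section']),
//     $found['encoding']
// );
```

Decoding by the reported encoding, rather than always calling quoted_printable_decode(), should also cope with clients whose mail programs send base64.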