
Parsing email "Received:" headers

Tags: email, parsing

We need to parse Received: email headers according to RFC 5321 and extract the domains and IPs through which the mail has traversed. We also need to figure out whether an IP is an internal IP. Is there already a library that can help, especially in C/C++?

For example,

Received: from server.mymailhost.com (mail.mymailhost.com [126.43.75.123])
    by pilot01.cl.msu.edu (8.10.2/8.10.2) with ESMTP id NAA23597;
    Fri, 12 Jul 2002 16:11:20 -0400 (EDT)

We need to extract the "by" server.

thanks

ravi asked Feb 02 '09 17:02


2 Answers

The format used by 'Received' lines is defined in RFC 2821 (since obsoleted by RFC 5321, the document the question cites), and regex can't parse it.

(You can try anyway, and for a limited subset of headers produced by known software you might succeed, but when you attach this to the range of strange stuff found in real-world mail it will fail.)

Use an existing RFC 2821 parser and you should be OK, but otherwise you should expect failure, and write the software to cope with it. Don't base anything important like a security system around it.
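To illustrate "expect failure, and write the software to cope with it", here is a minimal best-effort sketch. It is written in Python rather than the C/C++ the question asks for, but the logic is simple enough to port. It is not an RFC 2821 parser: it assumes clauses look like `keyword value`, it does not handle backslash-escaped parentheses inside comments, and the names `unfold`, `strip_comments` and `parse_received` are made up for this example. A clause that cannot be found is simply missing from the result.

```python
import re

# Keywords that introduce a clause in a Received header.
KEYWORDS = {"from", "by", "via", "with", "id", "for"}


def unfold(header):
    """Join folded continuation lines into a single line."""
    return re.sub(r"\r?\n[ \t]+", " ", header).strip()


def strip_comments(text):
    """Drop parenthesised comments, including nested ones."""
    kept, depth = [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


def parse_received(header):
    """Best-effort split of a Received header into its keyword clauses.

    Returns a dict such as {"from": ..., "by": ...}. A clause that cannot
    be found is absent, so callers must handle missing keys.
    """
    value = unfold(header)
    if value.lower().startswith("received:"):
        value = value[len("received:"):]
    # With comments removed, the text after the first ';' is the date-time.
    value = strip_comments(value).split(";")[0]
    tokens = value.split()
    fields = {}
    i = 0
    while i < len(tokens) - 1:
        key = tokens[i].lower()
        if key in KEYWORDS and key not in fields:
            fields[key] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return fields


example = (
    "Received: from server.mymailhost.com (mail.mymailhost.com [126.43.75.123])\n"
    "    by pilot01.cl.msu.edu (8.10.2/8.10.2) with ESMTP id NAA23597;\n"
    "    Fri, 12 Jul 2002 16:11:20 -0400 (EDT)"
)
print(parse_received(example).get("by"))  # pilot01.cl.msu.edu
```

Using `.get("by")` rather than indexing keeps a malformed header from raising; the caller sees `None` and can skip that hop.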

We need to extract the "by" server.

'from' is more likely to be of use. The hostname given in a 'by' line is as seen by the host itself, so there is no guarantee it will be a publicly resolvable FQDN. And of course you don't tend to get valid (TCP-Info) there.
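For the "is this an internal IP" part of the question, one possible sketch (again in Python, using the standard `ipaddress` module) pulls the first bracketed address literal out of the header and classifies it. The helper names `first_address` and `is_internal` are hypothetical, and "internal" is assumed here to mean private, loopback or link-local; a site with its own public ranges would need to add those networks itself. Since these headers can be forged, treat the result as a hint only.

```python
import ipaddress
import re


def first_address(header):
    """Return the first bracketed address literal as an ipaddress object.

    Returns None when no bracketed token parses as an IPv4 or IPv6 address.
    """
    for literal in re.findall(r"\[([^\]\s]+)\]", header):
        # IPv6 literals in mail headers are usually tagged "IPv6:".
        if literal.lower().startswith("ipv6:"):
            literal = literal[5:]
        try:
            return ipaddress.ip_address(literal)
        except ValueError:
            continue  # not an address; try the next bracketed token
    return None


def is_internal(addr):
    """Assumed definition of internal: private, loopback or link-local."""
    return addr.is_private or addr.is_loopback or addr.is_link_local


header = (
    "Received: from server.mymailhost.com "
    "(mail.mymailhost.com [126.43.75.123]) by pilot01.cl.msu.edu"
)
addr = first_address(header)
if addr is not None:
    print(addr, "internal" if is_internal(addr) else "external")
```

Taking the first bracketed literal is a simplification: it assumes the address belongs to the 'from' clause, which holds for the example header but not for every header seen in the wild.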

bobince answered Nov 15 '22 19:11


There is a Perl Received module which is a fork of the SpamAssassin code. It returns a hash for a Received header with the relevant information. For example:

{ ip => '64.12.136.4', 
  id => '875522', 
  by => 'xxx.com',
  helo => 'imo-m01.mx.aol.com' }

karlcow answered Nov 15 '22 20:11