What sort of database schema would you use to store email messages, with as much header information as practical/possible, into a database?
Assume that they have been fed into a script from the MTA and parsed into the relevant headers/body/attachments.
Would you store the message body whole in the database table, or split any MIME-parts apart? What about attachments?
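For reference, here is a minimal sketch of the parsing step the question assumes. It uses Python's standard email package, which is an assumption since the question does not name a language or parser. The sketch walks every MIME part, keeps non-attachment parts as body text, and collects attachments as decoded bytes. That is the split the answers below store in separate columns and tables. In a real MTA pipe the raw bytes would come from standard input (for example sys.stdin.buffer.read()) rather than the inline sample, and the names split_message and RAW are hypothetical.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical sample message; a real pipe from the MTA would supply raw bytes.
RAW = b"""From: alice@example.com
To: bob@example.com
Subject: Quarterly report
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="utf-8"

See attached.
--XYZ
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="report.bin"
Content-Transfer-Encoding: base64

AAECAw==
--XYZ--
"""


def split_message(raw: bytes):
    """Return (headers, body_parts, attachments) for one raw message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    headers = list(msg.items())  # every top-level header as (name, value)
    body_parts = []   # (content_type, text) for inline parts
    attachments = []  # (filename, content_type, decoded bytes)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        if part.is_attachment():
            attachments.append(
                (part.get_filename(), part.get_content_type(),
                 part.get_payload(decode=True))  # undoes base64 / QP
            )
        else:
            body_parts.append((part.get_content_type(), part.get_content()))
    return headers, body_parts, attachments
```

Keeping all headers as a list of pairs, rather than only the few that get their own columns, makes it possible to store "as much header information as possible" without deciding the full schema up front.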
Suggestion: create a well-defined table for storing email, with a column for each relevant part of a message: sender, headers, subject, body. This makes later queries much simpler, for example when you want to filter by the subject field.
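A minimal sketch of that one-column-per-field layout follows. It assumes SQLite through Python's sqlite3 module. The column names, the helper functions and the choice to also keep the full raw header block in one text column (for headers that do not get their own column) are all hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id          INTEGER PRIMARY KEY,
    message_id  TEXT UNIQUE,      -- Message-ID header
    sender      TEXT NOT NULL,    -- From header
    subject     TEXT,             -- Subject header
    sent_at     TEXT,             -- Date header, kept as text
    headers     TEXT NOT NULL,    -- full raw header block for everything else
    body        TEXT
);
-- Helps equality lookups on subject; LIKE '%...%' still scans the table.
CREATE INDEX IF NOT EXISTS idx_messages_subject ON messages(subject);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_message(conn, message_id, sender, subject, sent_at, headers, body) -> int:
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO messages (message_id, sender, subject, sent_at, headers, body) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (message_id, sender, subject, sent_at, headers, body),
        )
    return cur.lastrowid


def find_by_subject(conn, pattern: str):
    """Return (id, sender, subject) rows whose subject matches a LIKE pattern."""
    return conn.execute(
        "SELECT id, sender, subject FROM messages WHERE subject LIKE ? ORDER BY id",
        (pattern,),
    ).fetchall()
```

Usage: after store_message(conn, "<1@example.com>", "alice@example.com", "Quarterly report", "Mon, 1 Jan 2024 10:00:00 +0000", "From: alice@example.com", "See attached."), calling find_by_subject(conn, "%report%") returns that row.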
You can use VARCHAR as the data type for the email address column, since addresses are usually composed of letters, digits and special characters. The right length for the email field is database-agnostic: it comes from the email standards, not from the database you use.
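A sketch of such a column, again assuming SQLite via Python's sqlite3. The 254-character limit is a commonly cited practical maximum, derived from RFC 5321's 256-octet path limit minus the surrounding angle brackets; treat it as an assumption to check against your own requirements. SQLite accepts VARCHAR(n) but does not enforce n, so the sketch adds a CHECK constraint. The table and helper names are hypothetical.

```python
import sqlite3

# Assumed practical maximum (RFC 5321 path limit of 256 octets minus "<" and ">").
MAX_ADDRESS_LEN = 254


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # VARCHAR(n) only sets TEXT affinity in SQLite; the CHECK enforces the length.
    conn.execute(
        f"""CREATE TABLE addresses (
            id      INTEGER PRIMARY KEY,
            address VARCHAR({MAX_ADDRESS_LEN}) NOT NULL UNIQUE
                    CHECK (length(address) <= {MAX_ADDRESS_LEN})
        )"""
    )
    return conn


def add_address(conn: sqlite3.Connection, address: str) -> bool:
    """Insert an address; return False if it is too long or already stored."""
    try:
        with conn:
            conn.execute("INSERT INTO addresses (address) VALUES (?)", (address,))
        return True
    except sqlite3.IntegrityError:  # CHECK or UNIQUE violation
        return False
```

On databases that do enforce VARCHAR lengths (PostgreSQL or MySQL, for example) the CHECK is redundant but harmless.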
Inserting email (file) attachments into the target database is just as easy as mapping data from the email body: simply select the desired attachment from the "Insert Mode" list. You can insert any type of file attachment into your database, whether it is raw binary data or a document.
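Outside a dedicated tool, one common layout is a separate attachments table that holds the decoded bytes in a BLOB column and points back to its message. The following minimal sketch assumes SQLite via Python's sqlite3. The table layout and the helper names are hypothetical. Foreign-key checks are switched on explicitly because SQLite leaves them off by default.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE messages (
            id      INTEGER PRIMARY KEY,
            subject TEXT
        );
        CREATE TABLE attachments (
            id           INTEGER PRIMARY KEY,
            message_id   INTEGER NOT NULL REFERENCES messages(id) ON DELETE CASCADE,
            filename     TEXT,            -- may be absent in the MIME part
            content_type TEXT NOT NULL,
            data         BLOB NOT NULL    -- decoded bytes, not base64 text
        );
    """)
    return conn


def add_attachment(conn, message_id: int, filename, content_type: str, data: bytes) -> int:
    with conn:
        cur = conn.execute(
            "INSERT INTO attachments (message_id, filename, content_type, data) "
            "VALUES (?, ?, ?, ?)",
            (message_id, filename, content_type, data),
        )
    return cur.lastrowid


def attachments_for(conn, message_id: int):
    """Return (filename, content_type, data) rows for one message."""
    return conn.execute(
        "SELECT filename, content_type, data FROM attachments "
        "WHERE message_id = ? ORDER BY id",
        (message_id,),
    ).fetchall()
```

Keeping attachments in their own table means a message can have any number of them, and large binary data stays out of rows that are scanned for header or subject queries.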
You may want to check the architecture and the DB schema of "Archiveopteryx".
You may want to use a schema where the message body and attachment records can be shared between multiple recipients on the message. It's not uncommon to see email servers where fully 50% of the disk storage is used by duplicate emails.
A simple hash of the body/attachment would be enough to see if that record was already in the database. However, you would still need to keep separate headers.
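A minimal sketch of that hash-based sharing, assuming SQLite via Python's sqlite3 and SHA-256 as the "simple hash" (the answer does not name an algorithm). Content is stored once per digest, while each delivery keeps its own recipient and headers. The table and function names are hypothetical, and this is not Archiveopteryx's actual schema.

```python
import hashlib
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- One row per distinct body/attachment, keyed by its hash.
        CREATE TABLE contents (
            sha256 TEXT PRIMARY KEY,
            data   BLOB NOT NULL
        );
        -- One row per recipient copy; headers are never shared.
        CREATE TABLE deliveries (
            id          INTEGER PRIMARY KEY,
            recipient   TEXT NOT NULL,
            headers     TEXT NOT NULL,
            body_sha256 TEXT NOT NULL REFERENCES contents(sha256)
        );
    """)
    return conn


def store_delivery(conn, recipient: str, headers: str, body: bytes) -> int:
    digest = hashlib.sha256(body).hexdigest()
    with conn:
        # Only the first copy of a given body is actually written.
        conn.execute(
            "INSERT OR IGNORE INTO contents (sha256, data) VALUES (?, ?)",
            (digest, body),
        )
        cur = conn.execute(
            "INSERT INTO deliveries (recipient, headers, body_sha256) VALUES (?, ?, ?)",
            (recipient, headers, digest),
        )
    return cur.lastrowid


def load_body(conn, delivery_id: int) -> bytes:
    row = conn.execute(
        "SELECT c.data FROM deliveries d JOIN contents c ON c.sha256 = d.body_sha256 "
        "WHERE d.id = ?",
        (delivery_id,),
    ).fetchone()
    return row[0]
```

Delivering the same body to two recipients leaves one row in contents and two in deliveries. Deleting a delivery then needs a reference check (or a periodic cleanup) before the shared content row can be removed.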