I have searched the web for days now but I can't seem to find a good solution to my problem:
For one of my projects I'm looking for a good (lightweight) MIME parser. My customer provides MIME-formatted files (linear, no hierarchy) which contain 3-4 "parts". The application must be able to split those parts and process them independently.
Basically, these MIME files are like raw email messages, but without the SMTP headers. Instead, they begin with the MIME header "MIME-Version: 1.0", followed by the parts.
I am using C++ for the application, so a C++ library is welcome. A standard C library is welcome too, but it should fit the following criteria:
After days of searching I found the following libraries, along with my reasons for not using them:
I don't really want to write my own MIME parser. MIME is so widespread that there must be some open library to handle this file format in a sane way.
So, do you guys have any ideas, suggestions or links?
Thanks in advance!
GMime is an LGPL MIME parser written in C. It does depend on glib, but glib is available on Windows, both 32-bit and 64-bit (and on all Unix-based platforms, including Mac OS X). It also builds inside Visual Studio afaict, so I fail to see what the problem is. I know of at least one commercial Windows vendor shipping libgmime.dll and libglib.dll in their product (Kerio Connect, iirc). Nokia even ships it on some of their phones.
There is really no such thing as a "lightweight" MIME parser if you actually expect it to do anything more than split headers on ':', do haphazard parsing of the Content-Type header to find the boundary string, and then handle non-nested multiparts (which is kinda useless outside of parsing HTTP responses and pre-canned MIME messages whose composition you control).
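For comparison, here is roughly what that "lightweight" approach looks like as a C++ sketch: pull the boundary out of a Content-Type value, split a non-nested multipart body on delimiter lines, and split each part's headers on ':'. It is line-based (the readline approach) and assumes no folded headers, no nested multiparts, no transport padding after boundary lines, and no Content-Transfer-Encoding decoding (base64, quoted-printable). `Part`, `extractBoundary`, `splitParts` and `parsePart` are hypothetical names, not any library's API. It might be adequate for files whose composition you control, but each of those assumptions is a place where it breaks on real-world MIME.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: case-insensitive find (MIME parameter names are case-insensitive).
static std::size_t ifind(const std::string& hay, const std::string& needle) {
    auto lower = [](std::string s) {
        for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return s;
    };
    return lower(hay).find(lower(needle));
}

// Naively extracts the boundary parameter from a Content-Type value.
// Handles quoted and unquoted values; ignores comments and RFC 2231 continuations.
std::string extractBoundary(const std::string& contentType) {
    std::size_t pos = ifind(contentType, "boundary=");
    if (pos == std::string::npos) return "";
    pos += 9;  // skip "boundary="
    if (pos < contentType.size() && contentType[pos] == '"') {
        std::size_t end = contentType.find('"', pos + 1);
        return end == std::string::npos ? "" : contentType.substr(pos + 1, end - pos - 1);
    }
    std::size_t end = contentType.find_first_of("; \t\r\n", pos);
    return contentType.substr(pos, end == std::string::npos ? std::string::npos : end - pos);
}

// Splits a non-nested multipart body into raw parts, line by line.
// CR is stripped from each line, so returned parts use "\n" line endings.
// Preamble and epilogue are discarded; a missing close delimiter drops the last part.
std::vector<std::string> splitParts(const std::string& body, const std::string& boundary) {
    const std::string delim = "--" + boundary;
    const std::string closeDelim = delim + "--";
    std::vector<std::string> parts;
    std::istringstream in(body);
    std::string line, current;
    bool inPart = false;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line == delim || line == closeDelim) {
            if (inPart) {
                if (!current.empty()) current.pop_back();  // line break before a delimiter belongs to it
                parts.push_back(current);
            }
            if (line == closeDelim) break;
            current.clear();
            inPart = true;
        } else if (inPart) {
            current += line + "\n";
        }
    }
    return parts;
}

struct Part {
    std::vector<std::pair<std::string, std::string>> headers;  // (name, value), in order
    std::string body;
};

// Splits a raw part ("\n" line endings) into headers and body at the first blank line.
// Header lines without ':' (e.g. folded continuations) are silently dropped.
Part parsePart(const std::string& raw) {
    Part p;
    if (!raw.empty() && raw[0] == '\n') {  // no headers at all
        p.body = raw.substr(1);
        return p;
    }
    std::size_t sep = raw.find("\n\n");
    std::istringstream in(raw.substr(0, sep));
    p.body = sep == std::string::npos ? "" : raw.substr(sep + 2);
    std::string line;
    while (std::getline(in, line)) {
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::string value = line.substr(colon + 1);
        std::size_t start = value.find_first_not_of(" \t");
        p.headers.emplace_back(line.substr(0, colon),
                               start == std::string::npos ? "" : value.substr(start));
    }
    return p;
}
```

In the question's setting one could, after normalizing CRLF to LF, run `parsePart` on the whole file to get the top-level headers, pass the Content-Type value to `extractBoundary`, and call `splitParts` on the top-level body.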
The reason that parsers like GMime are so "large", as far as lines of code go, is that they are meant for developers who actually want correct and robust MIME part and header parsing/decoding. See my rant about decoding RFC 2047 encoded-word tokens for an idea of how complex this can get (btw, other than GMime and MimeKit, I have yet to find any open source MIME parser capable of handling all of the edge cases discussed in my rant).
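To give a flavor of what an encoded-word is, here is a minimal C++ sketch that decodes a single well-formed Q-encoded token of the form `=?charset?Q?encoded-text?=` from RFC 2047. It deliberately leaves out the hard parts the rant is about: B (base64) encoding, converting the bytes from the declared charset to UTF-8, RFC 2231 language suffixes, whitespace handling between adjacent encoded-words, and the malformed tokens found in real mail. `EncodedWord` and `decodeQWord` are hypothetical names, not GMime's API.

```cpp
#include <cstddef>
#include <optional>
#include <string>

struct EncodedWord {
    std::string charset;  // declared charset, e.g. "ISO-8859-1"
    std::string bytes;    // decoded octets, still in that charset (no conversion done)
};

static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decodes one Q-encoded encoded-word; returns nullopt for anything else
// (including B-encoded words, which this sketch does not handle).
std::optional<EncodedWord> decodeQWord(const std::string& token) {
    if (token.size() < 8 || token.compare(0, 2, "=?") != 0 ||
        token.compare(token.size() - 2, 2, "?=") != 0)
        return std::nullopt;
    std::size_t q1 = token.find('?', 2);              // end of charset
    if (q1 == std::string::npos || q1 == 2) return std::nullopt;
    std::size_t q2 = token.find('?', q1 + 1);         // end of encoding letter
    if (q2 != q1 + 2 || q2 >= token.size() - 2) return std::nullopt;
    char enc = token[q1 + 1];
    if (enc != 'Q' && enc != 'q') return std::nullopt;

    const std::size_t end = token.size() - 2;         // start of the closing "?="
    EncodedWord out{token.substr(2, q1 - 2), {}};
    for (std::size_t i = q2 + 1; i < end; ++i) {
        char c = token[i];
        if (c == '_') {
            out.bytes += ' ';                         // '_' always means 0x20 in Q encoding
        } else if (c == '=') {
            if (i + 2 >= end) return std::nullopt;    // truncated "=XX" escape
            int hi = hexValue(token[i + 1]);
            int lo = hexValue(token[i + 2]);
            if (hi < 0 || lo < 0) return std::nullopt;
            out.bytes += static_cast<char>(hi * 16 + lo);
            i += 2;
        } else {
            out.bytes += c;
        }
    }
    return out;
}
```

Even this tiny piece has to make policy decisions (reject or pass through a bad escape?), and a robust parser has to make many more of them across the whole header.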
Even with all of this extra robust processing, it's still as fast as or faster than most "lightweight" MIME parsers are likely to be, especially considering that most of them use a readline approach. I've seen "lightweight" MIME parsers purport to parse 25MB email files in 2-3 seconds and consider that to be "fast". My unit tests for GMime parse two mbox files full of messages, larger than 1.2GB (yes, gigabytes), in less time than that.
My point is that "lightweight" is a bullshit criterion used by people who don't know what they are talking about.
How about judging based on something meaningful, such as RFC compliance? Or by a combination of RFC compliance and performance? Either way, GMime will come out the winner in any meaningful comparison you make.