I have a large dump of data from an Outlook email account that comes entirely in .msg files. A quick call to Ubuntu's file command revealed that they are Composite Document File V2 Documents (whatever that means). I would really like to be able to read these files as plaintext. Is that possible at all?
Update: It turns out that what I wanted, large-scale data mining on these kinds of files, wasn't really possible with existing tools, which was a bummer. In case you face the same issue, I made a library to address it: https://github.com/Slater-Victoroff/msgReader
Documentation isn't great, but it's a pretty small library, so it should be self-explanatory.
In Thunderbird, click File > Open > Saved message and select your .msg file.
Compound File Binary Format (CFBF), also called Compound File, Compound Document format, or Composite Document File V2 (CDF), is a compound document file format for storing numerous files and streams within a single file on a disk.
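If you want to confirm from a script that a file really is a compound file before trying to parse it, one option is to check its first 8 bytes against the CFBF signature (D0 CF 11 E0 A1 B1 1A E1). This is only a minimal sketch: the function names are made up for illustration, and a matching signature says the container is CFBF, not that the contents are a valid Outlook message.

```python
# CFBF files start with this fixed 8-byte signature.
CFBF_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def has_cfbf_signature(header: bytes) -> bool:
    """Return True if the given bytes begin with the CFBF signature."""
    return header[:8] == CFBF_MAGIC


def is_compound_file(path: str) -> bool:
    """Read the first 8 bytes of a file and check them (hypothetical helper)."""
    with open(path, "rb") as f:
        return has_cfbf_signature(f.read(8))
```

Calling is_compound_file on each file in the dump is a cheap way to skip anything that was misnamed with a .msg extension.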
I faced the same problem this morning. I didn't find any information on the file format but it was possible to extract the required information from the file using strings and grep:
strings -e l *.msg | grep pattern
The -e l (that's a small L) tells strings to look for 16-bit little-endian characters, which is how it picks up the UTF-16 text in these files.
This will only work if you can grep the data you need from the file (i.e. all required lines contain a standard string or pattern).
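If you would rather do the same thing from Python (for example, to loop over thousands of files), here is a rough sketch of the strings-plus-grep approach. It is an approximation, not a port of strings: it only finds runs of printable ASCII characters stored as UTF-16LE, uses a minimum run length of 4, and the function names are my own invention.

```python
import re


def utf16le_strings(data: bytes, min_len: int = 4) -> list:
    """Find runs of printable ASCII characters encoded as UTF-16LE.

    Each such character is one printable byte followed by a NUL byte.
    """
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]


def grep_msg(path: str, pattern: str, min_len: int = 4) -> list:
    """Return the extracted strings from one file that match a regex."""
    with open(path, "rb") as f:
        data = f.read()
    rx = re.compile(pattern)
    return [s for s in utf16le_strings(data, min_len) if rx.search(s)]
```

For example, grep_msg("mail.msg", r"@example\.com") would return every extracted string containing that domain (the file name and pattern here are placeholders). The same caveat applies: text outside the printable ASCII range, or data that is compressed or stored in another encoding, will not show up.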