Reading Composite Document File V2 (.msg) files in Ubuntu [closed]

Tags: text, encoding, msg

I have a large dump of data from an Outlook email account that consists entirely of .msg files. Running Ubuntu's file command on them reported that they are Composite Document File V2 Documents (whatever that means). I would really like to be able to read these files as plain text. Is that possible at all?

Update: It turns out it wasn't entirely possible to do what I wanted (large-scale data mining on these kinds of files), which was a bummer. In case you face the same issue, I made a library to address it: https://github.com/Slater-Victoroff/msgReader

Documentation isn't great, but it's a pretty small library, so it should be self-explanatory.

Slater Victoroff asked Mar 09 '13 06:03


People also ask

How do I open a .MSG file in Ubuntu?

In Thunderbird, click File > Open > Saved Message and select your .msg file.

What is composite document file V2?

Compound File Binary Format (CFBF), also called Compound File, Compound Document format, or Composite Document File V2 (CDF), is a compound document file format for storing numerous files and streams within a single file on a disk.


1 Answer

I faced the same problem this morning. I didn't find any information on the file format, but I was able to extract the information I needed from the files using strings and grep:

strings -e l *.msg | grep pattern

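The -e l option (that's a lowercase L) makes strings look for 16-bit little-endian characters, which is the encoding the Unicode (UTF-16) text in these files uses.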

This only works if you can grep for the data you need (i.e. every line you need contains a fixed string or matches a known pattern).

Ben Mayhew answered Nov 15 '22 21:11