I have a C# component that will receive a file of one of the following types: .doc, .pdf, .xls, .rtf
These will be sent by the calling Siebel legacy app as a file stream.
So...
[LegacyApp] >> {Binary file stream} >> [Component]
The legacy app is a black box that can't be modified to tell the component what file type (doc, pdf, xls, rtf) it is sending. The component needs to read this binary stream and create a file on the filesystem with the right extension.
Any ideas?
Thanks for your time.
On Linux/Unix based systems you can use the file command, but I assume you want to do this manually yourself in code...
If all you have access to is the byte stream of the file, then you would need to handle each file type independently.
Most programs/components that do what you are asking about read the first few bytes and classify the file based on those. For example, GIF files start with one of the following: GIF87a or GIF89a.
Files of a given format usually share the same signature at the start of the file, or the same header layout. This signature is referred to as a magic number, as described by me on this post.
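As a rough illustration of the first-few-bytes approach, here is a minimal sketch in Python (the same byte comparisons port directly to C#). The GIF signatures are the ones quoted above; the PDF ("%PDF-") and RTF ("{\rtf") signatures are my additions from those formats' usual headers, and the function name and the ".bin" fallback are made up for the example.

```python
# Leading-byte signatures ("magic numbers") mapped to a file extension.
# GIF values are from the text above; PDF and RTF values are assumptions
# added for illustration, based on how those files normally begin.
SIGNATURES = [
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"%PDF-", ".pdf"),
    (b"{\\rtf", ".rtf"),
]


def guess_extension(head: bytes, default: str = ".bin") -> str:
    """Guess an extension from the first bytes of a file (hypothetical helper)."""
    for magic, ext in SIGNATURES:
        if head.startswith(magic):
            return ext
    # No known signature matched; the caller decides what to do with it.
    return default
```

In the component you would read the first handful of bytes from the incoming stream, pass them to a function like this, and then write those bytes plus the rest of the stream to a file with the guessed extension.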
A good place to get started is to go to www.wotsit.org. It contains the file format specifications searchable by file type. You could look at the important file types that you want to handle and see if you can find some identifying factor in those file formats.
You could also search Google to try and find a library that does this classification, or look at the source code of the file command.
Yes, this is possible. MS Office files (97-2007 or thereabouts) all start with the bytes D0 CF 11 E0 (the full signature is D0 CF 11 E0 A1 B1 1A E1), and then there is a subtype marker at byte offset 512.
A reference for these is at: http://www.garykessler.net/library/file_sigs.html
This seems to be the best list around, covering all sorts of file formats. It is the main reference on Wikipedia.
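A sketch of that check in Python, for illustration only. The 8-byte container signature is the standard one; the two subtype values at offset 512 (EC A5 C1 00 for Word, 09 08 10 00 00 06 05 00 for Excel) are what I recall from the list linked above, so treat them as assumptions and verify them against it. The list also shows other subheader variants that this does not cover, and the function name and the "ole2-unknown" result are made up for the example.

```python
# Signature shared by legacy Office (OLE2 compound) files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Subtype markers expected at byte offset 512. These two values are
# assumptions taken from the signature list; check them before relying on them.
WORD_SUBHEADER = bytes.fromhex("ECA5C100")
EXCEL_SUBHEADER = bytes.fromhex("0908100000060500")


def classify_ole2(data: bytes):
    """Return '.doc', '.xls', 'ole2-unknown', or None if not an OLE2 file."""
    if not data.startswith(OLE2_MAGIC):
        return None
    marker = data[512:520]
    if marker.startswith(WORD_SUBHEADER):
        return ".doc"
    if marker.startswith(EXCEL_SUBHEADER):
        return ".xls"
    # OLE2 container, but the marker at 512 is not one we recognise.
    return "ole2-unknown"
```

This is a heuristic rather than a parse of the container, so you need at least the first 520 bytes of the stream and should expect the occasional file that lands in the unknown bucket.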
It doesn't give complete details on the new Office format, so this is from my own examples. DOCX files start with "PK" (as technically they are zip files) and then contain the string "word/_rels/document.xml.rels" while XLSX contain "xl/_rels/workbook.xml.rels".
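Since these are zip files, one way to check for those entries is to open the bytes as a zip archive and look at the entry names instead of scanning for the strings. A minimal Python sketch, assuming the two entry names above are present as described; the function name and the ".zip" fallback are made up for the example.

```python
import io
import zipfile


def classify_ooxml(data: bytes):
    """Return '.docx', '.xlsx', '.zip', or None if the data is not a zip."""
    if not data.startswith(b"PK"):
        return None
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        # Starts with "PK" but is not a readable zip archive.
        return None
    if "word/_rels/document.xml.rels" in names:
        return ".docx"
    if "xl/_rels/workbook.xml.rels" in names:
        return ".xlsx"
    # A valid zip that is neither of the two Office types checked here.
    return ".zip"
```

Unlike the leading-byte checks, this needs the whole stream buffered, because the zip directory sits at the end of the file.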