 

Read EXE, MSI, and ZIP file metadata in Python in Linux

I am writing a Python script to index a large set of Windows installers into a DB.

I would like to know how to read the metadata (Company, Product Name, Version, etc.) from EXE, MSI and ZIP files using Python running on Linux.
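Since the files will be mixed, an indexer usually has to decide which parser to use for each one. A minimal Python 3 sketch, assuming file extensions can't be trusted, is to sniff the first bytes: PE executables start with `MZ`, MSI files are OLE compound documents (signature `D0 CF 11 E0 A1 B1 1A E1`), and ZIP archives normally start with a `PK\x03\x04` local file header. The function names are hypothetical, and the checks are coarse: the OLE signature is shared by other formats such as legacy `.doc`/`.xls` files, and an empty ZIP starts with a different record.

```python
# Leading "magic" bytes for the three installer formats (hypothetical table).
MAGIC = [
    (b"MZ", "exe"),                                # DOS/PE executable header
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "msi"),  # OLE compound file (MSI uses this container)
    (b"PK\x03\x04", "zip"),                        # ZIP local file header
]


def sniff_bytes(head):
    """Return 'exe', 'msi', 'zip' or None for the first bytes of a file."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return None


def sniff_format(path):
    """Read just enough of the file to compare against the longest signature."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(8))
```

`sniff_format(path)` could then route each installer to an EXE, MSI or ZIP metadata reader.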

Software

I am using Python 2.6.5 on Ubuntu 10.04 64-bit with Django 1.2.1.

Found so far:

Windows command-line utilities that can extract EXE metadata (like filever from SysUtils), and other individual command-line utilities that only work on Windows. I've tried running these through Wine, but they have problems, and it hasn't been worth the effort to track down the libraries and frameworks those utilities depend on and install them in Wine/CrossOver.

Win32 modules for Python, which can do some of this but won't run on Linux (right?)
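For the ZIP part, at least, no Windows-only tooling is needed: Python's standard `zipfile` module runs anywhere. ZIP archives carry no Company or Product Name fields, so what can be indexed is roughly the archive comment plus each member's name, timestamp, sizes and CRC. A minimal Python 3 sketch (`zip_metadata` is a hypothetical helper name):

```python
import zipfile


def zip_metadata(path_or_file):
    """Collect archive-level and per-member metadata using only the stdlib."""
    with zipfile.ZipFile(path_or_file) as zf:
        members = [
            {
                "name": info.filename,
                "modified": info.date_time,          # (Y, M, D, h, m, s) tuple
                "size": info.file_size,              # uncompressed bytes
                "compressed_size": info.compress_size,
                "crc": info.CRC,
            }
            for info in zf.infolist()
        ]
        # The ZIP comment is raw bytes with no declared encoding.
        comment = zf.comment.decode("utf-8", "replace")
    return {"comment": comment, "members": members}
```

`zipfile.ZipFile` accepts either a path or an open binary file object, so the same helper works on files read from a DB or a network share.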

Secondary question:

Obviously, changing a file's metadata changes its MD5 hash. Is there a general way to hash a file independently of its metadata, other than locating the metadata and skipping it when reading the file in (e.g., skipping the first 1024 bytes)?
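For ZIP files, one option is to hash the contents rather than the container: hash each member's name and uncompressed bytes, and ignore timestamps, comments, extra fields and compression settings. The sketch below (Python 3, hypothetical helper name `zip_content_hash`) does that, sorting members by name so their order inside the archive doesn't matter. It does not carry over to EXE or MSI files; for those you would still need to know which byte ranges hold the metadata and skip them.

```python
import hashlib
import zipfile


def zip_content_hash(path_or_file):
    """MD5 over a ZIP's member names and uncompressed contents only."""
    h = hashlib.md5()
    with zipfile.ZipFile(path_or_file) as zf:
        # Sort by name so member order inside the archive does not matter.
        for info in sorted(zf.infolist(), key=lambda i: i.filename):
            if info.is_dir():
                continue
            name = info.filename.encode("utf-8")
            # Length-prefix each field so name/data boundaries are unambiguous.
            h.update(len(name).to_bytes(8, "little"))
            h.update(name)
            h.update(info.file_size.to_bytes(8, "little"))
            # Stream the decompressed data; zipfile also checks the CRC.
            with zf.open(info) as member:
                for chunk in iter(lambda: member.read(65536), b""):
                    h.update(chunk)
    return h.hexdigest()
```

Two archives that differ only in timestamps, comments or compression level hash the same; changing any member's bytes or name changes the hash.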

asked Sep 18 '10 at 16:09 by user451500



2 Answers

Take a look at the Hachoir library (http://bitbucket.org/haypo/hachoir/wiki/Home) and at hachoir-metadata (http://pypi.python.org/pypi/hachoir-metadata/1.3.3), an example program that uses Hachoir's binary file parsing to extract metadata.

The library can handle these formats:

  • Archives: bzip2, gzip, zip, tar
  • Audio: MPEG audio ("MP3"), WAV, Sun/NeXT audio, Ogg/Vorbis (OGG), MIDI, AIFF, AIFC, Real audio (RA)
  • Image: BMP, CUR, EMF, ICO, GIF, JPEG, PCX, PNG, TGA, TIFF, WMF, XCF
  • Misc: Torrent
  • Program: EXE
  • Video: ASF format (WMV video), AVI, Matroska (MKV), Quicktime (MOV), Ogg/Theora, Real media (RM)

Hachoir can also perform some file manipulation operations, which I would assume include some basic metadata editing.
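To give a feel for what parsing EXE files involves, here is a dependency-free Python 3 sketch (not Hachoir's code; `pe_header_info` is a hypothetical name) that reads a few fields from the PE headers. The offset of the `PE\0\0` signature is stored at 0x3C in the DOS header, and the 20-byte COFF header after it holds the machine type, section count and link timestamp. The Company/Product Name strings from the question live much deeper, in the version resource, which means walking the section table and resource directory; that is the part a library saves you from writing. TimeDateStamp is also not always a real date, since some toolchains write a hash there for reproducible builds.

```python
import struct
from datetime import datetime, timezone

# A few common COFF machine codes; anything else is shown as hex.
MACHINES = {0x14C: "i386", 0x8664: "x86-64", 0x1C0: "ARM", 0xAA64: "ARM64"}
OPTIONAL_MAGIC = {0x10B: "PE32", 0x20B: "PE32+"}


def pe_header_info(data):
    """Parse a few COFF header fields from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)   # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF header: Machine (H), NumberOfSections (H), TimeDateStamp (I), ...
    machine, n_sections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    # Optional header starts after the 4-byte signature and 20-byte COFF header.
    (opt_magic,) = struct.unpack_from("<H", data, pe_offset + 24)
    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "sections": n_sections,
        "linked": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "format": OPTIONAL_MAGIC.get(opt_magic, hex(opt_magic)),
    }
```

Usage would be along the lines of `pe_header_info(open("setup.exe", "rb").read())`, where `setup.exe` is a placeholder path.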

answered Nov 02 '22 at 13:11 by Chris Laplante



hachoir-metadata reads the "Product Version", but compilers change the "File Version", so the version it returns is not the one we need.

I found a small, well-working solution:

http://pev.sourceforge.net/

I've tested with success. It's simple, fast and stable.
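The File Version / Product Version distinction above corresponds to separate `FileVersion` and `ProductVersion` strings in the executable's VS_VERSIONINFO resource (which also stores numeric versions in its fixed-size header). As a rough illustration only, here is a Python 3 heuristic (hypothetical helper `find_version_string`) that scans raw bytes for a UTF-16LE key name and reads the value using the layout of a `String` entry: `wLength`, `wValueLength`, `wType`, the NUL-terminated key, padding to a 4-byte boundary, then the NUL-terminated value. It is not a resource parser: it assumes the first even-offset match is the real entry, and a key that happens to be the tail of a longer key name could match the wrong entry.

```python
import struct


def find_version_string(data, key):
    """Heuristically read one StringFileInfo value (e.g. "CompanyName",
    "FileVersion", "ProductVersion") from the raw bytes of a PE file.
    Returns None if the key is not found. Not a real resource parser."""
    needle = key.encode("utf-16-le") + b"\0\0"     # szKey is NUL-terminated UTF-16LE
    pos = data.find(needle)
    # Assumption: a genuine key sits at an even offset with a 6-byte header before it.
    while pos != -1 and (pos % 2 or pos < 6):
        pos = data.find(needle, pos + 1)
    if pos == -1:
        return None
    start = pos - 6                                  # wLength, wValueLength, wType
    (value_len,) = struct.unpack_from("<H", data, start + 2)
    if value_len == 0:
        return ""
    # The value starts at the next 4-byte boundary (relative to the entry) after the key.
    value_at = start + ((6 + len(needle) + 3) & ~3)
    end = value_at
    while end < len(data) and data[end:end + 2] != b"\0\0":
        end += 2
    return data[value_at:end].decode("utf-16-le", "replace")
```

Reading both keys from the same file, e.g. `find_version_string(blob, "FileVersion")` and `find_version_string(blob, "ProductVersion")`, makes it easy to see when the two differ.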

answered Nov 02 '22 at 13:11 by Glaudiston
