
Parsing Age of Empires game record files (.mgx)

I am a fan of the old game Age of Empires II (AoE). I want to write a parser for AoE game record files (.mgx) in Python.

I did some searching on GitHub and found few projects on this. The most useful one is aoc-mgx-format, which provides some details about .mgx game record files.

Here is the problem:

According to the reference, the structure of a .mgx file looks like this:

| header_len(4byte int) | next_pos(4byte int) | header_data | ... ... |

The byte order of data in the mgx format is little endian.

header_len stores the length of the whole Header part (header_len + next_pos + header_data).

header_data stores the useful information I need, but it's compressed with zlib.

I tried to decompress header_data with the zlib module like this:

import struct
import zlib

with open('test.mgx', "rb") as fp:
    # read the header_len bytes and convert them to an int representing the length of the Header part
    header_len = struct.unpack("<i", fp.read(4))[0]

    # read next_pos (this is not important for me)
    next_pos = struct.unpack("<i", fp.read(4))[0]

    # then I can get data length of header_data part(compressed with zlib)
    header_data_len = header_len - 8

    compressed_data = fp.read(header_data_len)[::-1] # need to be reversed because byte order is little endian?

    try:
        zlib.decompress(compressed_data)
        print("can be decompressed!")
    except zlib.error as e:
        print(e)

But I got this after running the program:

Error -3 while decompressing data: incorrect header check

PS: Sample .mgx files can be found here: https://github.com/stefan-kolb/aoc-mgx-format/tree/master/parser/recs

lichifeng asked Apr 17 '15 05:04


2 Answers

Your first problem is that you shouldn't be reversing the data; just get rid of the [::-1].
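To see why no reversal is needed, here is a minimal sketch with made-up bytes (not taken from a real .mgx file). Byte order only matters for multi-byte integer fields such as header_len, and the '<' in the struct format string already handles it; the compressed payload is a byte stream that must be consumed in file order.

```python
import struct

# Made-up example bytes: a 4-byte little-endian integer followed by some payload.
raw = b'\x10\x27\x00\x00' + b'\x01\x02\x03\x04'

# '<i' tells struct that the integer is little-endian, so no manual reversal is needed.
header_len = struct.unpack('<i', raw[:4])[0]
print(header_len)  # 10000 (0x00002710)

# The payload is not an integer; it is passed on byte by byte, in the order it was read.
payload = raw[4:]
print(payload)
```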

But if you do that, instead of getting that error -3, you get a different error -3, usually about an unknown compression method.

The problem is that this is headerless zlib data (a raw deflate stream), much like what gzip uses inside its own wrapper. In theory, this means the information about the compression method, window size, preset dictionary, etc. has to be supplied somewhere else in the file (in gzip's case, by information in the gzip header). But in practice, everyone uses deflate with the max window size and no preset dictionary, so if I were designing a compact format for a game back in the days when every byte counted, I'd just hardcode them. (Exactly that raw format is standardized in RFC 1951 as the "DEFLATE Compressed Data Format Specification", but most 90s PC games weren't following RFCs by design...)
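As a rough illustration of what "headerless" means, the sketch below checks for the two-byte zlib header described in RFC 1950: the low nibble of the first byte must be 8 (deflate), and the two bytes read as a big-endian number must be a multiple of 31. The helper name has_zlib_header is made up for this example, and the check is simplified (it ignores the window-size limit).

```python
import zlib


def has_zlib_header(data):
    """Simplified RFC 1950 header check (hypothetical helper, not part of the zlib module)."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # Method must be 8 (deflate) and the 16-bit header value must be divisible by 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


sample = b'VER 9.8\x00' * 4

wrapped = zlib.compress(sample)  # normal zlib stream, starts with a header

raw_compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
headerless = raw_compressor.compress(sample) + raw_compressor.flush()  # raw deflate

print(has_zlib_header(wrapped))
print(has_zlib_header(headerless))
```

When the header is missing, zlib ends up interpreting the first bytes of compressed data as a header, which is where errors like "incorrect header check" come from.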

So:

>>> uncompressed_data = zlib.decompress(compressed_data, -zlib.MAX_WBITS)
>>> uncompressed_data[:8] # version
b'VER 9.8\x00'
>>> uncompressed_data[8:12] # unknown_const
b'\xf6(<A'

So not only did it decompress, the result looks like a version and… well, I guess anything looks like an unknown constant, but it's the same unknown constant as in the spec, so I think we're good.
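For a slightly more readable view of those first 12 bytes, here is a small sketch that strips the trailing NUL from the version string and decodes the constant. Reading the constant as a little-endian 32-bit float is an assumption made for illustration; nothing above says what type it really is.

```python
import struct

# The first 12 bytes shown in the session above.
uncompressed_prefix = b'VER 9.8\x00' + b'\xf6(<A'

# Version: 8 bytes, NUL-terminated ASCII.
version = uncompressed_prefix[:8].rstrip(b'\x00').decode('ascii')

# Assumption: treat the 4-byte constant as a little-endian float32.
(const_as_float,) = struct.unpack('<f', uncompressed_prefix[8:12])

print(version)                   # VER 9.8
print(round(const_as_float, 2))  # 11.76
```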

As the decompress docs explain, MAX_WBITS is the default/most common window size (and the only size used by what's usually called "zlib deflate" as opposed to "zlib"), and passing a negative value means that the header is suppressed; the other arguments we can leave to defaults.
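If you don't have a .mgx file at hand, the effect of the negative wbits value can be reproduced with a round trip. This is only a sketch on made-up data: it builds a raw deflate stream with zlib.compressobj, then shows that the default decompress call rejects it while -zlib.MAX_WBITS accepts it.

```python
import zlib

# Made-up payload standing in for the real header contents.
original = b'VER 9.8\x00' + b'example header payload ' * 4

# A negative wbits on the compressor side produces a raw deflate stream
# (no zlib header, no trailing checksum).
compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_stream = compressor.compress(original) + compressor.flush()

try:
    zlib.decompress(raw_stream)  # default wbits expects a zlib header
    default_ok = True
except zlib.error as exc:
    default_ok = False
    print('default wbits failed:', exc)

# Negative wbits tells zlib not to look for a header.
restored = zlib.decompress(raw_stream, -zlib.MAX_WBITS)
print(restored == original)  # True
```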

See also this answer, the Advanced Functions section in the zlib docs, and RFC 1951. (Thanks to the OP for finding the links.)

abarnert answered Nov 15 '22 11:11


This is an old question, but here is a sample of what I did:

import collections
import os
import struct
import zlib

# Stand-in for the Player type, which this sample uses but does not show
Player = collections.namedtuple('Player', ['id', 'type', 'name'])


class GameRecordParser:

    def __init__(self, filename):
        self.filename = filename
        with open(filename, 'rb') as f:
            # Get header size
            header_size = struct.unpack('<I', f.read(4))[0]
            sub = struct.unpack('<I', f.read(4))[0]
            # Decide where the compressed header starts
            if sub != 0 and sub < os.stat(filename).st_size:
                f.seek(4)
                self.header_start = 4
            else:
                self.header_start = 8

            # Get and decompress header (raw deflate, hence the negative wbits)
            header = f.read(header_size - self.header_start)
            self.header_data = zlib.decompress(header, -zlib.MAX_WBITS)

            # Get body
            self.body = f.read()

        # Get players data: each record is id, type, name_size (uint32 each), then the name
        sep = b'\x04\x00\x00\x00Gaia'
        pos = self.header_data.find(sep) + len(sep)
        self.players = []
        for _ in range(8):
            player_id, player_type, name_size = struct.unpack_from('<III', self.header_data, pos)
            pos += 12
            name = self.header_data[pos:pos + name_size].decode('utf-8')
            pos += name_size
            if player_id < 9:
                self.players.append(Player(player_id, player_type, name))
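The player loop above can be tried without a real recording. The sketch below pulls it into a standalone function and runs it on synthetic bytes. The record layout (three little-endian uint32 values followed by the name) is inferred from the snippet, not from the format reference, and parse_players, make_record and the fake data are made up for this example. It also guards against a missing marker and truncated input, which the sample does not.

```python
import struct

# Marker used by the sample above: it looks like a length-prefixed "Gaia" name.
GAIA_MARKER = b'\x04\x00\x00\x00Gaia'


def parse_players(header_data, max_players=8):
    """Return (id, type, name) tuples found after the Gaia marker."""
    start = header_data.find(GAIA_MARKER)
    if start == -1:
        return []  # marker not found, nothing to parse
    pos = start + len(GAIA_MARKER)
    players = []
    for _ in range(max_players):
        if pos + 12 > len(header_data):
            break  # not enough bytes left for another record
        player_id, player_type, name_size = struct.unpack_from('<III', header_data, pos)
        pos += 12
        name = header_data[pos:pos + name_size].decode('utf-8', errors='replace')
        pos += name_size
        if player_id < 9:
            players.append((player_id, player_type, name))
    return players


def make_record(player_id, player_type, name):
    """Build one synthetic player record in the assumed layout."""
    raw = name.encode('utf-8')
    return struct.pack('<III', player_id, player_type, len(raw)) + raw


fake_header = (b'\x00' * 5 + GAIA_MARKER
               + make_record(1, 2, 'alice') + make_record(2, 4, 'bob'))
print(parse_players(fake_header))  # [(1, 2, 'alice'), (2, 4, 'bob')]
```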

Hope it helps future programmers :)

By the way, I am planning on writing such a library.

Victor Drouin answered Nov 15 '22 10:11