I am a fan of the old game Age of Empires II (AoE). I want to write a parser for AoE game records (.mgx files) using Python.
I did some searching on GitHub and found few projects on this; the most useful one is aoc-mgx-format, which provides some details about .mgx game record files.
Here is the problem:
According to the reference, the structure of a .mgx file looks like this:
| header_len(4byte int) | next_pos(4byte int) | header_data | ... ... |
The byte order in the mgx format is little endian.
header_len stores the data length of the Header part (header_len + next_pos + header_data).
header_data stores the useful information I need, but it's compressed with zlib.
I tried to decompress the data in header_data with the zlib module like this:
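As a rough sketch of the layout described above, the two length fields can be read with struct and the compressed part sliced out. This uses a made-up in-memory buffer rather than a real .mgx file, and the helper name split_header is hypothetical, not something from the aoc-mgx-format reference:

```python
import struct

def split_header(blob):
    """Split a buffer laid out as | header_len | next_pos | header_data | ... |."""
    # Both fields are 4-byte little-endian integers at the start of the buffer.
    header_len, next_pos = struct.unpack_from("<ii", blob, 0)
    # header_len counts the two 4-byte fields too, so the payload is 8 bytes shorter.
    header_data = blob[8:header_len]
    rest = blob[header_len:]
    return header_len, next_pos, header_data, rest

# Made-up example: 5 payload bytes, so header_len = 8 + 5 = 13.
payload = b"\x01\x02\x03\x04\x05"
blob = struct.pack("<ii", 8 + len(payload), 0) + payload + b"BODY"
header_len, next_pos, header_data, rest = split_header(blob)
```

Note that the bytes of header_data are taken in file order; only the multi-byte integer fields are affected by endianness.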
import struct
import zlib

with open('test.mgx', "rb") as fp:
    # read the header_len bytes and convert them to an int representing the length of the Header part
    header_len = struct.unpack("<i", fp.read(4))[0]
    # read next_pos (this is not important for me)
    next_pos = struct.unpack("<i", fp.read(4))[0]
    # then I can get the data length of the header_data part (compressed with zlib)
    header_data_len = header_len - 8
    compressed_data = fp.read(header_data_len)[::-1]  # need to be reversed because byte order is little endian?
    try:
        zlib.decompress(compressed_data)
        print("can be decompressed!")
    except zlib.error as e:
        print(e)
But I got this after running the program:
Error -3 while decompressing data: incorrect header check
PS: Sample .mgx files can be found here: https://github.com/stefan-kolb/aoc-mgx-format/tree/master/parser/recs
Your first problem is that you shouldn't be reversing the data; just get rid of the [::-1].
But if you do that, instead of getting that error -3, you get a different error -3, usually about an unknown compression method.
The problem is that this is headerless zlib data, much like what gzip uses. In theory, this means the information about the compression method, window, start dict, etc. has to be supplied somewhere else in the file (in gzip's case, by information in the gzip header). But in practice, everyone uses deflate with the max window size and no start dict, so if I were designing a compact format for a game back in the days when every byte counted, I'd just hardcode them. (Exactly that is standardized in RFC 1951 as the "DEFLATE Compressed Data Format", but most 90s PC games weren't following RFCs by design...)
So:
>>> uncompressed_data = zlib.decompress(compressed_data, -zlib.MAX_WBITS)
>>> uncompressed_data[:8] # version
b'VER 9.8\x00'
>>> uncompressed_data[8:12] # unknown_const
b'\xf6(<A'
So, it not only decompressed, that looks like a version and… well, I guess anything looks like an unknown constant, but it's the same unknown constant in the spec, so I think we're good.
As the decompress docs explain, MAX_WBITS is the default/most common window size (and the only size used by what's usually called "zlib deflate" as opposed to "zlib"), and passing a negative value means that the header is suppressed; the other arguments we can leave to defaults.
See also this answer, the Advanced Functions section in the zlib docs, and RFC 1951. (Thanks to the OP for finding the links.)
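To see the effect of the negative window size in isolation, here is a small self-contained sketch that does not need an .mgx file. The payload is made up for illustration; it is compressed as a raw deflate stream and then decompressed both ways:

```python
import zlib

original = b"VER 9.8\x00" + b"example payload " * 4

# Produce a raw deflate stream (no zlib header or trailer) by passing negative wbits.
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = compressor.compress(original) + compressor.flush()

# The default wbits expects a zlib header, so the raw stream is rejected.
try:
    zlib.decompress(raw)
    default_ok = True
except zlib.error:
    default_ok = False

# Negative wbits tells zlib to skip the header check and read raw deflate.
restored = zlib.decompress(raw, -zlib.MAX_WBITS)
```

With the default arguments the call raises zlib.error, while the -zlib.MAX_WBITS call returns the original bytes.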
This is an old question, but here is a sample of what I did:
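import os
import struct
import zlib
from collections import namedtuple

# Player is not defined in the original snippet; this is a minimal stand-in.
Player = namedtuple('Player', ['id', 'type', 'name'])


class GameRecordParser:
    def __init__(self, filename):
        self.filename = filename
        with open(filename, 'rb') as f:
            # Get header size
            header_size = struct.unpack('<I', f.read(4))[0]
            sub = struct.unpack('<I', f.read(4))[0]
            # Decide whether the compressed header starts at offset 4 or 8
            if sub != 0 and sub < os.stat(filename).st_size:
                f.seek(4)
                self.header_start = 4
            else:
                self.header_start = 8
            # Get and decompress header (raw deflate, hence the negative wbits)
            header = f.read(header_size - self.header_start)
            self.header_data = zlib.decompress(header, -zlib.MAX_WBITS)
            # Get body
            self.body = f.read()
        # Get players data
        sep = b'\x04\x00\x00\x00Gaia'
        pos = self.header_data.find(sep) + len(sep)
        players = []
        for k in range(0, 8):
            # Each record: id, type, name length (4-byte little-endian ints), then the name
            player_id = struct.unpack('<I', self.header_data[pos:pos+4])[0]
            pos += 4
            player_type = struct.unpack('<I', self.header_data[pos:pos+4])[0]
            pos += 4
            name_size = struct.unpack('<I', self.header_data[pos:pos+4])[0]
            pos += 4
            name = self.header_data[pos:pos+name_size].decode('utf-8')
            pos += name_size
            if player_id < 9:
                players.append(Player(player_id, player_type, name))
        # Keep the result on the instance so callers can use it
        self.players = players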
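The player loop above reads fixed-width little-endian fields followed by a length-prefixed name. Here is a minimal sketch of that pattern on made-up bytes; the helpers read_player and make_record are hypothetical, and the field layout is taken from the snippet above, not checked against the format reference:

```python
import struct

def read_player(buf, pos):
    """Read one (id, type, name) record starting at pos; return it with the new offset."""
    # Three consecutive 4-byte little-endian unsigned integers.
    player_id, player_type, name_size = struct.unpack_from("<III", buf, pos)
    pos += 12
    name = buf[pos:pos + name_size].decode("utf-8")
    return (player_id, player_type, name), pos + name_size

def make_record(player_id, player_type, name):
    """Build a made-up record in the same layout, for testing only."""
    raw = name.encode("utf-8")
    return struct.pack("<III", player_id, player_type, len(raw)) + raw

buf = make_record(1, 2, "Alice") + make_record(2, 4, "Bob")
first, pos = read_player(buf, 0)
second, pos = read_player(buf, pos)
```

Returning the new offset from each call keeps the position bookkeeping in one place instead of repeating pos += 4 after every field.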
Hope it helps future programmers :)
By the way, I am planning on writing such a library.