
Raw floating point encoding


Update: The original question is no longer the appropriate question for this problem, so I'm going to leave it as is to show what I tried and learned, and for background. It's clear that this is not just a "Base64 variation" and is a bit more involved.

Background: I program in Python 3.x, mainly for use with the open source program Blender. I'm a novice/amateur level programmer, but I understand the big concepts fairly well. I've read these articles relevant to my question.

  • Wikipedia on Base64
  • Base64 can get you pwned (pdf)
  • stackoverflow discussion
  • Some others

Problem: I have a binary file which contains 3D mesh data (lists of floats and lists of integers) corresponding to the x, y, z coordinates of each vertex (floats) and the indices of the vertices which make up the faces of the mesh (integers). The file has an XML-ish layout:

<SomeFieldLabel and header like info>**thensomedatabetween**</SomeFieldLabel> 

Here is the example from the "Vertices" field

<Vertices vertex_count="42816" base64_encoded_bytes="513792" check_value="4133547451">685506bytes of b64 encoded data </Vertices> 
  1. There are 685506 bytes of data between "Vertices" and "/Vertices"
  2. Those bytes consist only of a-z, A-Z, 0-9, and +, /, which is standard for base64
  3. When I grab those bytes and use the standard base64 decode in Python, I get 513792 bytes back out
  4. If vertex_count="42816" can be believed, 42816*12 bytes are needed to represent x, y, z for each vertex. 42816*12 = 513792. Excellent.
  5. Now, if I try to unpack my decoded bytes as 32-bit floats, I get garbage, so something is amiss.
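The steps above can be sketched in a few lines of Python. This is only a minimal sketch: the function name `decode_vertices` is hypothetical, and little-endian 32-bit floats stored as consecutive x, y, z triples is an assumption about the layout, not something the file format confirms. The demo runs on synthetic data rather than on the real field.

```python
import base64
import struct


def decode_vertices(b64_text, vertex_count):
    """Decode base64 text and unpack it as little-endian float32 (x, y, z) triples.

    Assumption: 3 floats of 4 bytes each per vertex, stored back to back.
    """
    raw = base64.b64decode(b64_text)
    expected = vertex_count * 12  # 3 floats * 4 bytes per vertex
    if len(raw) < expected:
        raise ValueError("decoded %d bytes, expected at least %d" % (len(raw), expected))
    # Ignore any trailing bytes beyond the expected payload.
    floats = struct.unpack("<%df" % (vertex_count * 3), raw[:expected])
    return [floats[i:i + 3] for i in range(0, len(floats), 3)]


# Round trip on synthetic data: two vertices, six floats.
sample = struct.pack("<6f", 1.0, 2.0, 3.0, -4.5, 0.25, 8.0)
encoded = base64.b64encode(sample)
vertices = decode_vertices(encoded, 2)
print(vertices)
```

If the real payload is encrypted or compressed before base64 encoding, the length check still passes while the unpacked values look like garbage, which matches the symptom described in step 5.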

I'm thinking there is an extra cryptographic step somewhere. Perhaps there is a translation table, a rotation cipher, or some kind of stream cipher? It's strange that the number of bytes is correct but the results are not, which should limit the possibilities. Any ideas? Here are two example files with the file extension changed to *.mesh. I don't want to publicly out this file format; I just want to write an importer for Blender so I can use the models.

Here are two example files. I have extracted the raw binary (not b64 decoded) from the Vertices and Facets fields as well as provided the bounding box information from a "Viewer" for this type of file provided by the company.
Example File 1

  • unmodified file
  • vertices binary:
  • facets binary:
  • Decrypted Data: This is a .zip containing the decrypted vertices field and the decrypted faces field (mesh2.vertices and mesh2.faces respectively). It also contains a .stl mesh file which can be viewed/opened in many applications.

Example File 2

  • unmodified file
  • vertices binary:
  • facets binary:
  • Bounding Box: Min[-4.6, -40.3, -7.3] Max[7.5, -23.1, 2.6]

Notes About the Vertices field

  • The header specifies the vertex_count
  • The header specifies base64_encoded_bytes which is the # of bytes BEFORE base64 encoding takes place
  • The header specifies a "check_value" whose significance is yet to be determined
  • The data in the field only contains the standard base64 characters
  • After standard base64 decoding, the output data has length = vertex_count*12 = base64_encoded_bytes. Occasionally there are 4 extra bytes in the b64 output?
  • The ratio of encoded/decoded bytes is 4/3, which is also typical of base64
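The 4/3 ratio can be checked with a little arithmetic. The sketch below assumes standard padded base64 with no line breaks; the helper name `b64_encoded_length` is made up for illustration.

```python
def b64_encoded_length(n_bytes):
    """Length of padded standard base64 text for n_bytes of input.

    Every group of 3 input bytes (the last one padded) becomes 4 characters.
    """
    return 4 * ((n_bytes + 2) // 3)


decoded = 42816 * 12  # vertex_count * 12 bytes per vertex
print(decoded, b64_encoded_length(decoded))  # 513792 685056
```

For the example header this gives 685056 characters, which is 450 fewer than the 685506 quoted for the field above. Whether that difference comes from line breaks, a transcription slip, or something else cannot be told from the information here.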

Notes about the Facets field

  • The header specifies a facet_count
  • The header specifies base64_encoded_bytes, which is the # of bytes BEFORE base64 encoding takes place

  • The ratio of base64_encoded_bytes/facet_count seems to vary quite a bit, from 1.1 to about 1.2. We would expect a ratio of 12 if the facets were encoded as 3 x 4-byte integers corresponding to the vertex indices. So either this field is compressed, or the model is saved with triangle strips, or both :-/

More Snooping
I opened up the viewer.exe (in a hex editor) which is provided by the company to view these files (also where I got the bounding box info). Here are some snippets which I found interesting and could further the search.

f_LicenseClient...Ì[email protected][email protected][email protected][email protected]_bLoadXXXXXXInternalEncrypted...¼[email protected]_strSiteKey....í†......

In LoadXXXXXXInternalEncrypted and SaveXXXXXXInternalEncrypted I've blocked out the company name with XX. It looks like we definitely have some encryption beyond a simple base64 table variation.

SaveEncryptedModelToStream.................Self...pUx....Model...ˆÃC....Stream....

To me, this looks like the signature of a function that saves an encrypted model.

DefaultEncryptionMethod¼!@........ÿ.......€...€ÿÿ.DefaultEncryptionKey€–†....ÿ...ÿ.......€....ÿÿ.DefaultIncludeModelData –†....ÿ...ÿ.......€...€ÿÿ.DefaultVersion.@

Ahhh...now that is interesting. A default encryption key. Notice there are 27 bytes between each of those descriptors and they always end with "ÿÿ." Here are the 24 bytes excluding "ÿÿ." To me, this is a 192-bit key...but who knows if all 24 of those bytes correspond to the key? Any thoughts?

80 96 86 00 18 00 00 FF 18 00 00 FF 01 00 00 00 00 00 00 80 01 00 00 00
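One quick way to look at these 24 bytes is to view them as six 32-bit words. This is only an exploratory sketch: the little-endian byte order and the 4-byte grouping are assumptions, and nothing here shows whether the bytes are key material or some other structure in the executable.

```python
import struct

raw = bytes.fromhex(
    "80 96 86 00 18 00 00 FF 18 00 00 FF "
    "01 00 00 00 00 00 00 80 01 00 00 00"
)

# Assumption: six little-endian unsigned 32-bit words.
words = struct.unpack("<6I", raw)
for offset, word in zip(range(0, len(raw), 4), words):
    print(f"{offset:2d}: 0x{word:08X}")
```

Repeated words and small values such as 1 would be unusual in a random key, so a dump like this can help decide whether all 24 bytes really belong to the key.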

Code Snippets
To save space in this thread, I put this script in my Dropbox for download. It reads through the file, extracts basic info from the vertices and facets fields, and prints out a bunch of stuff. You can uncomment the end to have it save a data block into a separate file for easier analysis.
basic_mesh_read.py

This is the code I used to try all "reasonable" variations on the standard base64 library. try_all_b64_tables.py

patmo141 asked Feb 22 '12 21:02


1 Answer

I am not sure why you think the results are not floating-point numbers. The vertices data in the "decrypted data" you gave contains "f2 01 31 41" as its first 4 bytes. Given little-endian (least significant byte first) byte order, that corresponds to the bit pattern "413101f2", which is the IEEE 754 representation of the float value 11.06297 (approximately). All the 4-byte values in that file are in that same range, so I assume they are all float values.
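The conversion described here can be reproduced with Python's `struct` module. This is a small sketch of that single check, assuming little-endian IEEE 754 single precision as stated above.

```python
import struct

first_four = bytes.fromhex("f2 01 31 41")

# Read the same 4 bytes as a little-endian float32 and as a uint32 bit pattern.
(value,) = struct.unpack("<f", first_four)
(bits,) = struct.unpack("<I", first_four)

print(hex(bits), value)
```

Unpacking the whole decrypted field with a format such as `"<%df" % (len(data) // 4)` would show whether the remaining values fall in the same range.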

fishinear answered Sep 22 '22 06:09