I'll give you a little bit of background first as to why I'm asking this question:
I am currently working in a stricly-regulated industry and as such our code is quite carefully looked-over by official test houses. These test houses expect to be able to build the code and generate an .exe or .dll which is EXACTLY the same each and every time (without changing any code obviously!). They check the MD5 and the SHA1 of the executables that they create to ensure this.
Up until this point I have predominantly been coding in C++, where (after a few project setting tweaks) I managed to get the projects to rebuild consistantly to the same MD5/SHA1. I am now using C# in a project and am having great difficulty getting the MD5's to match after a rebuild. I am aware that there are "Time-Stamps" in the PE header of the file, and they have been cleared to 0. I am also aware that there is a GUID for the .exe, which again has been cleared to 00 00 00... etc. However the files still don't match.
I'm using CFF Explorer to view and edit the PE Header to remove the time and date stamps. After using a binary comparison tool there are only 2 blocks of bytes in the .exe's that are different (both very small).
One of the inconsistant blocks appears just before some binary code, which in ASCII details the path of the *Project*\obj\Release\xxx.pdb
file.
EDIT: This is now known to be the GUID of the *.pdb file, however I still don't know if I can modify it without causing any errors!?
The other block appears in the middle of what looks to be function names, ie. (a typical section) AssemblyName.GetName.Version.get_Version.System.IO.Ports.SerialPort.Parity.Byte.<PrivateImplementationDetails>{
then the different code block:
4A134ACE-D6A0-461B-A47C-3A4232D90816
followed by:
"}.ValueType.__StaticArrayInitTypeSize=7.$$method0x60000ab-1.RuntimeFieldHandle.InitializeArray`... etc..
Any ideas or suggestions would be most welcome!
Update: Roslyn seems to have a /feature:deterministic
compiler flag for reproducible builds, although it's not 100% working yet.
You should be able to get rid of the debug GUID by disabling PDB generation. If not, setting the GUID to zeroes is fine - only debuggers look at that section (you won't be able to debug the assembly anymore, but it should still run fine).
The PrivateImplementationDetails are a bit more difficult - these are internal helper classes generated by the compiler for certain language constructs (array initializers, switch statements using strings, etc.). Because they are only used internally, the class name doesn't really matter, so you could just assign a running number to them.
I would do this by going through the #Strings metadata stream and replacing all strings of the form "<PrivateImplementationDetails>{GUID}" with "<PrivateImplementationDetails>{running number, padded to same length as a GUID}".
The #Strings metadata stream is simply the list of strings used by the metadata, encoded in UTF-8 and separated by \0; so finding and replacing the names should be easy once you know where the #Strings stream is inside the executable file.
Unfortunately the "metadata stream headers" containing this information are quite buried inside the file format. You'll have to start at the NT Optional Header, find the pointer to the CLI Runtime Header, resolve it to a file position using the PE section table (it's an RVA, but you need a position inside the file), then go to the metadata root and read the stream headers.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With