Question
Thursday, May 19, 2011 6:54 AM
I have made a small utility that generates a checksum for a .exe file by ignoring certain bytes that change each time you do a build (with no code changes). So far it works. I can compile my build as many times as I like and the checksum will be the same. Changing any bit of code, e.g. int i = 1 to int i = 2, results in a different checksum, which is good :-)
The problem is that this only works for builds on the same PC. Take the source to another identical PC, compile the same .exe, and the result is a different checksum. Both PCs are identical in terms of spec, VS version etc. In fact, when I compare the two .exe files built from the same source code using a file compare tool, the files are vastly different. It's almost like VS builds a .exe in random order on different PCs. Very frustrating.
I have also tried copying my solution to another directory on my PC, opening both copies of the solution file in VS2010 and compiling each at the same time. Again the file compare shows vastly different files, even on the same PC. It's like unless I am in the exact same project the files will be vastly different. The byte size of the files is the same; just all the binary data is vastly different.
I have deleted things like the PDB file, manifest file etc and tried doing the build on the .exe files. Again the files are vastly different. Go back to doing the same build on the same solution and the checksums will be the same until I change any bit of code. This is really frustrating. I can’t work out why. Can anyone suggest anything!!!
Basically I must be able to produce the same checksum for a .exe file when building the same source code. The external auditors, internal testers and internal release teams will each build the solution file in VS2010. The .exe files for each team must match (provided the source code is the same). So far I can only generate the same checksum on a file if I build on the exact same version of a solution file on the same PC.
Note.
1. The byte size of the files will be the same for each build that I do on the same source code on any PC.
2. The bottom section of the file will be the same. The top half of the file is where all the differences are, even though the source code files are the same.
3. Normally the first 132 bytes and another section of 64 bytes, usually around the middle of the file, change each time. These are the sections I am ignoring to get the same CRC checksum when doing a build on the same source code. That is, unless you build the same source code somewhere else, and then the files are vastly different from pretty much top to middle. Really weird. How is VS compiling the .exe? It really seems like it's in random order.
Daniel Hajduk
All replies (9)
Wednesday, July 20, 2011 1:25 PM ✅Answered
I think the build versions contain various information like the TimeStamp, etc. required to point to proper PDB files for debugging.
In such cases, it is expected that there could be differences in the bits of the executable generated by successive builds with no code changes. And let us not forget the actual concept of a CHECKSUM, which is used to maintain the integrity of data; in such cases, it acts as a verifier for parity. You can refer to the following KB article, which has a note stating:
NOTE: There is no guarantee that Visual C++ will generate the same binary image when building the same source files on successive builds. However, you are guaranteed that the EXE (or DLL) will behave in precisely the same manner under execution, all other things being equal. Compile and link options and link order play a role in whether two binary images will compare equally.
http://support.microsoft.com/kb/164151
Also, with certain sections excluded in the PE file headers, you can see certain different bits/ set of bits in the PE file of the application.
You can use the DUMPBIN tool to view contents of the PE file.
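As a rough illustration of the header fields mentioned here, the following Python sketch reads the link timestamp and the header checksum straight from a PE image. The function name `read_pe_build_fields` is made up for this example; the offsets follow the documented PE/COFF layout, and the sketch does no validation beyond the two signatures.

```python
import struct
from datetime import datetime, timezone


def read_pe_build_fields(data: bytes) -> dict:
    """Return the link timestamp and header checksum of a PE image (sketch)."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # The COFF header follows the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then the 4-byte TimeDateStamp.
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    # The optional header starts 24 bytes after the signature; its CheckSum
    # field sits at offset 64 in both PE32 and PE32+ images.
    (checksum,) = struct.unpack_from("<I", data, pe_offset + 24 + 64)
    return {
        "pe_offset": pe_offset,
        "timestamp": timestamp,
        "timestamp_utc": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "checksum": checksum,
    }
```

Running this on two builds of the same source should show whether the timestamp and checksum fields account for the header differences.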
--Trevor H.
Send files to Hotmail.com: "MS_TREVORH"
Monday, May 23, 2011 9:46 AM
Hello Daniel,
Thank you for your question.
I am currently looking into this issue and will give you an update as soon as possible.
Thank you for your understanding and support.
Best Regards,
Ziwei Chen
Ziwei Chen [MSFT]
Wednesday, May 25, 2011 4:22 AM | 1 vote
Hi Victor, Thanks for your response.
Just an update. I have managed to get a procedure working that will produce the same checksum value on a .exe if I build the same source code files. The external auditors and internal test team have agreed to use a custom application that I developed to generate the checksum values and will use the custom application as part of the procedure to generate checksum on the projects that we submit to them. So far this is our best option to what has been a challenging problem. A brief summary for all those interested is below.
1. First, I make sure I build the project using MSBuild.exe on the command line. Building from within Visual Studio produced a totally different file each time. Strange!!!
2. I make sure the build does a /rebuild on the solution file. This effectively does a clean and rebuild. To be extra sure that the build is clean, I run a VB Script that deletes the bin and obj folders for each project in the solution before calling MSBuild.exe with the /rebuild switch. The build uses the release config of the solution as well. The external audit team and internal test team do not have Visual Studio installed on their PCs and use the Windows SDK to do the builds; so far this process works well and is the standard practice.
3. I developed a small winforms application that lets user select a .exe file from the bin folder of a Visual Studio solution / project. Simply using the file open dialog box.
4. I read the entire file byte by byte into a byte array. I ignore the first 128 bytes of the file, as this part contains PE header info that changes each time you build a project. It contains data such as the date time stamp etc. You can use a tool such as PE Explorer to open a file and explore the PE header info to see for yourself. I also search for a section of the file that contains RSDS and Z\V.4. I found during my research that there are a few bytes either side of where I find the RSDS and Z\V.4 that always change when you do a build. It looks like it contains some kind of build info, date time stamp and who knows what!!! Either way this was always changing, and I figured if I could ignore these few bytes I might just get away with being able to generate the same checksums.
5. When I find these bytes I do a mod 16 on the position in the file where the bytes were located and round the position down to that 16-byte boundary. I then ignore 16 bytes before and after that boundary (usually enough of a buffer each way to make sure I am not getting the bytes that change with each build). For example, if I find the RSDS and Z\V.4 at position 515 of the file, I get a mod of 3, and subtracting that gives me 512. I then subtract 16 from and add 16 to the 512. This gives me the start and end positions of the bytes I will ignore in the file. In this example it would be 496 to 528. I am doing this as I am trying to load the file in even sizes in the buffer. For the record, I have noticed that the RSDS and Z\V.4 seem to be a little before the middle of the file.
6. I continue to read the file byte by byte, ignoring the bytes from step 5 above.
7. When the whole file is read I pass the bytes into a file stream which gets passed to a CRC32 algorithm. I downloaded the algorithm that I use from the net. Can't remember where, but I think it was CodeProject. Someone wrote a sample VB / C# project that computes a CRC, MD5 and SHA-1 value on a .exe file. This project was really helpful and my little utility is based on the code I found in it.
8. The CRC32, MD5 and SHA-1 values are then returned and displayed in some text boxes. The whole process is very quick. About a second for a 100kb .exe file.
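For anyone who wants to try the byte-skipping idea without the WinForms utility, here is a minimal Python sketch of it. This is not the actual utility: the names `masked_bytes` and `checksums` are invented for the example, only the RSDS marker is searched for (the Z\V.4 marker is left out), and the ignored window is treated as the half-open range from 16 bytes before to 16 bytes after the 16-byte-aligned marker position.

```python
import hashlib
import zlib

HEADER_SKIP = 128   # leading bytes ignored, as described in the post
MARKER = b"RSDS"    # debug-record signature searched for in the post
PAD = 16            # bytes ignored on each side of the aligned marker position


def masked_bytes(data: bytes) -> bytes:
    """Return the file content with the assumed volatile regions left out."""
    pos = data.find(MARKER, HEADER_SKIP)
    if pos < 0:
        # No marker found: only the header region is skipped.
        return data[HEADER_SKIP:]
    aligned = pos - (pos % 16)              # e.g. 515 -> 512
    start = max(aligned - PAD, HEADER_SKIP)  # e.g. 496
    end = min(aligned + PAD, len(data))      # e.g. 528
    return data[HEADER_SKIP:start] + data[end:]


def checksums(data: bytes) -> dict:
    """CRC32, MD5 and SHA-1 of the file with the volatile regions removed."""
    kept = masked_bytes(data)
    return {
        "crc32": format(zlib.crc32(kept) & 0xFFFFFFFF, "08x"),
        "md5": hashlib.md5(kept).hexdigest(),
        "sha1": hashlib.sha1(kept).hexdigest(),
    }
```

Typical use would be `checksums(open(path, "rb").read())` on each build output. The window sizes come from the description above; if a binary diff of two builds shows changes outside them, `HEADER_SKIP` and `PAD` would need adjusting.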
The utility will form part of the procedure to follow when the external auditors and internal test team need to generate a checksum on a file. As long as you build the same source code file the checksum will be the same. This has been tested on several PCs: one with Win 7 x64, 2x Win XP x32 and 1x Win XP virtual PC x32. So far the checksums are reproducible as long as the code does not change in any way. Change any code in the slightest, such as int i = 1; to int i = 2; and the checksums will be different when you rebuild the solution file. This is exactly what we wanted.
It's not the best solution. I would have preferred if I could specify some switches during building that would generate the same CRC checksum, but for now this solution will serve our needs.
I hope this can help others. It was a hard challenge in the end. I spent a total of 5 work days on this.
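The clean-and-rebuild part of the procedure could be scripted along these lines. This is a Python sketch rather than the VB Script mentioned above; the helper names are invented, and it assumes the "/rebuild" switch in the post corresponds to MSBuild's `/t:Rebuild` target switch with `/p:Configuration=Release` selecting the release config.

```python
import shutil
from pathlib import Path


def clean_build_dirs(solution_dir: str) -> list:
    """Delete every bin and obj folder below solution_dir; return what was removed."""
    removed = []
    # sorted() materialises the listing first, so deleting while looping is safe;
    # folders nested inside an already-deleted folder simply no longer exist.
    for path in sorted(Path(solution_dir).rglob("*")):
        if path.is_dir() and path.name.lower() in ("bin", "obj"):
            shutil.rmtree(path)
            removed.append(path)
    return removed


def msbuild_command(solution_file: str) -> list:
    """Command line for a release rebuild (assumes MSBuild.exe is on the PATH)."""
    return ["MSBuild.exe", solution_file, "/t:Rebuild", "/p:Configuration=Release"]
```

After `clean_build_dirs(...)`, the command list can be handed to `subprocess.run(msbuild_command("MySolution.sln"), check=True)`, where `MySolution.sln` is a placeholder name.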
Daniel Hajduk
Tuesday, June 21, 2011 6:47 PM
Victor,
Any suggestions on how Microsoft suggests this problem be solved? I'm also running into this and would like to resolve it soon without having to hack .exe's myself.
Regards.
Thursday, June 30, 2011 5:17 PM
Why do you want the same CRC for executable produced by compiler after each compilation? What’s the objective?
CRC was designed to detect accidental changes to computer data. As far as I know, each compilation (even when the source code remains the same) produces a binary with a different time stamp and other data, which makes certain bits different in each output binary. This actually helps to detect that the binary produced at compilation time t1 is different from the one produced at t2. This is the way it works.
You may want to check portable executable structure to figure out what information in binary normally remains same and build CRC based on that.
--Trevor H.
Send files to Hotmail.com: "MS_TREVORH"
Tuesday, July 5, 2011 10:05 PM
Hi Trevor.
I need to produce the same CRC checksum because our code gets audited by two external regulators. The regulators will get a copy of the source code, validate it and then compile the files. A CRC is generated on the compiled files. The validated code is what we are allowed to use when we do our installs.
When we get the OK from each of the auditors, our build team where I work will then also do a build and will need to generate the same checksum as what the two regulators have built. Now, as you would know by now, each of the two regulators and our internal build team will compile the files and generate different checksums. The regulators will also audit the machines that are built, and the build on the machines needs to match what they have verified. The checksums will always be different, so the regulators can never tell if the build on the machines is the same as what they have verified, and hence the problem arises. This all happens on the exact same code base. In a nutshell, this is why we must be able to generate the same checksum on a piece of code that has not changed.
So far the solution I have developed (mentioned above) is holding strong and is getting around our problem. I still don't think my solution is the best, but is doing the job.
Daniel Hajduk
Friday, July 8, 2011 6:48 PM
We also have this problem.
Regulatory agencies and quasi-governmental testing labs need to verify that the source code we send them matches the binaries we send them for testing. If they can’t exactly duplicate our product by compiling at their site (if the binaries don’t match exactly byte-for-byte) then we get all sorts of hassles.
The last time, it took a few days of convincing the testing lab that ILDASM output demonstrated that the binaries were functionally the same, but I’m not sure if we can pull that off again.
We plan to make a small utility that runs as a post-build step. It will write over the bytes in DLLs and EXEs that seem to change randomly with each compile.
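A post-build step of this kind might start from something like the Python sketch below, which returns a copy of a PE image with the COFF TimeDateStamp and the optional-header CheckSum overwritten with zeros. The function name is invented for the example, and only these two header fields are handled; any other bytes that vary between builds would need their own treatment.

```python
import struct


def zero_pe_timestamp_and_checksum(data: bytes) -> bytes:
    """Return a copy of a PE image with two build-dependent header fields zeroed."""
    out = bytearray(data)
    if out[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at 0x3C gives the offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", out, 0x3C)
    if out[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # COFF TimeDateStamp: 8 bytes after the start of the signature.
    struct.pack_into("<I", out, pe_offset + 8, 0)
    # Optional header CheckSum: offset 64 in the optional header,
    # which begins 24 bytes after the start of the signature.
    struct.pack_into("<I", out, pe_offset + 24 + 64, 0)
    return bytes(out)
```

Two builds that differ only in those fields would then compare equal byte-for-byte after normalisation.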
Saturday, May 19, 2012 1:53 PM
We have the same problem, for a slightly different reason.
Here's what I've found, based on Daniel's and Sherrod's findings above, for VS2010-generated binaries.
a) For native (non-.NET) PE files you get variation in the header from the build date and checksum, so the aforementioned "skip the header" works.
b) There's a GUID for the PDB file that's generated each build, which is the RSDS bit mentioned. There's an associated date-time field with this as well. See http://www.debuginfo.com/articles/debuginfomatch.html for info.
c) If this is a DLL, there may be import and export timestamps, so the export table must be walked as well. See http://msdn.microsoft.com/en-us/library/ms809762.aspx
d) .NET assemblies have additional GUIDs buried in them: there is a GUID stream that must be nulled. See http://jilc.sourceforge.net/ecma_p2_cil.shtml
e) In addition, referenced assemblies get a straight hash of their contents, which will vary by build, and so must also be ignored, as well as having a GUID in them (if you're generating the GUIDs on a per-build basis).
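To see what the RSDS bit from point b) actually holds, a Python sketch like the following can decode it. It assumes the CodeView PDB 7.0 record layout (4-byte "RSDS" signature, 16-byte GUID, 4-byte age, then a NUL-terminated PDB path) and simply searches for the first "RSDS" in the file instead of walking the debug directory, so it could be fooled by the same four bytes appearing elsewhere. The function name is invented for the example.

```python
import struct
import uuid


def find_rsds_record(data: bytes):
    """Locate and decode the first RSDS debug record, or return None."""
    pos = data.find(b"RSDS")
    if pos < 0 or pos + 24 > len(data):
        return None
    # 16-byte GUID stored in the usual little-endian Windows layout.
    guid = uuid.UUID(bytes_le=data[pos + 4:pos + 20])
    # 4-byte age counter follows the GUID.
    (age,) = struct.unpack_from("<I", data, pos + 20)
    # The PDB path runs up to the next NUL byte.
    end = data.find(b"\x00", pos + 24)
    if end < 0:
        end = len(data)
    pdb_path = data[pos + 24:end].decode("utf-8", errors="replace")
    return {"offset": pos, "guid": guid, "age": age, "pdb_path": pdb_path}
```

The decoded path is also a quick way to spot the build-location dependency: two builds from different directories will carry different PDB paths here.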
There's one last idiosyncrasy, which is compiler-generated per-build code that you don't control. I've seen some examples where e.g. doing a "switch" on a set of strings has the compiler actually generate a dictionary of string to ID, and then switch really on the ID, and that string-to-ID table gets a new "private code" GUID each build. That's tougher, and requires code changes to compensate for it.
If you get a binary diff utility and save "build 1" and "build 2", then you can really see all the differences.
Per the other note: there are many "where's the source file from" things buried in a build as well. In order to really have consistency, the builds must come from the same logical place wherever you want comparisons to work, so e.g. everything builds from the same subst'd drive letter. This includes your compiler headers as well if you're not consistently installing in C:\
I've learned way too much about the PE format now...
--Kevin
Kevin Bradley, Philips Healthcare
Tuesday, September 1, 2015 8:40 PM
I don't see why this is confusing. If I sync my depot to a point 6 months ago, I expect to produce the same app I did 6 months ago.
Saying they are different so you can tell the PDBs apart is crazy. WE COULD JUST USE THE SAME PDBS!!! That is, if the compiler team didn't think it needed to generate a completely unique executable just so they can tell the PDBs apart.
It's an idiotic circle of arguments. I don't need to tell the PDBs apart unless they are different. And the only reason they are different is, as you say, so we can tell them apart.
My build machine checks in hundreds of C# DLLs and EXEs every day. And the code hasn't changed. The only reason my Perforce server is at capacity is so that "I can tell the PDBs apart".