
How to convince management that reformatting the entire Java code base is safe

How would one go about proving to management that a batch reformat of all .java files in a large code base (to bring the code into compliance with the company's coding standards) is safe and will not affect functionality?

The answers would have to appease the non-technical and the technical alike.

Edit: 2010-03-12 Clarification for the technical among you: reformat = whitespace-only changes; no "organizing imports" and no "reordering of member variables, methods, etc."

Edit: 2010-03-12 Thank you for the numerous responses. I am surprised that so many readers have voted up mrjoltcola's response, since it is simply a statement about being paranoid and in no way proposes an answer to my question. Moreover, there is even a comment by the same contributor reiterating the question. WizzardOfOdds seconded this viewpoint (but you may not have read all the comments to see it). -jtsampson

Edit: 2010-03-12 I will post my own answer soon, though Jon Skeet's answer was right on the money with the MD5 suggestion (note: use -g:none to turn off debugging information), even though it only covered the technical aspects. -jtsampson

2010-03-15 I added my own answer below. In response to what "safe" means: I meant that the functionality of the Java code would not be affected. A simple study of the Java compiler shows this to be the case (with a few caveats). Those caveats were "white space only" and were pointed out by several posters. However, this is not something you want to try to explain to BizOps. My aim was to elicit "how to justify doing this" answers, and I got several great responses.

Several people mentioned source control and the "fun" that goes along with it. I specifically did not mention that as that situation is already well understood (within my context). Beware of the "gas station" effect. See my answer below.

jtsampson asked Mar 10 '10 20:03



2 Answers

If it's just reformatting, then that shouldn't change the compiler output. Take a hash (MD5 should be good enough) of the build before and after the reformatting - if it's the same for every file, that clearly means it can't have altered behaviour. There's no need to run tests, etc. - if the output is byte for byte the same, it's hard to see how the tests would start failing. (Of course it might help to run the tests just for the show of it, but they're not going to prove anything that the identical binaries won't.)
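A minimal sketch of the hash comparison, assuming the "before" and "after" builds write their .class files into two separate output directories (the class name `BuildHashCompare` and the two-argument command line are illustrative, not part of the answer). It uses the JDK's `MessageDigest` with MD5, as suggested, and reports every class file whose digest differs or that exists in only one build:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BuildHashCompare {

    // Maps each .class file (path relative to root) to the hex MD5 digest of its bytes.
    static Map<String, String> hashClasses(Path root) throws IOException, NoSuchAlgorithmException {
        List<Path> classFiles;
        try (Stream<Path> walk = Files.walk(root)) {
            classFiles = walk.filter(Files::isRegularFile)
                             .filter(p -> p.toString().endsWith(".class"))
                             .collect(Collectors.toList());
        }
        Map<String, String> hashes = new TreeMap<>(); // sorted, so reports are stable
        for (Path p : classFiles) {
            byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(p));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            hashes.put(root.relativize(p).toString(), hex.toString());
        }
        return hashes;
    }

    // Lists class files whose digests differ, plus files present in only one build.
    static List<String> differences(Path before, Path after) throws IOException, NoSuchAlgorithmException {
        Map<String, String> a = hashClasses(before);
        Map<String, String> b = hashClasses(after);
        List<String> diffs = new ArrayList<>();
        for (Map.Entry<String, String> e : a.entrySet()) {
            // changed, or missing from the "after" build
            if (!e.getValue().equals(b.get(e.getKey()))) diffs.add(e.getKey());
        }
        for (String key : b.keySet()) {
            if (!a.containsKey(key)) diffs.add(key); // new in the "after" build
        }
        return diffs;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java BuildHashCompare <before-classes-dir> <after-classes-dir>");
            System.exit(2);
        }
        List<String> diffs = differences(Path.of(args[0]), Path.of(args[1]));
        if (diffs.isEmpty()) {
            System.out.println("All class files are byte-for-byte identical.");
        } else {
            diffs.forEach(d -> System.out.println("DIFFERS: " + d));
            System.exit(1);
        }
    }
}
```

An empty report (exit status 0) is the "byte for byte the same" outcome described above. Comparing the raw bytes directly would be just as conclusive; keeping digests is mainly convenient if you want to archive a record of the "before" build for later.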

EDIT: As pointed out in comments, the binaries contain line numbers. Make sure you compile with -g:none to omit debug information. That should then be okay with line numbering changes - but if you're changing names that's a more serious change, and one which could indeed be a breaking change.
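To see why `-g:none` matters, here is a small self-contained experiment (the `Calc` source strings are made up for illustration). It uses the standard `javax.tools` compiler API, so it needs a JDK rather than a bare JRE. It compiles the same class in a compact layout and in a whitespace-only reformatted layout, then compares the resulting bytes with javac's default debug settings and with `-g:none`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class DebugInfoDemo {

    // The same class twice: compact, and reformatted (whitespace only, so statements move to other lines).
    static final String ORIGINAL =
        "public class Calc { int add(int a,int b){return a+b;} }\n";
    static final String REFORMATTED =
        "public class Calc {\n\n    int add(int a, int b) {\n        return a + b;\n    }\n}\n";

    // Compiles the source as Calc.java in a fresh temp directory and returns the bytes of Calc.class.
    static byte[] compile(String source, String... options) throws IOException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null when running on a JRE without javac
        if (javac == null) throw new IllegalStateException("No system Java compiler; run this on a JDK");
        Path dir = Files.createTempDirectory("calc");
        Path src = dir.resolve("Calc.java");
        Files.writeString(src, source);
        List<String> args = new ArrayList<>(Arrays.asList(options));
        args.addAll(List.of("-d", dir.toString(), src.toString()));
        int status = javac.run(null, null, null, args.toArray(new String[0]));
        if (status != 0) throw new IllegalStateException("javac failed with status " + status);
        return Files.readAllBytes(dir.resolve("Calc.class"));
    }

    public static void main(String[] args) throws IOException {
        boolean sameDefault = Arrays.equals(compile(ORIGINAL), compile(REFORMATTED));
        boolean sameNoDebug = Arrays.equals(compile(ORIGINAL, "-g:none"), compile(REFORMATTED, "-g:none"));
        System.out.println("default flags identical: " + sameDefault); // false: line numbers moved
        System.out.println("-g:none identical:       " + sameNoDebug); // true: no debug tables emitted
    }
}
```

By default javac emits line-number and source-file information, so moving `return a + b;` to a different line changes `Calc.class`; with `-g:none` those attributes are omitted and the two outputs come out identical.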

I'm assuming you can reformat and rebuild without anyone caring - only checking the reformatted code back into source control should give any case for concern. I don't think Java class files have anything in them which gives a build date, etc. However, if your "formatting" changes the order of fields etc., that can have a significant effect.
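The same byte comparison would also flag the kind of reordering warned about here. A sketch, assuming a JDK is available (`Point` is a hypothetical class): swapping the declaration order of two fields changes the compiled class even with `-g:none`, because javac emits fields into the class file in declaration order:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class FieldOrderDemo {

    static final String ORIGINAL  = "public class Point { int x; int y; }\n";
    static final String REORDERED = "public class Point { int y; int x; }\n"; // same members, swapped order

    // Compiles the source as Point.java with -g:none and returns the bytes of Point.class.
    static byte[] compileNoDebug(String source) throws IOException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a JRE without javac
        if (javac == null) throw new IllegalStateException("No system Java compiler; run this on a JDK");
        Path dir = Files.createTempDirectory("point");
        Path src = dir.resolve("Point.java");
        Files.writeString(src, source);
        int status = javac.run(null, null, null, "-g:none", "-d", dir.toString(), src.toString());
        if (status != 0) throw new IllegalStateException("javac failed with status " + status);
        return Files.readAllBytes(dir.resolve("Point.class"));
    }

    public static void main(String[] args) throws IOException {
        boolean same = Arrays.equals(compileNoDebug(ORIGINAL), compileNoDebug(REORDERED));
        System.out.println("identical after reordering fields: " + same); // false
    }
}
```

Whether a given reordering actually changes behaviour depends on the code (for example, field initializers with side effects), but the hash check will at least refuse to call such a change "whitespace only".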

Jon Skeet answered Oct 11 '22 14:10



In a business environment, you have two challenges.

  1. Technical
  2. Political

From the technical perspective, reformatters are a mature technology. Combined with hashing/checksums, and as long as the language isn't whitespace-sensitive, you are technically safe to do this. You also want to make sure you do it during a downtime when no major forks are waiting to be merged. Real changes will be impossible to separate from the reformatting, so do them separately. Merging may be very difficult for anyone working on a fork. Lastly, I would only do it after implementing complete test coverage, because of reason 2...

Politically, if you don't know how to convince management, how do you know it is safe? More specifically, is it safe for you? For a senior, well-trusted developer who is in control of the processes in a shop, it's an easier job, but for a developer working in a large, political, red-tape-heavy organization, you need to make sure you cover all your bases.

The argument I made in 2010 was a bit too clever perhaps, but parsers, reformatters, pretty printers are just software; they may have bugs triggered by your codebase, ESPECIALLY if this is C++. Without unit tests everywhere, with a large codebase, you may not be able to verify 100% that the end result is identical.

As a developer, I'm paranoid, and the idea makes me uneasy, but as long as you are using:

  1. Source control
  2. Proper test coverage

then you are OK.

However, ponder this: Management is now aware that you are mucking around in a million-line project with a "mass change". A previously undiscovered bug gets reported after your reformat. You are now chief suspect for causing this bug. Whether it is "safe" has multiple meanings. It might not be safe for you and your job.

This sounds trite, but a couple of years ago I remember something like this happening. We had a bug report come in the day after a nighttime maintenance window in which I had only reconfigured and rebooted an IIS server. For several days, the story was that I must have screwed up or deployed new code. Nobody said it directly, but I got the look from a VP that said so. We finally tracked it down to a bug that was already in the code and had been pushed previously, but did not show up until a QA person recently changed a test case. Honestly, some people don't even remember that part; they just remember coming in the next day to a new bug.

EDIT: In response to jtsampson's edits: your question wasn't about how to do it; it was "How to convince management that it is safe." Perhaps you should have asked instead, "Is it safe? If so, how do I do it safely?" My statement was pointing out the irony of your question, in that you assumed it was safe without knowing how. I appreciate the technical side of reformatting, but I am pointing out that there is risk involved in anything non-trivial, and unless you put the right person on it, it might get mucked up. Will this task detract from programmers' other tasks, sidetracking them for a couple of days? Will it conflict with some other coder's uncommitted revisions? Is the source under revision control at all? Is there any embedded script that is whitespace-sensitive, such as Python? Anything can have an unexpected side effect; in our environment, it would be difficult to get a time window in which nobody is working on a branch, and mass reformatting is going to make their merge pretty ugly. Hence my distaste for mass reformatting, whether by hand or automated.

codenheim answered Oct 11 '22 16:10
