What to do about an 11,000-line C++ source file?

Merging now will be less of a nightmare than it will be once the file has grown to 30,000 LOC. So:

  1. Stop adding more code to that file.
  2. Split it.

If you can't simply stop coding during the refactoring process, you could leave this big file as is for a while, but at least stop adding code to it. Since it contains one "main class", you could inherit from it and keep the derived class(es), with their overridden functions, in several new, small, well-designed files.
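A minimal C++ sketch of this idea follows. It assumes the member functions you want to replace are `virtual` in the main class (if they are not, making them virtual is a small, one-time edit to the big file). `MainClass`, `ReportingMain`, `formatReport` and `run` are hypothetical names, not taken from the original code.

```cpp
#include <string>

// Hypothetical stand-in for the existing "main class" in the big file.
// Assumption: the functions to be replaced are (or can be made) virtual.
class MainClass {
public:
    virtual ~MainClass() = default;

    // Existing behaviour that stays in the big file for now.
    virtual std::string formatReport(int value) const {
        return "report:" + std::to_string(value);
    }

    // Non-virtual entry point that existing callers already use.
    std::string run(int value) const { return formatReport(value); }
};

// New, small file (hypothetical ReportingMain.h): the derived class
// overrides only what changes, so the big file receives no new code.
class ReportingMain : public MainClass {
public:
    std::string formatReport(int value) const override {
        return "report v2:" + std::to_string(value);
    }
};
```

Code that constructs a `ReportingMain` instead of a `MainClass` picks up the new behaviour through the existing `run` entry point, while the big file itself stays untouched.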


  1. Find some code in the file which is relatively stable (not changing fast and not varying much between branches) and could stand as an independent unit. Move this into its own file, and for that matter into its own class, in all branches. Because it's stable, this won't cause (many) "awkward" merges, where a change made to the original file in one branch has to be applied to a different file when you merge it into another branch. Repeat.

  2. Find some code in the file which basically only applies to a small number of branches, and could stand alone. It doesn't matter whether it's changing fast or not, because only a few branches are affected. Move this into its own classes and files. Repeat.
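As an illustration of step 1, here is a hypothetical stable piece of logic (a simple additive checksum; `Checksum` and `messageChecksum` are invented names) moved into its own class, with the main class reduced to a one-line delegation. Both classes are shown in one block for brevity; in practice `Checksum` would live in its own header and source file, identical in every branch.

```cpp
#include <cstdint>
#include <string>

// --- Checksum.h (new file): stable logic extracted from the big file.
// Hypothetical example: a simple additive checksum over the bytes.
class Checksum {
public:
    static std::uint32_t compute(const std::string& data) {
        std::uint32_t sum = 0;
        for (unsigned char c : data) {
            sum += c;  // add each byte's unsigned value
        }
        return sum;
    }
};

// --- MainClass (still in the big file): the old inline code is replaced
// by a delegation, so later merges only ever touch this one call site.
class MainClass {
public:
    std::uint32_t messageChecksum(const std::string& msg) const {
        return Checksum::compute(msg);
    }
};
```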

So, we've got rid of the code that's the same everywhere, and the code that's specific to certain branches.

This leaves you with a nucleus of badly managed code: it's needed everywhere, but it's different in every branch (and/or it changes constantly, so that some branches lag behind others), and yet it's in a single file that you're unsuccessfully trying to merge between branches. Stop doing that. Branch the file permanently, perhaps by renaming it in each branch. It's not "main" anymore; it's "main for configuration X". OK, so you lose the ability to apply the same change to multiple branches by merging, but this is in any case the core of code where merging doesn't work very well. If you're having to manage the merges manually anyway to deal with conflicts, then it's no loss to apply them manually and independently on each branch.

I think you're wrong to say that the kind of SCC (source code control system) doesn't matter, because, for example, Git's merging abilities are probably better than those of the merge tool you're using. So the core problem, "merging is difficult", occurs at different times for different SCCs. However, you're unlikely to be able to change SCCs, so the issue is probably irrelevant.


It sounds to me like you're facing a number of code smells here. First of all, the main class appears to violate the open/closed principle (new behaviour is added by modifying it rather than by extending it). It also sounds like it is handling too many responsibilities. Because of this, I would expect the code to be more brittle than it needs to be.

While I can understand your concerns regarding traceability following a refactoring, I would expect that this class is rather hard to maintain and enhance, and that any changes you do make are likely to cause side effects. I would assume that the cost of those side effects outweighs the cost of refactoring the class.

In any case, since the code smells will only get worse with time, at some point their cost will outweigh the cost of refactoring. From your description, I would assume that you're already past that tipping point.

Refactoring this should be done in small steps. If possible, add automated tests to verify the current behavior before refactoring anything. Then pick out small areas of isolated functionality and extract them into their own types, so the main class can delegate those responsibilities to them.
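A small sketch of "test first, then extract", with invented names (`TaxCalculator`, `priceWithTax`, `matchesRecordedBehaviour`) and an invented tax rule. In a characterization test, the expected values are whatever the legacy code returned before the extraction, and the same table must still pass afterwards. A real project would more likely use a test framework such as GoogleTest; a plain function keeps this sketch self-contained.

```cpp
#include <functional>
#include <vector>

// Step 1: pin down current behaviour. Each expected value is recorded
// from the existing code, not derived from a specification.
struct Case {
    long input;
    long expected;
};

bool matchesRecordedBehaviour(const std::function<long(long)>& f,
                              const std::vector<Case>& cases) {
    for (const Case& c : cases) {
        if (f(c.input) != c.expected) return false;
    }
    return true;
}

// Step 2: extract a small, isolated responsibility into its own type.
// Hypothetical rule: 20% tax on amounts in cents, rounded down.
class TaxCalculator {
public:
    long apply(long cents) const { return cents + cents / 5; }
};

// The big class now delegates instead of computing inline.
class MainClass {
public:
    long priceWithTax(long cents) const { return tax_.apply(cents); }

private:
    TaxCalculator tax_;
};
```

In this sketch, a table such as `{{0, 0}, {100, 120}, {999, 1198}}` plays the role of values captured from the old inline code; it should still pass against `MainClass::priceWithTax` once the arithmetic lives in `TaxCalculator`.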

In any case, it sounds like a major project, so good luck :)


The only solution I have ever come up with for such problems follows. Its real benefit is that the changes happen progressively: no revolutions here, otherwise you'll be in trouble very fast.

Insert a new C++ class as a layer above the original main class. For now, it would basically forward all calls to the current main class, but aim to make the API of this new class as clear and succinct as possible.

Once this has been done, you can add new functionality in new classes.

As for existing functionality, you have to move it progressively into new classes as it becomes stable enough. You will lose the SCC's help for the moved code, but there is not much that can be done about that. Just pick the right timing.
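The layering described above is essentially the Facade pattern. A sketch, with all names (`LegacyMain`, `Application`, `AuditLog`) hypothetical and the legacy logic reduced to placeholders: callers talk only to `Application`, which at first forwards to the old class, while new features live in new classes.

```cpp
#include <string>

// The existing, huge class (hypothetical and heavily abbreviated).
class LegacyMain {
public:
    int loadCount(const std::string& name) const {
        return static_cast<int>(name.size());  // placeholder legacy logic
    }
    std::string describe(int n) const { return "count=" + std::to_string(n); }
};

// A feature added after the facade was introduced: it goes into a new,
// small class instead of into LegacyMain.
class AuditLog {
public:
    std::string entryFor(const std::string& name, int n) const {
        return name + ":" + std::to_string(n);
    }
};

// The new layer with a deliberately small API. Existing features are
// still forwarded to LegacyMain; new ones are delegated to new classes.
class Application {
public:
    std::string summary(const std::string& name) const {
        return legacy_.describe(legacy_.loadCount(name));  // forwarded
    }
    std::string audit(const std::string& name) const {
        return audit_.entryFor(name, legacy_.loadCount(name));
    }

private:
    LegacyMain legacy_;
    AuditLog audit_;
};
```

Once a feature such as `summary` is stable, its implementation can be moved out of `LegacyMain` into a new class without any caller noticing, because callers only depend on `Application`.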

I know this is not perfect, though I hope it can help, and the process must be adapted to your needs!

Additional information

Note that Git is an SCC that can follow pieces of code from one file to another. I have heard good things about it, so it could help while you are progressively moving your work.

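Git stores each version of a file's content as a blob and does not record moves explicitly; instead, it detects moved or copied code after the fact by comparing content. For example, `git blame -C` detects lines moved or copied from other files, and `git log --follow` tracks a single file across renames. So you can move pieces of code into different files, and Git can often still trace where they came from, even if you modify them slightly. Apart from the video from Linus Torvalds mentioned in the comments below, I have not been able to find a clear explanation of this.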


Confucius say: "first step to getting out of hole is to stop digging hole."


Let me guess: Ten clients with divergent feature sets and a sales manager that promotes "customization"? I've worked on products like that before. We had essentially the same problem.

You recognize that having an enormous file is trouble, but even more trouble is ten versions that you have to keep "current". That's multiple maintenance. SCC can make that easier, but it can't make it right.

Before you try to break the file into parts, you need to bring the ten branches back in sync with each other so that you can see and shape all the code at once. You can do this one branch at a time, testing both branches against the same main code file. To implement the custom behavior, you can use #ifdef and friends, but as much as possible it's better to use ordinary if/else against defined constants. This way, your compiler will verify all types and most probably eliminate "dead" object code anyway. (You may want to turn off the warning about dead code, though.)
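The difference between the two approaches can be sketched as follows. The client names, the `CLIENT_ID` macro and the discount rule are assumptions for illustration; the build would select a client with something like `-DCLIENT_ID=2`, and the `#ifndef` fallback only keeps the example compilable on its own.

```cpp
#include <string>

// Assumption: the build selects a client with e.g. -DCLIENT_ID=2.
// This fallback only keeps the sketch compilable on its own.
#ifndef CLIENT_ID
#define CLIENT_ID 1
#endif

// Hypothetical client identifier and feature switch as ordinary constants.
constexpr int kClientGlobex = 2;
constexpr int kClient = CLIENT_ID;
constexpr bool kHasDiscounts = (kClient == kClientGlobex);

// Preferred: an ordinary if/else on a constant. Both branches are compiled
// and type-checked in every configuration; the optimizer normally removes
// the branch that can never run.
int finalPrice(int price) {
    if (kHasDiscounts) {
        return price - price / 10;  // hypothetical 10% discount for Globex
    } else {
        return price;
    }
}

// Fallback: a preprocessor conditional. The excluded branch is never
// compiled, so a type error in it stays hidden until someone builds that
// configuration. The preprocessor cannot see kClientGlobex, so the
// number 2 has to be repeated here.
std::string banner() {
#if CLIENT_ID == 2
    return "Globex edition";
#else
    return "Acme edition";
#endif
}
```

Building with a different `CLIENT_ID` changes both functions, but only `finalPrice` has its unused branch compiled and type-checked in every configuration.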

Once there's only one version of that file, shared implicitly by all branches, it's much easier to start applying traditional refactoring methods.

The #ifdefs are primarily better for sections where the affected code only makes sense in the context of other per-branch customizations. One may argue that these also present an opportunity for the same branch-merging scheme, but don't go hog-wild. One colossal project at a time, please.

In the short run, the file will appear to grow. This is OK. What you're doing is bringing things together that need to be together. Afterwards, you'll begin to see areas that are clearly the same regardless of version; these can be left alone or refactored at will. Other areas will clearly differ depending on the version. You have a number of options in this case. One method is to delegate the differences to per-version strategy objects. Another is to derive client versions from a common abstract class. But none of these transformations are possible as long as you have ten "tips" of development in different branches.
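Both options can be sketched in C++. All names (`PricingStrategy`, `OrderProcessor`, `ClientApp`, `AcmeApp`) and the pricing rules are hypothetical; the point is only where the per-client differences live.

```cpp
#include <memory>
#include <string>
#include <utility>

// Option 1: per-version strategy objects. The shared code calls an
// interface; each client's differences live in one small class.
class PricingStrategy {
public:
    virtual ~PricingStrategy() = default;
    virtual int adjust(int price) const = 0;
};

class StandardPricing : public PricingStrategy {
public:
    int adjust(int price) const override { return price; }
};

class DiscountPricing : public PricingStrategy {
public:
    int adjust(int price) const override { return price - price / 10; }
};

// Common code, identical for every client.
class OrderProcessor {
public:
    explicit OrderProcessor(std::unique_ptr<PricingStrategy> pricing)
        : pricing_(std::move(pricing)) {}
    int total(int price, int quantity) const {
        return pricing_->adjust(price) * quantity;
    }

private:
    std::unique_ptr<PricingStrategy> pricing_;
};

// Option 2: client versions derived from a common abstract class
// (template method): the base fixes the flow, subclasses fill the gaps.
class ClientApp {
public:
    virtual ~ClientApp() = default;
    std::string title() const { return "Product - " + editionName(); }

protected:
    virtual std::string editionName() const = 0;
};

class AcmeApp : public ClientApp {
protected:
    std::string editionName() const override { return "Acme"; }
};
```

With strategies, the variation is chosen when the object is built; with the abstract base class, each client gets its own subclass. Either way, the shared flow exists once instead of in ten branch tips.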