I've already awarded a 100 point bounty to mario's answer, but might start a second 100 point bounty if I see new good answers coming in. This is why I'm keeping the question open and will not choose a final answer, despite having awarded the bounty to mario.
This might seem like a simple question (study the code and refactor) but I'm hoping those with lots more experience can give me some solid advice.
The library is an open source 20,000 line library that's all in a single file and which I haven't written myself. The code looks badly written and the single file is even a bigger problem, because it freezes eclipse for half a minute at least every time I want to make a change, which is one of the reasons I think it's worth it to refactor this library into smaller classes.
So aside from reading the code and trying to understand it, are there common (or not so common) tips when refactoring a library such as this? What do you advise to make my life a little easier?
Thanks to everyone for your comments.
Refactoring your code makes it easier to read and maintain. Refactoring doesn't change the external functionality of your code; it changes the way the code achieves that functionality.
Refactor first before adding any new featureslonger to finish the project, but it will also reduce the amount of technical debt you or the product owner will have to deal with in the future.
If you have to deal with someone else's dirty code, try to refactor it first. Clean code is much easier to grasp. You will improve it not only for yourself but also for those who use it after you. Refactoring makes it easier to add new features.
General logic based on this: If points 1-5 are all true, don't refactor. If any of points 2, 3, or 5 are false for multiple reasons (for example, multiple bugs would be fixed or multiple features would be easier to implement), count them as false once for each reason they are false.
A few generic principles apply:
Divide and conquer. Split the file into smaller, logical libraries and function groupings. You will learn more about the library this way, and make it easier to understand and test incrementally.
Remove duplication. Look for repeated functions and concepts, and replace them with standard library functions, or centralized functions within the library.
Add consistency. Smooth out parameters and naming.
Add unit tests. This is the most important part of refactoring a library. Use jUnit (or similar), and add tests that you can use to verify that the functions are both correct, and that they have not changed.
Add docs. Document your understanding of the consistent, improved library as you write your tests.
If the code is badly written, it is likely that it has a lot of cloning. Finding and getting rid of the clones would then likely make it a lot more maintainable as well as reducing its size.
You can find a variety of clone detectors, these specifically for PHP:
ranked in least-to-most capability order (IMHO with my strong personal self-interest in CloneDR) in terms of qualitatively different ability to detect interesting clones.
If the code is badly written, a lot of it might be dead. It would be worthwhile to find out which part executes in practice, and which does not. A test coverage tool can give you good insight into the answer for this question, even in the absence of tests (you simply exercise your program by hand). What the test coverage tool says executes, obviously isn't dead. What doesn't execute... might be worth further investigation to see if you can remove it. A test coverage tool is also useful to tell you how much of the code is exercised by your unit tests, as suggested by another answer. Finally, a test coverage tool can help you find where some of the functionality is: exercise the functionality from the outside, and whatever code the test coverage tool says is executed is probably relevant.
Our PHP Test Coverage Tool can collect test coverage data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With