I am about to join a PHP project that has been developed over the course of several years. It is going to be huge and sparsely documented: many files, piles of code, and no consistent quality level to be expected.
How would you go about gathering as much information as possible about what is going on?
Autoloading is not to be expected, at least not extensively, so inclued might do a good job of revealing the interdependencies.
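If inclued is not available, a rough include map can be scraped with the tokenizer, which copes with comments and strings better than a plain grep. This is only a sketch; the way the collected expressions are reported is up to you.

```php
<?php
// Sketch: list the include/require expressions in one PHP file
// using the tokenizer, as a poor man's dependency map.
function findIncludes(string $file): array
{
    $includes = [];
    $tokens = token_get_all(file_get_contents($file));
    $kinds = [T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE];
    foreach ($tokens as $i => $token) {
        if (is_array($token) && in_array($token[0], $kinds, true)) {
            // Collect the raw expression up to the terminating semicolon.
            $expr = '';
            for ($j = $i + 1; isset($tokens[$j]) && $tokens[$j] !== ';'; $j++) {
                $expr .= is_array($tokens[$j]) ? $tokens[$j][1] : $tokens[$j];
            }
            $includes[] = trim($expr);
        }
    }
    return $includes;
}
```

Running this over a `RecursiveDirectoryIterator` of the project root gives a first approximation of the file graph; dynamically computed paths will of course show up as expressions rather than resolved file names.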
Having phpDocumentor digest the project files might give an idea about which classes/methods/functions are present.
Maybe phpCallGraph for method/function relations.
Profiling some generic use cases with XDebug to gain an idea about the hierarchies and concepts.
Inspecting important log files, checking out warnings, deprecated usages, and errors.
phpinfo().
Maybe extracting all comments and processing them into an HTML file.
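The comment extraction can again be done with the tokenizer; here is a minimal sketch (the function names and the HTML layout are placeholders, not part of any tool):

```php
<?php
// Sketch: pull every comment out of a PHP source string via the
// tokenizer, so they can be collected into a single HTML report.
function extractComments(string $source): array
{
    $comments = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)
            && in_array($token[0], [T_COMMENT, T_DOC_COMMENT], true)) {
            $comments[] = ['line' => $token[2], 'text' => $token[1]];
        }
    }
    return $comments;
}

// Rendering as HTML is then just escaping and wrapping:
function commentsToHtml(array $comments): string
{
    $html = "<ul>\n";
    foreach ($comments as $c) {
        $html .= '<li><em>line ' . $c['line'] . ':</em> <pre>'
               . htmlspecialchars($c['text']) . "</pre></li>\n";
    }
    return $html . "</ul>\n";
}
```

Concatenating the output per file gives the browsable comment dump described above.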
This doesn't cover unit tests, databases, ....
What would you do? What are your experiences with the tools mentioned, and how do you get the most out of them?
You can assume any condition necessary.
What statistical information could be useful to extract?
Does anybody have experience with those tools?
EDIT from "PHP Tools for quality check":
phploc - a tool for quickly measuring the size of a PHP project.
EDIT 2 from Bryan Waters' answer:
Inspecting Apache logs and Google Analytics data to find out about the top requested URLs, then analyzing what happens using Xdebug profiling and a tool like KCachegrind. See his answer for the concrete techniques.
Setting up a deployment / build / CI cycle for PHP projects - suggested by Pekka
EDIT 3
Just found this PDF of a talk by Gabriele Santini - "Statistical analysis of the code - Listen to your PHP code". This is like a gold mine.
I agree that your question already contains most of the answers.
This is what I would probably do. I would start with Sebastian Bergmann's tools, especially phploc, so you can get an idea of the scope of the mess (codebase) you are looking at. It gives you class and function counts, etc., not just lines of code.
Next I would look in the Apache logs or Google Analytics and get the 10 most requested PHP URLs. I'd set up Xdebug with profiling, run through those top 10 requests, and capture the files and call tree. (You can view these with a Cachegrind tool such as KCachegrind.)
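For reference, a minimal php.ini fragment for trigger-based profiling might look like this (these are the Xdebug 2 setting names; Xdebug 3 replaced them with `xdebug.mode=profile` and `xdebug.output_dir` -- adjust paths to your environment):

```ini
; Profile only requests that carry XDEBUG_PROFILE (GET/POST/cookie),
; so the top-10 URLs can be profiled selectively:
xdebug.profiler_enable = 0
xdebug.profiler_enable_trigger = 1
xdebug.profiler_output_dir = /tmp/profiles
; %R = request URI, %t = timestamp:
xdebug.profiler_output_name = cachegrind.out.%R.%t
```

The resulting cachegrind.out.* files are what KCachegrind (or any other Cachegrind viewer) opens.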
Finally, I'd read through the entire execution path of one or two of those traces, whichever is most representative of the whole. I'd use my Eclipse IDE, but printing them out and going to town with a highlighter is valid as well.
The top 10 method might fail you if there are multiple systems cobbled together. With Xdebug you should see quickly whether the top 10 are coded similarly or whether each is a unique island.
I would look at the MySQL databases and try to understand what they are all for, especially by looking at table prefixes; you may have a couple of different apps stacked on top of each other. If large parts of the DB are not touched by the top 10, you need to go hunting for the sub-apps. If you find other sub-apps, run them through the Xdebug profiler and then read through one of the paths that is representative of that sub-app.
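Grouping table names by prefix gives a quick picture of how many distinct applications may share one database. A minimal sketch, assuming the table list has already been fetched with `SHOW TABLES` and treating everything before the first underscore as the prefix:

```php
<?php
// Sketch: bucket table names by prefix (the part before the first
// underscore) to spot separate apps sharing one database.
// In practice $tables would come from a SHOW TABLES query.
function groupByPrefix(array $tables): array
{
    $groups = [];
    foreach ($tables as $table) {
        $pos = strpos($table, '_');
        $prefix = ($pos === false) ? '(none)' : substr($table, 0, $pos);
        $groups[$prefix][] = $table;
    }
    return $groups;
}
```

A handful of large prefix groups usually means stacked applications; tables with no prefix at all are often the oldest layer.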
Now go back and look at your scope numbers from phploc and see what percentage of the codebase (counted in classes or functions, probably) was untouched during your review.
You should then have a basic understanding of the most often run code and an idea of how many nooks, crannies, and closets for skeleton storage there are.
Perhaps you can set up a continuous integration environment. In this environment you could gather all the statistics you want.
Jenkins is a fine CI server with loads of plugins and documentation.