Every modern source control system can slice and dice the history of a program. There are many tools to statically and dynamically analyze code. What sort of mathematical formula would allow me to integrate the amount of activity in a file along with the number of deployments of that software? We are finding that even if a program completes all of its unit tests, it requires more work than we would expect at upgrade time. A measure of this type should be possible, but sitting down and thinking about even its units has me stumped.
Update: If something gets sent to a test machine, I could see marking it less rotten. If something gets sent to all test boxes, I could see it getting a fresh marker. If something goes to production, I could give it a nod and reduce its bitrot score. If there is a lot of activity in its files and it never gets sent anywhere, I would ding the crap out of it. Don't focus on the code; assume that any data I need is at hand.
What kind of commit analysis (commit comments, mentioned below, or time between commits) would be fair data to apply?
Update: I think the dimensional analysis could probably just be based on age. Making it relative is a little more difficult. Old code is rotten. The average age of each line of code is still simply a measure of time. Does a larger source module rot faster than a smaller, more complex one?
Update: Code coverage is measured in lines. Code that is executed often must, by definition, be less rotten than code that is never executed. To measure bitrot accurately, you would need coverage analysis to act as a damper.
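To make those updates concrete, here is a minimal sketch of how such a score might be combined, assuming the deployment history, churn, and coverage numbers are already at hand as stated. The function, field names, and weights are hypothetical placeholders, not a proposed standard.

from datetime import datetime, timedelta

def rot_score(last_verified, commits_since_verified, line_coverage, now=None):
    """Toy bitrot score for one file.

    last_verified          -- when the file last went to a test box or to production
    commits_since_verified -- commits touching the file since that point
    line_coverage          -- fraction of its lines exercised by tests (0.0 to 1.0)
    """
    now = now or datetime.now()

    # Base signal: days since the file was last "verified by reality".
    age_days = (now - last_verified).days

    # Lots of activity that never shipped anywhere gets dinged hard.
    unshipped_churn = 1.0 + 0.1 * commits_since_verified

    # Coverage acts as a damper: fully covered code rots half as fast here.
    damper = 1.0 - 0.5 * line_coverage

    return age_days * unshipped_churn * damper

# Hypothetical file: last seen on a test box 90 days ago, 12 commits since,
# 40% line coverage.
score = rot_score(datetime.now() - timedelta(days=90),
                  commits_since_verified=12, line_coverage=0.4)

Note that the unit that falls out is still just time (days), scaled by two dimensionless factors, which is consistent with the thought above that the dimensional analysis could simply be based on age.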
Very interesting train of thought!
First, what is bitrot? The Software Rot article on Wikipedia collects a few points:
By Moore's Law, delta(CPU)/delta(t)
is a constant factor of two every 18 to 24 months. Since the environment contains more than the CPU, I would assume that this forms only a very weak lower bound on the actual change in the environment. Unit: OPS/$/s, change in Operations Per Second per dollar over time
delta(users)/delta(t)
is harder to quantify, but judging by the frequency of the phrase "Age of Knowledge" in the news, I'd say that users' expectations grow exponentially too. Looking at the development of $/FLOPS, basic economics tells us that supply is growing faster than demand, giving Moore's Law as an upper bound on the change in user requirements. I'll use function points ("amount of business functionality an information system provides to a user") as a measure of requirements. Unit: FP/s, change in required Function Points over time
delta(maintenance)/delta(t)
depends totally on the organisation and is usually quite high immediately before a release, when quick fixes are pushed through and big changes are integrated. Changes to various measures like SLOC, Cyclomatic Complexity or implemented function points over time can be used as a stand-in here. Another possibility would be bug churn in the ticketing system, if available. I'll stay with implemented function points over time. Unit = FP/s, change in implemented Function Points over time
delta(refactoring)/delta(t)
can be measured as time spent not implementing new features. Unit = 1, time spent refactoring over elapsed time (s/s, dimensionless)
So bitrot would be
bitrot(t) = d(env)/d(t) * d(users)/d(t) * d(maint)/d(t) * d(t)/d(refactoring)
          = (d(env) * d(users) * d(maint)) / (d(t)² * d(refactoring))
with a combined unit of OPS/$/s * FP/s * FP/s = (OPS*FP²) / ($*s³).
This is of course only a very forced pseudo-mathematical notation of what the Wikipedia article already said: bitrot arises from changes in the environment, changes in the users' requirements and changes to the code, while it is mitigated by spending time on refactoring. Every organisation will have to decide for itself how to measure those changes; I only give very general bounds.
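Purely as an illustration of how these rates combine (the numbers are made up; measuring the four derivatives is the hard part, as described above):

def bitrot_rate(d_env_dt, d_users_dt, d_maint_dt, d_refactoring_dt):
    """Bitrot rate per the formula above.

    d_env_dt         -- change in the environment, OPS/$/s
    d_users_dt       -- change in required Function Points, FP/s
    d_maint_dt       -- change in implemented Function Points, FP/s
    d_refactoring_dt -- time spent refactoring over elapsed time (dimensionless)
    """
    # d(t)/d(refactoring) is the reciprocal of d(refactoring)/d(t), so spending
    # no time on refactoring drives the rate towards infinity.
    return (d_env_dt * d_users_dt * d_maint_dt) / d_refactoring_dt

# Made-up numbers: modest environment drift, growing requirements,
# ongoing maintenance, and 10% of time spent refactoring.
rate = bitrot_rate(d_env_dt=1e-3, d_users_dt=0.5, d_maint_dt=0.4,
                   d_refactoring_dt=0.10)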
I disagree with Charlie: minor refactoring of source code can result in very large Hamming distances, and doesn't provide a good measure of the degree to which the code has been logically modified.
I would consider looking at the length of commit comments. For a given programmer, a relatively long commit comment usually indicates that they've made a significant change to the code.
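A rough sketch of how that heuristic could be extracted, assuming the history lives in git; the helper function itself is just an illustration:

import subprocess

def average_commit_subject_length(path):
    """Average length of the commit subjects that touched `path` -- a crude
    proxy for how significant the changes to that file have been."""
    # One subject line per commit that touched the file.
    log = subprocess.run(
        ["git", "log", "--follow", "--pretty=format:%s", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    subjects = [line for line in log.splitlines() if line.strip()]
    return sum(len(s) for s in subjects) / len(subjects) if subjects else 0.0

Since the comparison only makes sense per programmer, in practice you would also group by author, e.g. by adding %an to the format string.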
How about the simplest possible answer?
foreach (file in source control) {
    file.RotLevel = (Time.Now - file.LastTestedOrDeployed)
}
If a file hasn't been deployed (either to production or to a test machine) for a long time, it may be out of sync with "reality". The environment may have changed, and even if the file has not been changed, it may no longer work. So that seems to me to be a simple and accurate formula. Why make it more complex than that? Involving the number of changes seems only to add uncertainty. If a file has been modified recently, does that mean it has been updated to reflect a change in the environment (which makes it "less rotten"), or have new features been added (increasing the risk of errors, and so making it "more rotten")? Modifications to a file could mean anything.
The only unambiguous factor I can think of is "how long has it been since we last verified that the file worked?"
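For completeness, a runnable version of that loop; the file paths and timestamps are hypothetical, and in practice they would come from whatever deployment and test records are available, as the question assumes.

from datetime import datetime

# Hypothetical record of when each file was last tested or deployed.
last_tested_or_deployed = {
    "src/billing.py": datetime(2009, 1, 15),
    "src/reports.py": datetime(2008, 6, 2),
}

now = datetime.now()
rot_level = {path: now - stamp
             for path, stamp in last_tested_or_deployed.items()}
# Each value is a timedelta: the larger it is, the more rotten the file.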