
Does bitrot have any accepted dimensions?

Every modern source control system can slice and dice the history of a program. There are many tools to statically and dynamically analyze code. What sort of mathematical formula would allow me to integrate the amount of activity in a file along with the number of deployments of that software? We are finding that even if a program completes all of its unit tests, it requires more work than we would expect at upgrade time. A measure of this type should be possible, but sitting down and thinking about even its units has me stumped.

Update: If something gets sent to a test machine I could see marking it less rotten. If something gets sent to all test boxes I could see it getting a fresh marker. If something goes to production I could give it a nod and reduce its bitrot score. If there is a lot of activity within its files and it never gets sent anywhere I would ding the crap out of it. Don't focus on the code; assume that any data I need is at hand.
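One way to turn that update into numbers (a toy sketch; the tier weights and the churn penalty are invented, and the inputs are assumed to be at hand per the question):

```python
def rot_score(days_since_prod, days_since_all_test, days_since_any_test,
              undeployed_changes, churn_weight=5.0):
    # The freshest deployment signal dominates: production resets rot
    # hardest, all test boxes less so, a single test machine least.
    freshness = min(days_since_prod * 1.0,
                    days_since_all_test * 2.0,
                    days_since_any_test * 4.0)
    # Activity that never got deployed anywhere gets dinged hard.
    return freshness + churn_weight * undeployed_changes
```

Lower is fresher: a file deployed to production yesterday with no dangling edits scores near zero, while a heavily edited file that never left the repository scores high.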

What kind of commit analysis (commit comments, as mentioned below, or time between commits) is fair data to apply?

Update: I think the dimensional analysis could probably just be based on age. Anything relative to that is a little more difficult. Old code is rotten. The average age of each line of code is still simply a measure of time. Does a larger source module rot faster than a smaller, more complex one?
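The average line age is easy to extract from version control; a sketch (assumes a git repository and `git` on PATH, with the parsing split out so it can be checked on its own):

```python
import subprocess
import time

def ages_from_blame(porcelain, now=None):
    # `git blame --line-porcelain` repeats full commit headers for every
    # source line, so each line contributes one "committer-time <epoch>"
    # entry; return seconds since each line's last commit.
    now = time.time() if now is None else now
    return [now - int(line.split()[1])
            for line in porcelain.splitlines()
            if line.startswith("committer-time ")]

def average_line_age_days(path):
    # Average age in days of each line in `path`; run inside the work tree.
    out = subprocess.run(["git", "blame", "--line-porcelain", path],
                         capture_output=True, text=True, check=True).stdout
    ages = ages_from_blame(out)
    return sum(ages) / len(ages) / 86400 if ages else 0.0
```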

Update: Code coverage is measured in lines. Code executed often must, by definition, be less rotten than code never executed. To accurately measure bitrot you would need coverage analysis to act as a damper.
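As a sketch of the damping idea (the floor constant is invented; even fully covered code still ages with its environment):

```python
def damped_rot(age_days, coverage_fraction, floor=0.1):
    # Scale an age-based rot score down by how much of the file is
    # actually executed. `floor` keeps fully covered code from reading
    # as zero rot. The constants are illustrative, not calibrated.
    return age_days * (floor + (1.0 - floor) * (1.0 - coverage_fraction))
```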

asked Apr 03 '09 by ojblass



3 Answers

Very interesting train of thought!

First, what is bitrot? The Software Rot article on Wikipedia collects a few points:

  • Environment change: changes in the runtime
  • Unused code: changes in the usage patterns
  • Rarely updated code: changes through maintenance
  • Refactoring: a way to stem bitrot

By Moore's Law, delta(CPU)/delta(t) is a constant factor two every 18 to 24 months. Since the environment contains more than the CPU, I would assume that this forms only a very weak lower bound on actual change in the environment. Unit: OPS/$/s, change in Operations Per Second per dollar over time

delta(users)/delta(t) is harder to quantify, but judging by the frequency of the phrase "Age of Knowledge" in the news, I'd say that users' expectations grow exponentially too. Looking at the development of $/FLOPS, basic economics tells us that supply is growing faster than demand, giving Moore's Law as an upper bound on user change. I'll use function points ("amount of business functionality an information system provides to a user") as a measure of requirements. Unit: FP/s, change in required Function Points over time

delta(maintenance)/delta(t) depends totally on the organisation and is usually quite high immediately before a release, when quick fixes are pushed through and when integrating big changes. Changes to various measures like SLOC, Cyclomatic Complexity or implemented function points over time can be used as a stand-in here. Another possibility would be bug-churn in the ticketing system, if available. I'll stay with implemented function points over time. Unit = FP/s, change in implemented Function Points over time

delta(refactoring)/delta(t) can be measured as time spent not implementing new features. Unit = 1, time spent refactoring over time

So bitrot would be

             d(env)     d(users)     d(maint)        d(t)
bitrot(t) = -------- * ---------- * ---------- * ----------------
              d(t)        d(t)        d(t)        d(refactoring)

             d(env) * d(users) * d(maint)
          = ------------------------------
                d(t)² * d(refactoring)

with a combined unit of OPS/$/s * FP/s * FP/s = (OPS*FP²) / ($*s³).

This is of course only a very forced pseudo-mathematical notation of what the Wikipedia article already said: bitrot arises from changes in the environment, changes in the users' requirements and changes to the code, while it is mitigated by spending time on refactoring. Every organisation will have to decide for itself how to measure those changes, I only give very general bounds.
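Treating the deltas as raw changes over a common interval, the formula evaluates mechanically (all numbers below are invented):

```python
def bitrot(d_env, d_users, d_maint, d_refactoring, d_t=1.0):
    # bitrot(t) = (d(env) * d(users) * d(maint)) / (d(t)^2 * d(refactoring)),
    # with combined units (OPS * FP^2) / ($ * s^3) as derived above
    return (d_env * d_users * d_maint) / (d_t ** 2 * d_refactoring)
```

Doubling the time spent refactoring halves the score, while rot grows with every kind of environmental, user, or maintenance churn, matching the prose summary.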

answered Oct 31 '22 by David Schmitt


I disagree with Charlie: minor refactoring of source code can result in very large Hamming distances, and doesn't provide a good measure of the degree to which the code has been logically modified.

I would consider looking at the length of commit comments. For a given programmer, a relatively long commit comment usually indicates that they've made a significant change to the code.
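That heuristic is cheap to compute from history; a sketch (assumes `git` on PATH and a working tree, with the averaging split out so it can be checked without a repository):

```python
import subprocess

def mean_length(messages):
    # Average character count over a list of commit messages.
    return sum(len(m) for m in messages) / len(messages) if messages else 0.0

def messages_touching(path):
    # Full messages (%B) of commits touching `path`, NUL-separated via
    # git's %x00 placeholder so multi-line bodies survive the split.
    out = subprocess.run(["git", "log", "--format=%B%x00", "--", path],
                         capture_output=True, text=True, check=True).stdout
    return [m.strip() for m in out.split("\x00") if m.strip()]
```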

answered Oct 31 '22 by splicer


How about the simplest possible answer?

foreach (var file in sourceControl) {
  file.RotLevel = DateTime.Now - file.LastTestedOrDeployed;
}

If a file hasn't been deployed (either to production or to a test machine) for a long time, it may be out of sync with "reality". The environment may have changed, and even if the file has not been changed, it may no longer work. So that seems to me to be a simple and accurate formula. Why make it more complex than that? Involving number of changes seems to add only uncertainty. If a file has been modified recently, does that mean it has been updated to reflect a change in the environment (which makes it "less rotten"), or have new features been added (increasing the risk of errors, and so making it "more rotten")? Modifications to a file could mean anything.

The only unambiguous factor I can think of is "how long has it been since we last verified that the file worked?"

answered Oct 31 '22 by jalf