
Does bitrot have any accepted dimensions?

Every modern source control system can slice and dice the history of a program. There are many tools to statically and dynamically analyze code. What sort of mathematical formula would allow me to integrate the amount of activity in a file along with the number of deployments of that software? We are finding that even if a program completes all of its unit tests, it requires more work than we would expect at upgrade time. A measure of this type should be possible, but sitting down and thinking about even its units has me stumped.

Update: If something gets sent to a test machine I could see marking it less rotten. If something gets sent to all test boxes I could see it getting a fresh marker. If something goes to production I could give it a nod and reduce its bitrot score. If there is a lot of activity within its files and it never gets sent anywhere I would ding the crap out of it. Don't focus on the code; assume that any data I need is at hand.
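One way to turn that update into numbers (a toy sketch; the tier weights and the churn penalty are invented, and the inputs are assumed to be at hand per the question):

```python
def rot_score(days_since_prod, days_since_all_test, days_since_any_test,
              undeployed_changes, churn_weight=5.0):
    # The freshest deployment signal dominates: production resets rot
    # hardest, all test boxes less so, a single test machine least.
    freshness = min(days_since_prod * 1.0,
                    days_since_all_test * 2.0,
                    days_since_any_test * 4.0)
    # Activity that never got deployed anywhere gets dinged hard.
    return freshness + churn_weight * undeployed_changes
```

Lower is fresher: a file deployed to production yesterday with no dangling edits scores near zero, while a heavily edited file that never left the repository scores high.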

What kind of commit analysis (commit comments, as mentioned below, or time between commits) is fair data to apply?

Update: I think the dimensional analysis could probably just be based on age. Anything relative to that is a little more difficult. Old code is rotten. The average age of each line of code is still simply a measure of time. Does a larger source module rot faster than a smaller, more complex one?
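The average line age is easy to extract from version control; a sketch (assumes a git repository and `git` on PATH, with the parsing split out so it can be checked on its own):

```python
import subprocess
import time

def ages_from_blame(porcelain, now=None):
    # `git blame --line-porcelain` repeats full commit headers for every
    # source line, so each line contributes one "committer-time <epoch>"
    # entry; return seconds since each line's last commit.
    now = time.time() if now is None else now
    return [now - int(line.split()[1])
            for line in porcelain.splitlines()
            if line.startswith("committer-time ")]

def average_line_age_days(path):
    # Average age in days of each line in `path`; run inside the work tree.
    out = subprocess.run(["git", "blame", "--line-porcelain", path],
                         capture_output=True, text=True, check=True).stdout
    ages = ages_from_blame(out)
    return sum(ages) / len(ages) / 86400 if ages else 0.0
```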

Update: Code coverage is measured in lines. Code executed often must, by definition, be less rotten than code never executed. To accurately measure bitrot you would need coverage analysis to act as a damper.
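As a sketch of the damping idea (the floor constant is invented; even fully covered code still ages with its environment):

```python
def damped_rot(age_days, coverage_fraction, floor=0.1):
    # Scale an age-based rot score down by how much of the file is
    # actually executed. `floor` keeps fully covered code from reading
    # as zero rot. The constants are illustrative, not calibrated.
    return age_days * (floor + (1.0 - floor) * (1.0 - coverage_fraction))
```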

asked Apr 03 '09 by ojblass



3 Answers

Very interesting train of thought!

First, what is bitrot? The Software Rot article on Wikipedia collects a few points:

  • Environment change: changes in the runtime
  • Unused code: changes in the usage patterns
  • Rarely updated code: changes through maintenance
  • Refactoring: a way to stem bitrot

By Moore's Law, delta(CPU)/delta(t) is a constant factor two every 18 to 24 months. Since the environment contains more than the CPU, I would assume that this forms only a very weak lower bound on actual change in the environment. Unit: OPS/$/s, change in Operations Per Second per dollar over time

delta(users)/delta(t) is harder to quantify, but judging by the frequency of the phrase "Age of Knowledge" in the news, I'd say that users' expectations grow exponentially too. Looking at the development of $/FLOPS, basic economics tells us that supply is growing faster than demand, giving Moore's Law as an upper bound on user change. I'll use function points ("amount of business functionality an information system provides to a user") as a measure of requirements. Unit: FP/s, change in required Function Points over time

delta(maintenance)/delta(t) depends totally on the organisation and is usually quite high immediately before a release, when quick fixes are pushed through and when integrating big changes. Changes to various measures like SLOC, Cyclomatic Complexity or implemented function points over time can be used as a stand-in here. Another possibility would be bug-churn in the ticketing system, if available. I'll stay with implemented function points over time. Unit = FP/s, change in implemented Function Points over time

delta(refactoring)/delta(t) can be measured as time spent not implementing new features. Unit = 1, time spent refactoring over time

So bitrot would be

             d(env)     d(users)     d(maint)        d(t)
bitrot(t) = -------- * ---------- * ---------- * ----------------
              d(t)        d(t)        d(t)        d(refactoring)

             d(env) * d(users) * d(maint)
          = ------------------------------
                d(t)² * d(refactoring)

with a combined unit of OPS/$/s * FP/s * FP/s = (OPS*FP²) / ($*s³).

This is of course only a very forced pseudo-mathematical notation of what the Wikipedia article already said: bitrot arises from changes in the environment, changes in the users' requirements and changes to the code, while it is mitigated by spending time on refactoring. Every organisation will have to decide for itself how to measure those changes, I only give very general bounds.
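Treating the deltas as raw changes over a common interval, the formula evaluates mechanically (all numbers below are invented):

```python
def bitrot(d_env, d_users, d_maint, d_refactoring, d_t=1.0):
    # bitrot(t) = (d(env) * d(users) * d(maint)) / (d(t)^2 * d(refactoring)),
    # with combined units (OPS * FP^2) / ($ * s^3) as derived above
    return (d_env * d_users * d_maint) / (d_t ** 2 * d_refactoring)
```

Doubling the time spent refactoring halves the score, while rot grows with every kind of environmental, user, or maintenance churn, matching the prose summary.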

answered Oct 31 '22 by David Schmitt


I disagree with Charlie: minor refactoring of source code can result in very large Hamming distances, and doesn't provide a good measure of the degree to which the code has been logically modified.

I would consider looking at the length of commit comments. For a given programmer, a relatively long commit comment usually indicates that they've made a significant change to the code.
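That heuristic is cheap to compute from history; a sketch (assumes `git` on PATH and a working tree, with the averaging split out so it can be checked without a repository):

```python
import subprocess

def mean_length(messages):
    # Average character count over a list of commit messages.
    return sum(len(m) for m in messages) / len(messages) if messages else 0.0

def messages_touching(path):
    # Full messages (%B) of commits touching `path`, NUL-separated via
    # git's %x00 placeholder so multi-line bodies survive the split.
    out = subprocess.run(["git", "log", "--format=%B%x00", "--", path],
                         capture_output=True, text=True, check=True).stdout
    return [m.strip() for m in out.split("\x00") if m.strip()]
```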

answered Oct 31 '22 by splicer


How about the simplest possible answer?

foreach (var file in sourceControl) {
  file.RotLevel = DateTime.Now - file.LastTestedOrDeployed;
}

If a file hasn't been deployed (either to production or to a test machine) for a long time, it may be out of sync with "reality". The environment may have changed, and even if the file has not been changed, it may no longer work. So that seems to me to be a simple and accurate formula. Why make it more complex than that? Involving number of changes seems to add only uncertainty. If a file has been modified recently, does that mean it has been updated to reflect a change in the environment (which makes it "less rotten"), or have new features been added (increasing the risk of errors, and so making it "more rotten")? Modifications to a file could mean anything.

The only unambiguous factor I can think of is "how long has it been since we last verified that the file worked?"

answered Oct 31 '22 by jalf