Cleanup huge Perl Codebase

Question

I am currently working on a roughly 15-year-old web application.

It consists mainly of Perl CGI scripts with HTML::Template templates.

It has over 12 000 files and roughly 260 MB of code in total. I estimate that no more than 1500 Perl scripts are needed, and I want to get rid of all the unused code.

There are practically no tests written for the code.

My questions are:

Are you aware of any CPAN module that can help me get a list of only used and required modules?
What would be your approach if you wanted to get rid of all the extra code?

I was thinking of the following approaches:

try to override the use and require Perl builtins with ones that write the name of each loaded file to a specific location
override the import function of the warnings and/or strict modules so that it writes the file name to that location
study the Devel::Cover Perl module, take the same approach, and analyze the code during manual testing instead of automated tests
replace the perl executable with a custom one that logs the name of each file it reads (I don't know how to do that yet)
some creative use of lsof (?!?)

Ven'Tatsu · Accepted Answer

Devel::Modlist may give you what you need, but I have never used it.

The few times I have needed to do something like this I have opted for the more brute-force approach of inspecting %INC at the end of the program.

END {
    open my $log_fh, ...;
    print $log_fh "$_\n" for sort keys %INC;
}
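A slightly fuller sketch of the same idea, in case it helps: the sub name `record_inc`, the `INC_LOG` environment variable and the `/tmp/inc.log` default are all hypothetical choices for illustration, not part of any module. It appends the script name and the resolved path of every loaded file, and takes a lock because several CGI processes may write to the log at once.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: append one line per entry in %INC to $log_path.
# %INC maps the name passed to use/require (e.g. "File/Spec.pm") to the
# path the file was actually loaded from.
sub record_inc {
    my ($log_path) = @_;
    open my $log_fh, '>>', $log_path or return;   # never break the app over logging
    flock $log_fh, LOCK_EX;                       # concurrent CGI processes share the log
    print {$log_fh} "$0\t" . ($INC{$_} // $_) . "\n" for sort keys %INC;
    close $log_fh;
    return 1;
}

# Assumed log location; INC_LOG is a made-up variable name for this sketch.
END { record_inc($ENV{INC_LOG} // '/tmp/inc.log') }
```

If this lives in a module file of its own, something like `PERL5OPT=-MLogINC` (hypothetical module name) should load it without editing each script, though PERL5OPT is ignored when taint checks are on.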

Barton Chittenden · Answer

As a first approximation, I would simply run

egrep -r '\<(use|require)\>' /path/to/source/*

Then spend a couple of days cleaning up the output from that. That will give you a list of all of the modules used or required.
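Some of that cleanup can probably be scripted. Below is a rough sketch of a sub (the name `modules_in_source` is made up here) that pulls module names out of a chunk of source text. It is regex-based, so it is only an approximation: it ignores string forms such as require "foo.pl", modules loaded through string eval or computed names, and anything named via `use base` or `use parent`.

```perl
use strict;
use warnings;

# Return the distinct module names that a chunk of Perl source appears to load.
# Lowercase names (pragmas such as strict, warnings, base) and version
# numbers (use 5.010) are skipped.
sub modules_in_source {
    my ($source) = @_;
    my %seen;
    while ($source =~ /^\s*(?:use|require)\s+([A-Za-z_]\w*(?:::\w+)*)/mg) {
        my $name = $1;
        next if $name =~ /^[a-z]/;   # treat lowercase names as pragmas
        $seen{$name}++;
    }
    my @names = sort keys %seen;
    return @names;
}

my $example = <<'PERL';
use strict;
use CGI;
    require HTML::Template;
# use Not::Counted;
PERL

# Prints CGI and HTML::Template, one per line.
print "$_\n" for modules_in_source($example);
```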

You might also be able to play around with @INC to exclude certain library paths.

If you're trying to determine execution path, you might be able to run the code through the debugger with 'trace' (i.e. 't' in the debugger) turned on, then redirect the output to a text file for further analysis. I know that this is difficult when running CGI...

Richard Huxton · Answer

Assuming the relevant timestamps are turned on, you could check access times on the various script files - that should rule out any top-level script files that aren't being used.
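As a sketch of how that check might look, the sub below (hypothetical name `stale_files`) walks a directory with File::Find and returns the files whose access time is older than a given number of days. It assumes the filesystem is not mounted with noatime; with relatime, access times are updated only occasionally, so a generous threshold is safer.

```perl
use strict;
use warnings;
use File::Find;

# List plain files under $root whose last access time is more than
# $days days in the past.
sub stale_files {
    my ($root, $days) = @_;
    my $cutoff = time - $days * 24 * 60 * 60;
    my @stale;
    find(
        {
            no_chdir => 1,                      # $_ holds the full path
            wanted   => sub {
                return unless -f $_;
                my $atime = (stat _)[8];        # reuse the stat done by -f
                push @stale, $_ if defined $atime && $atime < $cutoff;
            },
        },
        $root,
    );
    @stale = sort @stale;
    return @stale;
}
```

Anything that reads the whole tree, such as a recursive grep or a backup run, can refresh the access times, so the listing is best taken before running such tools.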

Might be worth adding some instrumentation to CGI.pm to log the current script-name ($0) to see what's happening.
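A minimal sketch of what such instrumentation could look like; both sub names and the log format are assumptions made for this example. The first sub appends a timestamp and $0 to a log file, and the second tallies how often each script shows up, so that scripts which never appear become candidates for removal.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: append "timestamp<TAB>script" for the running program.
sub log_script_name {
    my ($log_path) = @_;
    open my $fh, '>>', $log_path or return;    # logging must never break a request
    flock $fh, LOCK_EX;                        # several CGI processes may write at once
    print {$fh} join("\t", scalar(localtime()), $0), "\n";
    close $fh;
    return 1;
}

# Hypothetical helper: count log lines per script name.
sub script_hit_counts {
    my ($log_path) = @_;
    my %count;
    open my $fh, '<', $log_path or return \%count;
    while (my $line = <$fh>) {
        chomp $line;
        my (undef, $script) = split /\t/, $line, 2;
        $count{$script}++ if defined $script;
    }
    close $fh;
    return \%count;
}
```

A single call to log_script_name with a log path of your choosing, placed somewhere every request passes through, would be enough to start collecting data.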

Tags:

web-applications

cgi

code-cleanup

perl

Tudor Constantin
