
Cleanup huge Perl Codebase

I am currently working on a roughly 15-year-old web application.

It consists mainly of CGI Perl scripts with HTML::Template templates.

It has over 12,000 files and roughly 260 MB of code in total. I estimate that no more than 1,500 Perl scripts are needed, and I want to get rid of all the unused code.

There are practically no tests written for the code.

My questions are:

  • Are you aware of any CPAN module that can help me get a list of only used and required modules?
  • What would your approach be if you wanted to get rid of all the extra code?

I was thinking of the following approaches:

  • try to override the use and require Perl builtins with ones that write the loaded file name to a specific location
  • override the import function of the warnings and/or strict modules and write the file name to that location
  • study the Devel::Cover Perl module, take the same approach, and analyze the code during manual testing instead of automated tests
  • replace the perl executable with a custom one that logs the name of each file it reads (I don't know how to do that yet)
  • some creative use of lsof (?!?)
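
A lighter variant of the first idea, as a minimal sketch rather than a tested solution: instead of overriding the builtins, put a code reference at the front of @INC. Perl calls it for every use or require that searches @INC, and returning an empty list lets the normal lookup continue. The REQUIRE_LOG environment variable and the log format are assumptions made for illustration. The hook only sees files requested after it is installed, and files required by an absolute path or a path starting with ./ bypass @INC entirely.

```perl
use strict;
use warnings;

our @requested;    # file names that passed through the hook, in request order

BEGIN {
    # Perl calls a code reference in @INC as $hook->($hook, $file) for each
    # use/require of a file that is not already recorded in %INC.
    unshift @INC, sub {
        my (undef, $file) = @_;
        push @requested, $file;
        return;    # empty list: not handled here, Perl moves on to the next @INC entry
    };
}

END {
    # REQUIRE_LOG is a hypothetical environment variable naming the log file.
    if (defined $ENV{REQUIRE_LOG} && open(my $log_fh, '>>', $ENV{REQUIRE_LOG})) {
        print {$log_fh} "$0\t$_\n" for @requested;
        close $log_fh;
    }
}
```

To catch as much as possible, this would have to run before the script's own use statements, for example from a small module loaded with -M on the command line.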
Tudor Constantin asked May 25 '12 13:05


3 Answers

Devel::Modlist may give you what you need, but I have never used it.

The few times I have needed to do something like this, I have opted for the more brute-force approach of inspecting %INC at the end of the program.

END {
    open my $log_fh, ...;
    print $log_fh "$_\n" for sort keys %INC;
}
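
A fuller version of the same idea, as a sketch only: the log path /tmp/loaded_modules.log, the LOADED_MODULES_LOG override, and the tab-separated format are all assumptions, not part of the snippet above. It appends rather than overwrites so that many CGI requests accumulate in one file, and it records $0 so each loaded file can be traced back to the script that pulled it in.

```perl
use strict;
use warnings;

# One tab-separated line per loaded file: script, %INC key, resolved path.
sub loaded_file_lines {
    return map {
        my $path = $INC{$_};
        join("\t", $0, $_, defined $path ? $path : '(unresolved)')
    } sort keys %INC;
}

END {
    # Hypothetical log location; LOADED_MODULES_LOG is an assumed override.
    my $log_path = $ENV{LOADED_MODULES_LOG} || '/tmp/loaded_modules.log';
    if (open(my $log_fh, '>>', $log_path)) {
        print {$log_fh} "$_\n" for loaded_file_lines();
        close $log_fh;
    }
    else {
        warn "Cannot append to $log_path: $!\n";
    }
}
```

Running sort -u over the second column of the accumulated log would then give the set of files that were actually loaded during the observation period.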
Ven'Tatsu answered Oct 18 '22 09:10


As a first approximation, I would simply run

egrep -r '\<(use|require)\>' /path/to/source/*

Then spend a couple of days cleaning up the output from that. That will give you a list of all of the modules used or required.
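
Part of that cleanup could be scripted. The following is a rough sketch, not a parser: it pulls bareword module names out of use and require statements with a regular expression and counts how many files mention each one. The file extensions (.pl, .pm, .cgi) are an assumption about the codebase. It will miss require 'file.pl', string evals, and parents named only as arguments to use base or use parent, and it can pick up matches inside POD or heredocs.

```perl
use strict;
use warnings;
use File::Find;

# Extract module names from static "use Foo::Bar" / "require Foo::Bar"
# statements in a chunk of Perl source. Pragmas such as strict are included.
sub modules_in_source {
    my ($source) = @_;
    my %seen;
    while ($source =~ /^\s*(?:use|require)\s+([A-Za-z_]\w*(?:::\w+)*)/mg) {
        $seen{$1} = 1;
    }
    my @names = sort keys %seen;
    return @names;
}

# Each command-line argument is treated as a source tree to scan.
my %count;
for my $root (@ARGV) {
    find({
        no_chdir => 1,    # keep $_ as the full path
        wanted   => sub {
            return unless -f $_ && /\.(?:pl|pm|cgi)\z/;
            my $file = $_;
            open(my $fh, '<', $file) or do { warn "skip $file: $!\n"; return };
            my $source = do { local $/; <$fh> };
            $source = '' unless defined $source;    # empty file
            $count{$_}++ for modules_in_source($source);
        },
    }, $root);
}
printf "%6d  %s\n", $count{$_}, $_ for sort keys %count;
```

Modules that no file mentions are candidates for removal, but only candidates, since anything loaded dynamically will not show up in a static scan.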

You might also be able to play around with @INC to exclude certain library paths.

If you're trying to determine execution path, you might be able to run the code through the debugger with 'trace' (i.e. 't' in the debugger) turned on, then redirect the output to a text file for further analysis. I know that this is difficult when running CGI...

Barton Chittenden answered Oct 18 '22 09:10


Assuming the relevant timestamps are turned on, you could check access times on the various script files - that should rule out any top-level script files that aren't being used.
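
A minimal sketch of that check, assuming the filesystem really does update access times (mounts using noatime will not, and relatime updates them only coarsely). The 90-day default is an arbitrary illustration. Backups, virus scanners, and recursive greps also touch access times, so a recent atime does not prove a script is in use; an old one is the more useful signal.

```perl
use strict;
use warnings;
use File::Find;

# Return the plain files under $root whose last access is at least $min_days old.
sub stale_files {
    my ($root, $min_days) = @_;
    my $cutoff = time - $min_days * 24 * 60 * 60;
    my @stale;
    find({
        no_chdir => 1,                 # keep $_ as the full path
        wanted   => sub {
            return unless -f $_;
            my $atime = (stat _)[8];   # field 8 of stat is the access time
            push @stale, $_ if $atime <= $cutoff;
        },
    }, $root);
    my @sorted = sort @stale;
    return @sorted;
}

# Usage: perl stale_files.pl /path/to/source [min_days]
if (@ARGV) {
    my ($root, $min_days) = @ARGV;
    print "$_\n" for stale_files($root, defined $min_days ? $min_days : 90);
}
```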

Might be worth adding some instrumentation to CGI.pm to log the current script-name ($0) to see what's happening.

Richard Huxton answered Oct 18 '22 09:10