I have to work with a project that has tons of commented-out code everywhere. Before I introduce any changes, I would like to do a basic clean-up and remove old, unused code.
I could just use the solution from this accepted answer to remove all comments, but...
There are also legitimate comments (not commented-out code) that explain things, and I don't want to remove those. For example:
// Those parameters control foo and bar... <- valid comment
int t = 5;
// int t = 10; <- commented code
int k = 2*t;
Only line 3 should be removed.
What are the possible ways of analyzing the code and distinguishing between comments in natural language and commented-out lines of code?
This is a basic approach, but it serves as a proof of concept of what can be done. It uses Bash together with GCC's -fsyntax-only option.
Here is the Bash script:
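#!/bin/bash
# Usage: ./script.sh file.cpp
while IFS='' read -r line || [[ -n "$line" ]]; do
    # Text after the first "//" on the line (empty if there is none)
    LINE=$(printf '%s\n' "$line" | grep -oP '(?<=//).*')
    if [[ -n "$LINE" ]]; then
        # Exit status 0 means GCC parsed the text as valid C
        if printf '%s\n' "$LINE" | gcc -fsyntax-only -xc -; then
            # Escape regex metacharacters so sed matches the text literally
            pattern=$(printf '%s\n' "$LINE" | sed 's|[][\\/.^$*]|\\&|g')
            # Delete only lines that end with "//" followed by that text
            sed -i "/\/\/$pattern\$/d" "$1"
        fi
    fi
done < "$1"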
The script reads the source file line by line. For each line, it extracts the text after the // delimiter (if there is one) with the regex (?<=//).* and pipes that text to gcc -fsyntax-only, which checks the code for syntax errors and does nothing beyond that. The arguments -xc - make GCC read its input from stdin (see my answer here to understand more). Note that the c in -xc specifies the language, which is C in this case; for C++, change it to -xc++. Because GCC treats the text as a complete source file, declarations such as int t = 10; pass the check, while fragments that are only valid inside a function body (for example, a lone return t;) are rejected and therefore kept.
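To make the extraction step concrete, here is a minimal sketch that pulls out the text after the first // with shell parameter expansion instead of grep -oP. This is a swapped-in alternative, not what the script above does, and may be handy where grep was built without PCRE support. The function name comment_text is hypothetical.

```shell
#!/bin/bash
# comment_text LINE
# Prints whatever follows the first "//" in LINE and returns 0.
# Prints nothing and returns 1 when LINE contains no "//".
# (Hypothetical helper, named here only for illustration.)
comment_text() {
    case $1 in
        *//*) printf '%s\n' "${1#*//}" ;;   # strip the shortest prefix ending in //
        *)    return 1 ;;                    # no // comment on this line
    esac
}

comment_text '// int t = 10;'                     # prints " int t = 10;"
comment_text 'int k = 2*t;' || echo 'no comment'  # prints "no comment"
```

Its output could then be piped to the syntax check in the same way, e.g. comment_text "$line" | gcc -fsyntax-only -xc -.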
If GCC parses the text successfully (i.e., it is valid C/C++), the script deletes the matching line from the file passed as the first argument, using sed -i.
Running it on your example (after removing <- commented code from the third line so that it is a legitimate statement):
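One thing to watch when handing comment text to sed as an address like /text/d: sed treats the text as a regular expression and / as the delimiter, so a comment such as // x = a / 2; or // p[0] = 1; can break the command or match the wrong lines. The sketch below escapes the text first. The names sed_escape and drop_commented_line are hypothetical, and the filter reads stdin and writes stdout instead of editing in place, purely to keep the example self-contained.

```shell
#!/bin/bash
# sed_escape TEXT
# Backslash-escapes the characters that are special in a basic regular
# expression, plus the "/" delimiter, so that sed matches TEXT literally.
sed_escape() {
    printf '%s\n' "$1" | sed 's|[][\\/.^$*]|\\&|g'
}

# drop_commented_line TEXT
# Copies stdin to stdout, omitting lines that end with "//" followed by TEXT.
drop_commented_line() {
    sed "/\/\/$(sed_escape "$1")\$/d"
}

printf '%s\n' 'int a = 4;' '// x = a / 2;' 'int k = 2*a;' |
    drop_commented_line ' x = a / 2;'
# prints the first and third lines only
```

Anchoring the pattern on the leading // and the end of the line also means an identical statement that is still live code elsewhere in the file is left alone.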
// Those parameters control foo and bar... <- valid comment
int t = 5;
// int t = 10;
int k = 2*t;
Output (in the same file):
// Those parameters control foo and bar... <- valid comment
int t = 5;
int k = 2*t;
(sed -i edits the file in place; if you want to keep the original untouched, run the script on a copy of the file.)
Call the script as ./script.sh file.cpp. GCC will print errors for every comment it cannot parse; these are expected, since they come from the legitimate natural-language comments.
A simpler version of the same logic is:
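#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
    LINE=${line#*//}    # text after the first "//"
    # Skip lines without a // comment or with an empty one
    if [[ "$line" == *//* && -n "$LINE" ]]; then
        # Escape regex metacharacters so sed matches the text literally
        pattern=$(printf '%s\n' "$LINE" | sed 's|[][\\/.^$*]|\\&|g')
        printf '%s\n' "$LINE" | gcc -fsyntax-only -xc - && sed -i "/\/\/$pattern\$/d" "$1"
    fi
done < "$1"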