Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Automatically detect identical consecutive std::string::find() calls

During a code review, i found source code like this:

void f_odd(std::string &className, std::string &testName)
{
   if (className.find("::") != std::string::npos)
   {
     testName = className.substr(className.find("::") + 2);
     (void)className.erase(className.find("::"), std::string::npos);
   }
}

Within this function, std::string::find() is called three times with the same pattern( here "::").

This code can of course be refactored to

void f(std::string &className, std::string &testName)
{
   const size_t endOfClassNamePos = className.find("::");
   if (endOfClassNamePos != std::string::npos)
   {
     testName = className.substr(endOfClassNamePos + 2);
     (void)className.erase(endOfClassNamePos, std::string::npos);
   }
}

where find is called only once.

Question

Does anybody know a strategy to detecting such a pattern like this? I am having a huge code base, where i intend to spot this pattern. I plan to use a Windows or a Linux environment.

Potential Strategies

  1. Use/adapt a static code analysis tool, like cppcheck to detect these kind of oddities.
  2. Search within the code base with regular expression.
  3. Use/adapt clang-tidy for detection of this pattern.
  4. Write a custom checker in some language (e.g. Python) that detects these issues. In this case, the checking should be performed on pre-processed code.

No Go's

  • Manual review

Update 1

I decided to start with potential strategy 1). I plan to adapt cppcheck to catch this issue.

Cppcheck offers a possibility to write customized rules, based on PCRE regular expressions. For this, cppcheck has to be compiled with enabled PCRE support. Since the current test environment is Linux-based, the following commands can be used to download the latest version of cppcheck:

git clone https://github.com/danmar/cppcheck.git && cd cppcheck

After that, compile and install the tool as follows:

sudo make install HAVE_RULES=yes

Now the basic tool setup is done. In order to develop a cppcheck-rule, i prepared a simple test case (file: test.cpp), similar to the sample code in the first section of this article. This file contains three functions and the cppcheck-rule shall emit a warning on f_odd and f_odd1 about consecutive identical std::string::find calls.

test.cpp:

#include <string>
void f(std::string &className, std::string &testName)
{
   const size_t endOfClassNamePos = className.find("::");
   if (endOfClassNamePos != std::string::npos)
   {
     testName = className.substr(endOfClassNamePos + 2);
     (void)className.erase(endOfClassNamePos, std::string::npos);
   }
 }

 void f_odd(std::string &className, std::string &testName)
 {
    if (className.find("::") != std::string::npos)
    {
       testName = className.substr(className.find("::") + 2);
       (void)className.erase(className.find("::"), std::string::npos);
    }
 }

 #define A "::"
 #define B "::"
 #define C "::"
 void f_odd1(std::string &className, std::string &testName)
 {
   if (className.find(A) != std::string::npos)
   {
      testName = className.substr(className.find(B) + 2);
      (void)className.erase(className.find(C), std::string::npos);
   }
 }

So far so good. Now cppcheck has to be tweaked to catch consecutive identical std::string::find calls. For this i have created a cppcheck_rule-file that contains a regular expression that matches consecutive identical std::string::find calls:

<?xml version="1.0"?>
<rule>
<tokenlist>normal</tokenlist>
<pattern><![CDATA[([a-zA-Z][a-zA-Z0-9]*)(\s*\.\s*find)(\s*\(\s*\"[ -~]*\"\s*\))[ -\{\n]*(\1\2\3)+[ -z\n]]]></pattern>
<message>
    <severity>style</severity>
    <summary>Found identical consecutive std::string::find calls.</summary>
</message>

This file can be used to extend cppcheck about a new check. Lets try:

cppcheck --rule-file=rules/rule.xml test/test.cpp

and the output is

Checking test/test.cpp...
[test/test.cpp:14]: (style) Found identical consecutive std::string::find calls.
[test/test.cpp:26]: (style) Found identical consecutive std::string::find calls.

Now, identical consecutive std::string::find calls can be detected in C/C++ codes. Does anybody know a better/more efficient or more clever solution?

References:

  • Checkout the cppcheck_rule file on github
  • How to write custom rules for cppcheck

like image 644
orbitcowboy Avatar asked Apr 27 '16 12:04

orbitcowboy


Video Answer


1 Answers

The main problem with such a tool is that a lexical analysis can only check if there is textual repetition. Taking your example, calling className.find("::") twice is a potential issue if the variable refers to the same string twice. But let me add one small change to your code: className = className.substr(className.find("::") + 2);. Suddenly the meaning of the next className.find has changed dramatically.

Can you find such changes? You need a full-blown compiler for that, and even then you have to be pessimistic. Sticking to your example, could className be changed via an iterator? It's not just direct manipulation you need to be aware of.

Is there no positive news? Well: existing compilers do have a similar mechanism. It's called Common Subexpression Elimination, and it works conceptually as you would want it to work in the example above. But that is also bad news in one way: If the situation is detectable, it's unimportant because it's already optimized out by the compiler!

like image 110
MSalters Avatar answered Sep 19 '22 20:09

MSalters