
Open-source C++ scanning library

Rationale: In my day-to-day C++ development, I frequently need to answer basic questions such as "who calls what?" in a very large C++ code base that changes frequently. I also need an automated way to identify exactly what the code is doing around a particular location. "grep"-style tools such as Cscope are useful (and I already use them heavily), but they are not C++-language-aware: they give no way to identify the types and the lexical environment of a given use of a type or function in a way that is conducive to automation (even if that automation is limited to "read-only" operations such as code browsing and navigation, though I'm asking for much more than that below).

Question: Is there already an open-source C/C++-based library (native, not managed, not Microsoft- or Linux-specific) that can statically scan or analyze a large tree of C++ code and produce result sets that answer detailed questions such as:

  • What functions are called by some supplied function?
  • What functions make use of this supplied type?
  • Ditto the above questions if C++ classes or class templates are involved.

The result set should provide some sort of "handle". I should be able to feed that handle back to the library to perform the following types of introspection:

  • What is the byte offset into the file where the reference was made?
  • What is the reference into the abstract syntax tree (AST) of that reference, so that I can inspect surrounding code constructs? And each AST entity would also have file path, byte-offset, and type-info data associated with it, so that I could recursively walk up the graph of callers or referrers to do useful operations.
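As an illustration of the "handle" idea above, a minimal C++ sketch of a reference record carrying a file path, a byte offset, and a link to its referrer might look like the following. The names `Reference`, `RefHandle`, and `referrer_chain` are hypothetical and do not belong to any existing library; a real scanner would attach AST and type information as well.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one reference found by a scanner.
struct Reference {
    std::string file_path;           // path relative to the source-tree root
    std::size_t byte_offset;         // offset of the reference within the file
    std::string enclosing_function;  // function that contains the reference
    int referrer;                    // handle of a reference to the enclosing
                                     // function, or -1 if none is recorded
};

// Hypothetical handle: an index into a table of Reference records.
using RefHandle = int;

// Walks from one reference up through its chain of referrers and returns
// one "function @ file:offset" line per step, innermost first.
std::vector<std::string> referrer_chain(const std::vector<Reference>& table,
                                        RefHandle h) {
    std::vector<std::string> chain;
    while (h >= 0) {
        assert(static_cast<std::size_t>(h) < table.size());
        const Reference& r = table[static_cast<std::size_t>(h)];
        chain.push_back(r.enclosing_function + " @ " + r.file_path + ":" +
                        std::to_string(r.byte_offset));
        h = r.referrer;
    }
    return chain;
}
```

Because the handle is only an index, it can be fed back to the same table repeatedly, which is the property the introspection questions above rely on.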

The answer should meet the following requirements:

  • API: The API exposed must be one of the following:
    • C or C++, probably "C handle" or C++-class-instance-based (and if so, it must be generic C or C++ code, with no Microsoft- or Linux-specific constructs except where needed to meet the specifics of a given platform), or
    • Command-line standard input and standard output based.
  • C++ aware: Is not limited to C code, but understands C++ language constructs in minute detail including awareness of inter-class inheritance relationships and C++ templates.
  • Fast: Should scan large code bases significantly faster than compiling the entire code base from scratch. This probably needs to be relaxed, but only if the "Incremental result retrieval" and "Resilient to small code changes" requirements below are fully met.
  • Provide result counts: I should be able to ask "How many results would you provide for this request?" (without being sent all of the results) and get a response in under 3 seconds, rather than having to retrieve all results for any given question. If that answer takes too long, it wastes development time. This is coupled with the next requirement.
  • Incremental result retrieval: I should be able to ask "Give me just the next N results of this request" and get back a handle to the result set, so that I can ask the question repeatedly and pull out the results in stages. This means I should not have to wait for the entire result set before seeing some subset of the results, and I should be able to cancel the operation safely once I have seen enough. Reason: I need to answer the question "What is the build or development impact of changing some particular function signature?"
  • Resilient to small code changes: If I change a header or source file, I should not have to wait for the entire code base to be rescanned, but only that header or source file rescanned. Rescanning should be quick. E.g., don't do what cscope requires you to do, which is to rescan the entire code base for small changes. It is understood that if you change a header, then scanning can take longer since other files that include that header would have to be rescanned.
  • IDE Agnostic: Is text editor agnostic (don't make me use a specific text editor; I've made my choice already, thank you!)
  • Platform Agnostic: Is platform-agnostic (don't make me use it only on Linux or only on Windows; I have to use both platforms in my daily grind, I have code sandboxes on both, and I need the tool to be useful on both).
  • Non-binary: Should not cost me anything other than time to download and compile the library and all of its dependencies.
  • Not trial-ware.
  • Actively Supported: Sending help requests to mailing lists or associated forums is likely to get a response in less than 2 days.
  • Network agnostic: Databases the library builds should be able to be used directly on a network from 32-bit and 64-bit systems, both Linux and Windows interchangeably, at the same time, and do not embed hardcoded paths to filesystems that would otherwise "root" the database to a particular network.
  • Build environment agnostic: Does not require intimate knowledge of my build environment, with the notable exception of possibly requiring knowledge of compiler supplied CPP macro definitions (e.g. -Dmacro=value).
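As an illustration of the "Provide result counts" and "Incremental result retrieval" requirements above, a minimal C++ sketch of such a result-set handle might look like the following. The class `ResultSet` and its methods are hypothetical and do not belong to any existing library, and the in-memory vector is only a stand-in for whatever index a real scanner would query lazily.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result-set handle supporting a cheap count, staged
// retrieval of the next N results, and safe cancellation.
class ResultSet {
public:
    explicit ResultSet(std::vector<std::string> matches)
        : matches_(std::move(matches)) {}

    // Total number of results, answered without handing any of them out.
    std::size_t count() const { return matches_.size(); }

    // Returns up to n further results and advances the internal cursor.
    // Returns an empty batch once the set is exhausted or cancelled.
    std::vector<std::string> next(std::size_t n) {
        if (cancelled_) return {};
        assert(cursor_ <= matches_.size());
        const std::size_t take = std::min(n, matches_.size() - cursor_);
        const auto first =
            matches_.begin() + static_cast<std::ptrdiff_t>(cursor_);
        std::vector<std::string> batch(
            first, first + static_cast<std::ptrdiff_t>(take));
        cursor_ += take;
        return batch;
    }

    // Stops further retrieval; later calls to next() return nothing.
    void cancel() { cancelled_ = true; }

    // True once every result has been handed out or the set was cancelled.
    bool done() const { return cancelled_ || cursor_ == matches_.size(); }

private:
    std::vector<std::string> matches_;
    std::size_t cursor_ = 0;
    bool cancelled_ = false;
};
```

A caller would check `count()` first to judge the impact of a change, then call `next(N)` in a loop and `cancel()` as soon as enough results have been seen.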
bgoodr asked Feb 03 '11 17:02



2 Answers

I would say that Clang Index is a close fit. However, I don't think it stores its data in a database.

Anyway, the Clang framework offers what you actually need to build a tool tailored to your needs, if only because of its C, C++ and Objective-C parsing/indexing capabilities. And since it's provided as a set of reusable libraries, it was designed to be built upon.

Matthieu M. answered Sep 22 '22 16:09



I have to admit that I haven't used either, because I work with a lot of Microsoft-specific code that relies on Microsoft compiler extensions I don't expect them to understand, but the two open-source analyzers I'm aware of are Mozilla Pork and the Clang Analyzer.

Max Lybbert answered Sep 21 '22 16:09
