MetaCG – Annotated Whole-Program Call-Graphs

MetaCG: A tool suite for whole-program inspection and automatic performance instrumentation generation. It provides a call-graph extractor based on Clang Tooling and a whole-program call-graph library that can be serialized to a json-based format and annotated with user-defined metadata, hence the name MetaCG. Finally, it includes a call-graph validation tool that uses a Score-P profile to determine which edges are missing.

Repository: https://github.com/tudasc/metacg

In our work on PIRA, we realized that we needed a whole-program call-graph representation that we could analyze and annotate with user-defined information. There are obviously multiple ways to do that, and we decided (more or less well-informed) to implement it as a library together with a toolchain that extracts the call graph from C/C++ code using Clang Tooling. To evaluate the call graph's completeness, we figured that the easiest approach was to use instrumentation-based profiling data collected with Score-P. Score-P is a dependency of PIRA anyway, and at that time the call-graph library was only used within PIRA. Later on, we realized that we wanted to use whole-program reachability information in other tools as well, and we think that PIRA's call-graph library is a reasonable abstraction for that.

So we started MetaCG as a more general software package.

The software package is written in C++, uses the CMake build system and is licensed under a BSD 3-clause license. It comes with five software components.

MetaCG Library: The fundamental call-graph representation. It is a lightweight, bidirectional graph whose function nodes can hold user-defined information in MetaContainers. The graph can be serialized to json; in that case, the MetaContainers of every function node are written under a specified key, so that they can be identified in the json file later on, e.g., by a subsequent analysis tool. Currently, the graph does not model edges explicitly, which limits its expressiveness to some extent; for example, no user-defined information can be attached to an individual call edge. Explicit edges are planned and will be added when time permits.
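To make the idea of a bidirectional graph with keyed annotations more concrete, here is a minimal sketch in C++. It is not MetaCG's actual API: `CallGraph`, `Node`, `MetaData`, `NumStatements` and the `numStatements` key are hypothetical names, and the hand-written serializer assumes that function names never need json escaping.

```cpp
#include <map>
#include <memory>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical annotation interface (not MetaCG's real API): each annotation
// reports the key it is stored under and its value as a json fragment.
struct MetaData {
  virtual ~MetaData() = default;
  virtual std::string key() const = 0;
  virtual std::string toJson() const = 0;
};

// Hypothetical example annotation: the statement count of a function.
struct NumStatements : MetaData {
  explicit NumStatements(int n) : count(n) {}
  std::string key() const override { return "numStatements"; }
  std::string toJson() const override { return std::to_string(count); }
  int count;
};

struct Node {
  std::set<std::string> callees;  // functions this node calls
  std::set<std::string> callers;  // functions that call this node
  std::map<std::string, std::unique_ptr<MetaData>> meta;  // key -> annotation
};

class CallGraph {
 public:
  // Stores the call in both directions, so callers and callees are both cheap to query.
  void addCall(const std::string& caller, const std::string& callee) {
    nodes_[caller].callees.insert(callee);
    nodes_[callee].callers.insert(caller);
  }

  // Attaches an annotation under its own key, replacing an older one with the same key.
  void attach(const std::string& fn, std::unique_ptr<MetaData> md) {
    const std::string k = md->key();
    nodes_[fn].meta[k] = std::move(md);
  }

  const Node* find(const std::string& fn) const {
    auto it = nodes_.find(fn);
    return it == nodes_.end() ? nullptr : &it->second;
  }

  // Emits {"fn":{"callees":[...],"callers":[...],"meta":{"key":value,...}},...}.
  std::string toJson() const {
    std::ostringstream out;
    out << '{';
    const char* nodeSep = "";
    for (const auto& [name, node] : nodes_) {
      out << nodeSep << '"' << name << "\":{\"callees\":" << list(node.callees)
          << ",\"callers\":" << list(node.callers) << ",\"meta\":{";
      const char* metaSep = "";
      for (const auto& [k, md] : node.meta) {
        out << metaSep << '"' << k << "\":" << md->toJson();
        metaSep = ",";
      }
      out << "}}";
      nodeSep = ",";
    }
    out << '}';
    return out.str();
  }

 private:
  // Renders a set of names as a json array of strings.
  static std::string list(const std::set<std::string>& names) {
    std::string s = "[";
    const char* sep = "";
    for (const auto& n : names) {
      s += sep;
      s += '"' + n + '"';
      sep = ",";
    }
    return s + "]";
  }

  std::map<std::string, Node> nodes_;
};
```

Storing each annotation under its own key is what lets a later tool pick out only the entries it understands and ignore the rest.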

CGCollector: The Clang-based call-graph extractor. It processes the abstract syntax tree and obtains information about the class hierarchy, call relations, and other source-level information that a user needs. The latter is done through the MetaCollector extension point: for every piece of source information that should be annotated, a new MetaCollector is derived, which obtains the desired information and attaches it to the MetaCG for a specific tool.
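The MetaCollector idea is essentially a plugin interface. The sketch below illustrates that pattern without depending on Clang: `FunctionInfo` is a hypothetical stand-in for what the real tool would read from the AST, and `MetaCollector`, `StatementCountCollector`, `VirtualityCollector` and `runCollectors` are illustrative names, not CGCollector's actual interface.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the facts the real tool would read from a
// function's declaration in the Clang AST.
struct FunctionInfo {
  std::string name;
  int numStatements;
  bool isVirtual;
};

// Hypothetical extension point: one subclass per piece of source information.
class MetaCollector {
 public:
  virtual ~MetaCollector() = default;
  virtual std::string key() const = 0;  // key the collected value is stored under
  virtual std::string collect(const FunctionInfo& fn) const = 0;
};

class StatementCountCollector : public MetaCollector {
 public:
  std::string key() const override { return "numStatements"; }
  std::string collect(const FunctionInfo& fn) const override {
    return std::to_string(fn.numStatements);
  }
};

class VirtualityCollector : public MetaCollector {
 public:
  std::string key() const override { return "isVirtual"; }
  std::string collect(const FunctionInfo& fn) const override {
    return fn.isVirtual ? "true" : "false";
  }
};

// function name -> (collector key -> collected value)
using Annotations = std::map<std::string, std::map<std::string, std::string>>;

// Runs every registered collector on every function of one translation unit.
Annotations runCollectors(const std::vector<FunctionInfo>& functions,
                          const std::vector<std::unique_ptr<MetaCollector>>& collectors) {
  Annotations result;
  for (const FunctionInfo& fn : functions) {
    for (const auto& collector : collectors) {
      result[fn.name][collector->key()] = collector->collect(fn);
    }
  }
  return result;
}
```

With this shape, adding a new kind of annotation means adding one subclass; the traversal itself does not change.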

CGMerge: CGCollector works on one translation unit at a time, hence the partial call graphs need to be merged. Similar to linking a binary, this is done with CGMerge, which takes all translation-unit-local files and merges them. Because the same function can carry metadata fields (i.e., data generated by a MetaCollector) in several of these files, CGMerge needs a strategy to resolve such multiple entries, and the user is required to provide it.
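A minimal sketch of such a merge, assuming a simplified graph in which every node stores its callees, whether its body was seen, and numeric metadata fields. `NodeData`, `merge`, `MergeStrategy` and the field names are hypothetical. Edges are simply unified, and a metadata field that occurs more than once is resolved by the strategy the user registered for its key; if none was provided, the sketch reports an error.

```cpp
#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical simplified node as produced for one translation unit.
struct NodeData {
  std::set<std::string> callees;
  bool hasBody = false;              // true only in the TU that defines the function
  std::map<std::string, long> meta;  // numeric metadata fields, keyed by collector
};

using PartialGraph = std::map<std::string, NodeData>;

// User-provided rule for combining two values of the same metadata field.
using MergeStrategy = std::function<long(long, long)>;

// Merges one translation unit's graph into the whole-program graph.
void merge(PartialGraph& whole, const PartialGraph& part,
           const std::map<std::string, MergeStrategy>& strategies) {
  for (const auto& [name, node] : part) {
    NodeData& target = whole[name];
    // Edges: the union of the calls seen in all translation units.
    target.callees.insert(node.callees.begin(), node.callees.end());
    target.hasBody = target.hasBody || node.hasBody;
    for (const auto& [key, value] : node.meta) {
      auto existing = target.meta.find(key);
      if (existing == target.meta.end()) {
        target.meta.emplace(key, value);  // first occurrence: nothing to resolve
        continue;
      }
      auto strategy = strategies.find(key);
      if (strategy == strategies.end()) {
        throw std::runtime_error("no merge strategy for metadata field '" + key + "'");
      }
      existing->second = strategy->second(existing->second, value);
    }
  }
}
```

Taking the maximum suits a field like a statement count, where a declaration-only entry reports less than the defining translation unit; other fields may call for different rules.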

CGValidate: The tool takes a MetaCG and a Score-P profile in Cube format (please note that it needs to be a full profile, i.e., one in which all functions, including those marked inline, are instrumented) and checks which of the edges observed in the profile are not present in the MetaCG. This allows a user to validate that all potential function calls are contained, and, if they are not, CGValidate can patch the missing edges into the MetaCG.
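At its core, this check is a set difference over caller/callee pairs. The sketch below assumes the profile has already been reduced to such pairs (in a Cube profile, a pair would come from a call-path node and its parent); reading Cube files is not shown, and `Edge`, `findMissingEdges` and `patchEdges` are hypothetical names rather than CGValidate's implementation.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// A call edge as (caller, callee).
using Edge = std::pair<std::string, std::string>;

// Returns each profiled edge that the static call graph lacks, once, in the
// order it was first observed. Extracting edges from the profile is assumed
// to have happened already.
std::vector<Edge> findMissingEdges(const std::set<Edge>& staticEdges,
                                   const std::vector<Edge>& profiledEdges) {
  std::vector<Edge> missing;
  std::set<Edge> reported;
  for (const Edge& e : profiledEdges) {
    if (staticEdges.count(e) == 0 && reported.insert(e).second) {
      missing.push_back(e);
    }
  }
  return missing;
}

// Patches the missing edges into the static call graph.
void patchEdges(std::set<Edge>& staticEdges, const std::vector<Edge>& missing) {
  staticEdges.insert(missing.begin(), missing.end());
}
```

After patching, running the comparison again against the same profile reports no missing edges.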

PGIS: The PIRA analyzer that performs call-graph analysis to generate low-overhead performance instrumentation for subsequent measurement with Score-P.
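PGIS's actual selection heuristics are not covered here, so the following is only an illustrative sketch of the kind of call-graph analysis involved. It collects the functions reachable from an entry point and keeps those whose statement count meets a threshold, on the assumption that probes in very small functions cost the most relative to the work they measure. `reachableFrom`, `selectForInstrumentation` and the threshold are hypothetical and not PGIS's algorithm.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Caller -> callees.
using Graph = std::map<std::string, std::vector<std::string>>;

// Breadth-first search: every function reachable from `entry`, including itself.
std::set<std::string> reachableFrom(const Graph& g, const std::string& entry) {
  std::set<std::string> visited{entry};
  std::queue<std::string> work;
  work.push(entry);
  while (!work.empty()) {
    const std::string fn = work.front();
    work.pop();
    auto it = g.find(fn);
    if (it == g.end()) continue;  // leaf or external function
    for (const std::string& callee : it->second) {
      if (visited.insert(callee).second) work.push(callee);
    }
  }
  return visited;
}

// Illustrative heuristic: keep reachable functions with at least `minStatements`
// statements; functions without a known statement count are skipped.
std::set<std::string> selectForInstrumentation(const Graph& g,
                                               const std::map<std::string, int>& numStatements,
                                               const std::string& entry, int minStatements) {
  std::set<std::string> selected;
  for (const std::string& fn : reachableFrom(g, entry)) {
    auto it = numStatements.find(fn);
    if (it != numStatements.end() && it->second >= minStatements) selected.insert(fn);
  }
  return selected;
}
```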

If you are curious, please check it out, and report issues and bugs in the issue tracker on GitHub. I also plan to write more articles here that explain some components or use cases in more detail.

Development currently takes place on a university-hosted GitLab instance, so not every feature that is being worked on is public yet. Should you be interested in our progress or even in contributing to the project, please also open an issue on GitHub and we can figure out how to get you access.