September 2022 – JP Lehr

After some history of the MetaCG project, let’s look a little more into technical things. First up is our branching model: what we use, and why we use it. As a general note, we use a private GitLab instance for our development work, but have recently started thinking again about moving everything to GitHub. We have not yet decided which would be better for the project, or how much work it would take to port our current CI to GitHub Actions.

But, let’s get into the branching model for now.

Branching Model: Gitflow

We use the Gitflow branching model for the development of MetaCG, at least in its basics, and with a few “modifications” compared to the original post linked here. There are some general considerations for our project: (1) it must be possible to develop multiple ideas and features for papers while students work on their theses at the same time, (2) we want to keep development private (general paranoia in academia about stolen ideas, no matter how relevant this actually is), (3) we want to maintain a public mirror so interested people can get access to our project, (4) the development of the project should be traceable, so that we can link the software used for papers, and (5) we want to connect implementation work with issues in our issue tracker to keep track of bugs and resolve limitations.

Given these considerations, it seemed to me that Gitflow is a good starting point. In Gitflow one maintains multiple branches for features, the development work, and the current version of the project. In our case, we maintain two main branches, devel and master, plus however many feature branches we need. Our master branch is automatically mirrored from the private development instance to our public repository at github.com/tudasc/MetaCG. You can think of it as a release branch.

The other main branch is the devel branch. It is the branch that moves the project forward and into which features are integrated. These features are developed on feature branches, which I’ll talk about a little in the next paragraphs. Let me just say that we do not develop hotfixes as part of our development model. We fix issues in devel and, if ‘critical’, push to master, i.e., release a new version. However, since our software is mainly focused on research, we do not consider any of our fixes to be actually ‘critical’.
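To make the flow between the branches a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not tooling that we actually use: the helper merge_target is hypothetical, and the sketch only knows about devel, master and feature branches (which carry the feat/ prefix in our project). Release-preparation branches are left out.

```python
from typing import Optional


def merge_target(branch: str) -> Optional[str]:
    """Return the branch that `branch` is integrated into, or None.

    Hypothetical helper for illustration only; it encodes just the
    flow described in the post: feature -> devel -> master.
    """
    if branch.startswith("feat/"):
        # Feature and thesis branches are integrated into devel.
        return "devel"
    if branch == "devel":
        # devel is pushed to master when a new version is released.
        return "master"
    # master (and any branch this sketch does not know) has no target.
    return None
```

Note that there is no hotfix case in the sketch: a fix goes into devel first and reaches master only through a release.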

Finally, let’s talk about feature branches. Our feature branches are used for two things: feature development and thesis projects. This distinction may be our biggest departure from how the Gitflow model intends feature branches to be used. It is, however, clearly motivated by the academic setting in which MetaCG is developed.

Feature development: This is what one would expect from a feature branch. The feature is implemented while paying attention to software quality, usability and similar things. This means that the goal of a feature is to make the software more usable and more “a product”. This is different from thesis projects.

Thesis projects: At universities, one of the important things is to allow students to work on projects for their Bachelor’s and Master’s degrees. In my case, these projects are typically centered around MetaCG and/or PIRA. The thesis goal is then briefly explained in our issue tracker and a feature branch for this issue is created. Thesis projects are typically not meant to be user friendly, but to demonstrate a particular idea and approach. One example could be the development of PIRA’s automatic load-imbalance detection for MPI applications. Once the thesis is submitted and presented, these projects are subsequently treated like features, and shall be cleaned up and improved for usability before being re-integrated into the project. This is, however, no longer part of the thesis, but is usually done as a student research assistant.

All feature branches follow the same naming scheme: feat/<ISSUE-ID>. This makes it easy for a project contributor to look up what the newly introduced code is meant to do and address, and the final merge request can clearly communicate what the code does.
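As a small illustration of the naming scheme, the following Python sketch builds and parses such branch names. The two helper functions are hypothetical and not part of MetaCG, and the sketch assumes that issue IDs are positive integers, which the scheme itself does not state.

```python
import re
from typing import Optional

# Assumption: issue IDs are positive integers without leading zeros.
FEATURE_BRANCH = re.compile(r"feat/([1-9][0-9]*)")


def feature_branch_name(issue_id: int) -> str:
    """Build the branch name for the given issue (hypothetical helper)."""
    if issue_id <= 0:
        raise ValueError("issue IDs are assumed to be positive")
    return f"feat/{issue_id}"


def issue_of_branch(branch: str) -> Optional[int]:
    """Return the issue ID encoded in a feature branch name, or None."""
    match = FEATURE_BRANCH.fullmatch(branch)
    return int(match.group(1)) if match else None
```

For example, feature_branch_name(42) gives feat/42, and issue_of_branch("feat/42") gives back 42, while a name such as devel yields None. A check like this could be used to reject branches that do not point to an issue.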

Finally, and I’ll talk about that more in a later post, the preparation of new releases is also done on one or more feature branches. These feature branches, however, have a distinguishable name.

If you are curious about the decisions, think that there are better ways to do it, or would like to contribute, reach out to me! Shoot me a message or tag me on Twitter.

This is the first post of a mini series that I thought may be interesting to do. It will go a little into the details of our development model for the MetaCG library and the reasons behind it. The things we do here are obviously by no means “the” right thing to do for every project, but maybe the perspective on the different topics is relevant for you and your decision making for your own project. The series will consist of three parts.

Branching Model: In the first part we provide an overview of our branching model and why we chose it.
Development and Testing: The second part will go over our development work. This is probably the largest part of the mini series and may be split up into multiple parts itself.
Release Management: In the third part we provide an overview of our release management, i.e., which releases we do, how they are prepared and how they are finalized.

However, let’s first have some history on MetaCG as this will be the reason for some of the decisions that were and are taken.

MetaCG — History

MetaCG started (probably late 2013) as a nameless mock-tool for smart instrumentation tools and heuristics within our work on the InstRO tool. It was used to construct the call graph for a program given an unfiltered Score-P profile. An unfiltered Score-P profile is a runtime profile recorded with the Score-P tool that contains all call edges from the original source code executed at runtime. Given this call graph, the tool would then evaluate the heuristics that were published in the paper “Calltree-Controlled Instrumentation for Low-Overhead Survey Measurements”.

Then, starting in 2017 during my PhD, I evolved the code base of the nameless mock-tool into what was then referred to as MetaCG. The code base was, however, still almost exclusively focused on the use within the PIRA tool and our work on iterative performance instrumentation refinement that we published in our paper on PIRA.

After our paper on “Automatic Instrumentation Refinement for Empirical Performance Modeling”, we decided to evolve MetaCG into its own tool that can be used more easily outside of the PIRA context. The idea behind this was to ease the development of our research tools should we want or need to pass information from the source level to the LLVM level. We wrote a paper about MetaCG and used the now-available whole-program call-graph information for an extended analysis in our paper “Towards compiler-aided correctness checking of adjoint MPI applications”.

Since then we have worked on the MetaCG code base to separate it more and more from the initial use case within PIRA. This means that it is now evolving into a set of libraries and tools that build on top of these libraries. One tool is the (also evolving) analyzer used within PIRA and a more recently created tool is the analyzer built as part of the CaPI project.

This evolution of the MetaCG code base has led to several significant changes and re-designs. Many of these changes were well motivated and necessary as we evolved from a special-purpose mock-up tool, to a prototype tool for one particular context, to a set of libraries for tool construction. What I learned in the process may also make it into an article on this website eventually.

As one important side note, coming from academia, it was and is very important that (a) we can work on paper ideas and (b) students can work on parts of MetaCG for a thesis. This can be seen in the branching model and in other areas of the tool and library, and will come up throughout the mini series.
