Name mangling in C++ with Clang and GCC

I recently came across the question of whether it is possible to use lists of mangled function names (generated with a Clang-based tool) in a GCC compiler plugin. I am aware that name mangling is compiler dependent and not standardized, yet I had hopes that this would be something I could achieve.

I started with a quick web search. That, however, did not lead to satisfying answers, as I was still unsure whether it could actually work. Most of the answers I found described the situation when GCC 5 came out. I am now working with GCC 8, and things may have changed.

So, I continued by implementing very basic test cases to approach this experimentally, i.e., to get an idea of whether more reading about all this is worth my time. The test code was as simple as the snippet shown below. The first name in each comment is the GCC mangled name (g++ 8.3) and the second name is the Clang mangled name (clang++ 9.0).

void foo() {} // _Z3foov == _Z3foov
void foo(int a){} // _Z3fooi == _Z3fooi
double foo(double d){return 0;} // _Z3food == _Z3food
void foo(int a, double d) {} // _Z3fooid == _Z3fooid
namespace test {
 void foo(int a) {} // _ZN4test3fooEi == _ZN4test3fooEi
 double foo(double d) {return 0;} //_ZN4test3fooEd == _ZN4test3fooEd
}
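
The mangled names in the comments above can be turned back into readable signatures with abi::__cxa_demangle from <cxxabi.h>, which is provided by the C++ runtime libraries used with both GCC and Clang. The following is a minimal sketch; the helper name demangle is made up for illustration and is not part of the original experiment.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle
#include <cassert>    // only needed for quick checks of the helper
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-style
// mangled name, or the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so the caller has to release it with free.
    char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

For example, demangle("_ZN4test3fooEi") yields test::foo(int). Note that the return type is not part of the mangled name of an ordinary function, which is why _Z3food only encodes foo(double).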

So, at least given this small set of samples, there do not seem to be differences. I did similar tiny-scale experiments for classes and templates. All of them were too simple to reveal differences. Eventually, I applied both compilers to a basic C++ implementation of the game of life (sources) and filtered the object code to get a list of all the function names in the resulting binary. I compiled at optimization level 0 to keep the compiler from inlining or applying other optimizations. I’m sure listing all functions in an object can be done much more easily (e.g., using nm), but this is what I did (and accordingly for a version of the code compiled with g++):

objdump -d clang++90.GoL | grep ">:" | awk '{ print $2 }' | sed -e "s/://" | uniq | sort > clang++90_names
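
Once both name lists exist, the names that occur in only one of them can be computed with a set difference. The following is a minimal sketch of that comparison; the function name only_in_left is an assumption made for illustration, and reading the two files into vectors is left out.

```cpp
#include <algorithm>
#include <cassert>    // only needed for quick checks of the helper
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: returns the names that occur in `left` but not in
// `right`. Both lists are taken by value so they can be sorted and
// deduplicated locally, which std::set_difference requires.
std::vector<std::string> only_in_left(std::vector<std::string> left,
                                      std::vector<std::string> right) {
    auto normalize = [](std::vector<std::string>& names) {
        std::sort(names.begin(), names.end());
        names.erase(std::unique(names.begin(), names.end()), names.end());
    };
    normalize(left);
    normalize(right);

    std::vector<std::string> result;
    std::set_difference(left.begin(), left.end(),
                        right.begin(), right.end(),
                        std::back_inserter(result));
    return result;
}
```

Calling the function once in each direction gives the names that are specific to the g++ build and the names that are specific to the clang++ build.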

Inspecting both lists of generated function names, I found differences, in particular in the mangled name of the constructor of the GameOfLife class.

class GameOfLife {
  public:
    GameOfLife(int numX, int numY) : dimX(numX),
                                     dimY(numY),
                                     gridA(dimX * dimY),
                                     gridB(dimX * dimY) {}
    // other members are omitted
};

The constructor is mangled into _ZN10GameOfLifeC1Eii by GCC and into _ZN10GameOfLifeC2Eii by Clang. The difference is the C1 vs. the C2 in the name.

Now, I wondered: what is encoded by these C1 / C2 parts of the mangled name? I know that Clang mangles names according to the Itanium C++ ABI specification. A quick web search led me to the specification, and so I looked for the respective section. I found that the specification lists the following in 5.1.4.3 Constructors and Destructors.

  <ctor-dtor-name> ::= C1	# complete object constructor
		   ::= C2	# base object constructor
		   ::= C3	# complete object allocating constructor
		   ::= D0	# deleting destructor
		   ::= D1	# complete object destructor
		   ::= D2	# base object destructor

So, going by these names, GCC emits the constructor of the GameOfLife class as a complete object constructor (C1), whereas Clang emits it as a base object constructor (C2).
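
As background on what the two variants mean: according to the ABI's definitions, a complete object constructor also initializes virtual base class subobjects, whereas a base object constructor leaves them to the most derived class. The two variants therefore only differ in behavior when virtual bases are involved. The sketch below illustrates that distinction at the language level; all type and function names in it are hypothetical and unrelated to the game of life code.

```cpp
#include <cassert>    // only needed for quick checks of the helper

// Hypothetical types for illustration only.
struct Base {
    static int constructed;   // counts how often Base() has run
    Base() { ++constructed; }
};
int Base::constructed = 0;

struct Left  : virtual Base {};
struct Right : virtual Base {};

// Most derived class: constructing a Diamond initializes the single shared
// Base subobject once. The Left and Right parts are then set up without
// initializing Base again, which is the job of a base object constructor.
struct Diamond : Left, Right {};

// Creates one Diamond and reports how often Base was constructed for it.
int base_constructions_for_diamond() {
    Base::constructed = 0;
    Diamond d;
    (void)d;   // silence unused-variable warnings
    return Base::constructed;
}
```

The function returns 1, since the virtual base is constructed exactly once, by the most derived object. A class without virtual bases, like the part of GameOfLife shown above, has no such subobjects to skip.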

At that point I did not dig deeper into why that is the case, i.e., by thoroughly reading the definitions in the Itanium C++ ABI specification, as for me it is sufficient to know that differences in name mangling occur for features as fundamental as constructors. However, in case someone (or a future me) has the same question (again), I thought I would share this as a starting point for looking into more detail.

Finally, the overall result of this small research is that I will need to write an LLVM plugin to mimic the functionality of the respective GCC plugin I wanted to use in my toolchain. Nothing too bad, but I would have been happier if I could just use the already available GCC plugin.

papi-wrap now public

I took some time on my last day of vacation to finish the refactorings I wanted to do on the PAPI wrapper that I mentioned in a previous post. Although I am sure that there are lots of things to clean up in this rather small code base, I made it publicly available!

It was used to generate the measurement results in my paper about the influence of measurement infrastructures, available in the ACM digital library.

The library is intended as an easy-to-use PAPI interface for C++ code. It can be integrated into your code as a library, or it can be used as an external measurement routine using libmonitor. I may continue to work on this library in my free time, as I do have some more ideas and want to integrate two features. One, implement a more structured way to output the measurement results. Two, have it not only count PAPI events, but also provide simple timer mechanisms.

If you are interested in this project, you can go to the papi-wrap repository on my GitHub, download the source, build it, and play around with it.