Name mangling in C++ with Clang and GCC

I recently came across the question of whether lists of mangled function names (generated with a Clang-based tool) can be used in a GCC compiler plugin. I am aware that name mangling is compiler dependent and not standardized, yet I had hopes that this would be achievable.

I started with a quick web search. That, however, did not lead to satisfying answers, as I was still unsure whether it could actually work. Most of the answers I found described the situation when GCC 5 came out. I am now working with GCC 8, and things may have changed.

So, I continued by implementing very basic test cases to approach this experimentally, i.e., to get an idea of whether more reading about all this is worth my time. The test code was as simple as the snippet shown below. The first name in each comment is the GCC mangled name (g++ 8.3) and the second name is the Clang mangled name (clang++ 9.0).

void foo() {} // _Z3foov == _Z3foov
void foo(int a){} // _Z3fooi == _Z3fooi
double foo(double d){return 0;} // _Z3food == _Z3food
void foo(int a, double d) {} // _Z3fooid == _Z3fooid
namespace test {
 void foo(int a) {} // _ZN4test3fooEi == _ZN4test3fooEi
 double foo(double d) {return 0;} // _ZN4test3fooEd == _ZN4test3fooEd
}
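To double-check what such a mangled name encodes, it can be turned back into a readable signature. The following is a minimal sketch, assuming a toolchain that ships <cxxabi.h> (GCC's libstdc++ and Clang's libc++abi both do). The helper names demangle and check_samples are my own and not part of any library.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle
#include <memory>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-style
// mangled name, or the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc, so release it with free.
    std::unique_ptr<char, decltype(&std::free)> text(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        &std::free);
    return (status == 0 && text) ? std::string(text.get()) : mangled;
}

// Usage sketch with some of the names from the snippet above.
void check_samples() {
    assert(demangle("_Z3foov") == "foo()");
    assert(demangle("_Z3fooid") == "foo(int, double)");
    assert(demangle("_ZN4test3fooEi") == "test::foo(int)");
}
```

Note that the return type of an ordinary function is not part of its mangled name, which is why _Z3food demangles to foo(double) without the leading double.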

So, at least for this small set of samples, there do not seem to be any differences. I ran similar tiny-scale experiments for classes and templates, but all of them were too simple to reveal any differences. Eventually, I applied both compilers to a basic C++ implementation of the game of life (sources) and filtered the disassembly to get a list of all the function names in the resulting binary. I compiled at optimization level 0 to keep the compiler from inlining or applying other optimizations. I'm sure listing all functions in an object can be done much more easily (e.g., using nm), but this is what I did (and likewise for a version of the code compiled with g++):

objdump -d clang++90.GoL | grep ">:" | awk '{ print $2 }' | sed -e "s/://" | sort -u > clang++90_names
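Once both name lists exist, comparing them boils down to a set difference in each direction. Below is a minimal sketch of that step, assuming one symbol name per line. The function names read_names, only_in_first and example_diff are hypothetical, and the symbols in the example are placeholder input rather than my actual lists.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Reads one symbol name per line into a sorted, duplicate-free set.
std::set<std::string> read_names(std::istream& in) {
    std::set<std::string> names;
    for (std::string line; std::getline(in, line);) {
        if (!line.empty()) names.insert(line);
    }
    return names;
}

// Returns the names that occur in `first` but not in `second`.
std::vector<std::string> only_in_first(const std::set<std::string>& first,
                                       const std::set<std::string>& second) {
    std::vector<std::string> result;
    std::set_difference(first.begin(), first.end(),
                        second.begin(), second.end(),
                        std::back_inserter(result));
    return result;
}

// Usage sketch with placeholder symbol lists.
void example_diff() {
    std::istringstream gcc_list("_Z3foov\n_Z3barv\n");
    std::istringstream clang_list("_Z3foov\n_Z3bazv\n");
    const auto gcc_names = read_names(gcc_list);
    const auto clang_names = read_names(clang_list);
    assert(only_in_first(gcc_names, clang_names) ==
           std::vector<std::string>{"_Z3barv"});
    assert(only_in_first(clang_names, gcc_names) ==
           std::vector<std::string>{"_Z3bazv"});
}
```

Since read_names takes any std::istream, a std::ifstream opened on a file such as clang++90_names can be passed in the same way.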

Inspecting both lists of generated function names, I found differences, in particular in the mangled name of the constructor of the GameOfLife class.

class GameOfLife {
  public:
    GameOfLife(int numX, int numY) : dimX(numX), 
                                     dimY(numY), 
                                     gridA(dimX*dimY),
                                     gridB(dimX*dimY){}
    // other members are omitted
};

The constructor is mangled into _ZN10GameOfLifeC1Eii by GCC and into _ZN10GameOfLifeC2Eii by Clang. The difference is the C1 vs. the C2 in the name.

Now, I wondered: what is encoded by these C1 / C2 parts of the mangled name? I know that Clang mangles names according to the Itanium C++ ABI specification. A quick web search led me to the specification, where I looked for the respective section. I found that it lists the following in 5.1.4.3 Constructors and Destructors.

  <ctor-dtor-name> ::= C1	# complete object constructor
		   ::= C2	# base object constructor
		   ::= C3	# complete object allocating constructor
		   ::= D0	# deleting destructor
		   ::= D1	# complete object destructor
		   ::= D2	# base object destructor

So, going by these symbol names, GCC labels the constructor of the GameOfLife class as a complete object constructor, whereas Clang labels it as a base object constructor.
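Both variants still refer to the same source-level constructor, which can be checked by demangling them. The following is a minimal sketch, again assuming <cxxabi.h> is available. The helpers demangle, same_declaration and example_ctor_variants are hypothetical names of mine, and comparing demangled names is coarser than comparing mangled ones, so whether that is acceptable depends on the use case.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle
#include <memory>
#include <string>

// Hypothetical helper: demangled form, or the input unchanged on failure.
std::string demangle(const std::string& mangled) {
    int status = 0;
    std::unique_ptr<char, decltype(&std::free)> text(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        &std::free);
    return (status == 0 && text) ? std::string(text.get()) : mangled;
}

// Hypothetical helper: true if both mangled names demangle to the same
// declaration. This deliberately ignores the C1/C2 (and D0/D1/D2) variant.
bool same_declaration(const std::string& a, const std::string& b) {
    return demangle(a) == demangle(b);
}

// Usage sketch with the two constructor names found above.
void example_ctor_variants() {
    assert(demangle("_ZN10GameOfLifeC1Eii") ==
           "GameOfLife::GameOfLife(int, int)");
    assert(same_declaration("_ZN10GameOfLifeC1Eii", "_ZN10GameOfLifeC2Eii"));
    assert(!same_declaration("_Z3fooi", "_Z3food"));
}
```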

At that point I did not dig deeper into why that is the case, i.e., I did not thoroughly read the definitions in the Itanium C++ ABI specification, as for me it is sufficient to know that differences in name mangling occur at such fundamental features as constructors. However, in case someone (or a future me) has the same question (again), I thought I would share this as a starting point for looking into more detail.

Finally, the upshot of this small investigation is that I will need to write an LLVM plugin to mimic the functionality of the GCC plugin I wanted to use in my toolchain. Nothing too bad, but I would have been happier if I could just use the already available GCC plugin.