Audio Setup Improvements

After almost a full year of working from home and numerous video conferences in different software solutions, I have finally upgraded my audio equipment. Initially I wanted to revive my old Tascam US122L, but it seems that all solutions I found to get that device working in Linux are outdated by now.

TL;DR
I bought a Focusrite Scarlett 2i2 3rd Gen USB audio interface, a condenser microphone from the t.bone, and a desk stand for the microphone. I am very satisfied with the overall quality of the setup and the sound it produces, although I would buy a different desk stand next time.

Longer Story

I have been looking into beginner’s home recording equipment for some time due to my general interest in it. Since, generally speaking, time is the most limiting factor, I never considered buying any of the equipment, as I thought I would never use it. With the pandemic changing the way I work and interact with my colleagues etc., things have changed, and I finally had an excuse to spend some money.

I had two requirements that the audio interface needs to fulfill:
1. I want a two channel / stereo interface with somewhat decent pre-amps, and,
2. it needs to work on Linux (and potentially Windows).

After spending some time searching the internet, I finally came across the Scarlett series by Focusrite. It seems that their (smaller) interfaces are USB class compliant, meaning that they work with the USB stack built into Linux – which I can confirm (Manjaro Linux, Kernel 5.11). The reviews of the pre-amps read decent, and it is not excessively expensive. So far, I can only agree: I like the sound, and the knobs and casing look and feel very high quality.

As a microphone, I went with more of a budget solution: the “the t.bone SC 400” condenser microphone. Though it is comparatively affordable, I think it offers a decent and rich sound. I do feel, however, that it lacks some airiness in the high frequencies.

If you are interested in a little voice demonstration, the audio setup is the one I use for my twitch live streams. So don’t hesitate and join me for some demo of the sound on Wednesday night.

I’m now on twitch!

Yes that’s right! I’m now on twitch!

I have lately started to stream some programming and building games live on twitch.tv/jplehr and found that quite relaxing and nice. So I want to continue it with a schedule. I’m happy if people join in to chat, learn more about the software that I work on, maybe about programming in general, or whatever we want to chat about.

Wednesday from 09 PM CET: Programming

Every Wednesday from 09.00 pm CET to about midnight, I am continuing on one of the software packages that I introduce on this website. Currently, I am mostly working on MetaCG or PIRA. Both can also be found on Github.

Sunday from 09 PM CET: Open Stream

I’ll probably also stream on Sundays, from 09.00 pm CET to about midnight. But on Sundays, it’ll not only be programming. Other things I want to stream, and can stream given the computers I currently have, are OpenTTD or Cities: Skylines. So some relaxing building and development games. That may change obviously. 😉

So, if you want to know what’s coming up, follow me on twitter or twitch and receive updates and notifications on what I’ll be doing.

MetaCG – Annotated Whole-Program Call-Graphs

MetaCG: A tool suite for whole-program inspection and automatic performance instrumentation generation. It brings a call-graph extractor using Clang Tooling, a whole-program call-graph library that is serializable to a JSON-based format and allows annotation with user-defined meta data (hence the name MetaCG), and a call-graph validation tool that uses a Score-P profile to determine which edges are missing.

Repository: https://github.com/tudasc/metacg

In our work on PIRA, we realized that we need a whole-program call-graph representation that we can analyze and annotate with user-defined information. There are obviously multiple ways to do that, and we decided (more or less well-informed) to implement it as a library together with a toolchain to extract the call graph from C/C++ code using Clang Tooling. To evaluate its completeness, we figured that it is easiest for us to use instrumentation-based profiling data obtained with Score-P. Score-P is a dependency of PIRA anyway, and at that time the call-graph library was only used within PIRA. Later on, we realized that we want to use whole-program reachability information in other tools as well, and we think that the call-graph library of PIRA is a reasonable abstraction.

So we started MetaCG as a more general software package.

The software package is written in C++, uses the CMake build system and is licensed under a BSD 3-clause license. It comes with five software components.

MetaCG Library: The fundamental call-graph representation. A lightweight, bi-directional graph of which the function nodes can hold user-defined information in MetaContainers. The graph can be serialized into json, in which case the MetaContainers are output to every function node with a specified key such that they can be identified in the json file later on, e.g., by a subsequent analysis tool. Currently, it does not contain explicitly modeled edges, which limits its expressiveness to some extent. However, this is a feature that is planned and will be added when time permits.

CGCollector: The Clang-based call-graph extractor. It processes the abstract syntax tree and obtains information about the class hierarchy, call relations, and other source-level information that a user needs. The latter is done through the MetaCollector extension point, i.e., for every piece of source information that should be annotated, a new MetaCollector is derived, which obtains the desired information and attaches it to the MetaCG for a specific tool.

CGMerge: CGCollector works on a single translation unit at a time; hence, the partial call graphs need to be merged. This is done, similar to linking for a binary, with CGMerge. It takes all translation-unit local files and merges them. It needs a strategy to resolve potentially conflicting entries in the meta data fields, i.e., the data generated by a MetaCollector; hence, the user is required to provide one.

CGValidate: The tool gets a MetaCG and a Score-P profile in Cube format (please note, it needs to be a full profile, i.e., one that also contains functions marked as inline etc.), and checks which edges are not present in the MetaCG. This allows a user to validate that all potential function calls are contained, and if not, CGValidate can patch the missing edges into the MetaCG.

PGIS: The PIRA analyzer that performs call-graph analysis to generate low-overhead performance instrumentation for subsequent measurement with Score-P.

If you are curious, please check it out, and report issues and bugs in the issue tracker on Github. I also plan to write more articles here that explain some components or use-cases in more detail.

The development currently takes place in a university-hosted Gitlab instance, hence, not every feature that is being worked on is already public. Should you be interested in our progress or even in contributing to the project, please also open an issue on Github and we can figure out how you get access.

Changes to this website

I have decided that I want to use this website not only for note taking – which quite honestly never really happened – but also to write about what I do. While this includes some of the computer science stuff that is here already, it will open up the variety of things a little bit.

So: what to expect?
Well: I don’t know for sure, yet. But it will be closer to what you can find on my twitter.

Computer Science and High Performance Computing

This part, which was the only thing I considered until now, will still be here. Most of this is also relevant for my daytime job and my research.
Actually, I want to extend this from the little notes that helped me from time to time to what I actually do. Also, I figured it can be interesting to provide pointers to some other great work of people that I work with, or just to some other tools that I find helpful and enjoy.

Hence, I will probably introduce a *software* section (or tag) that includes (research) software that I use, enjoy, develop, want to promote, etc.

Sports

I also enjoy doing sports quite a bit, and decided that I am going to share some of what I do on this website as well. This probably includes some links to training data as well as gear that I use. No false expectations here: I’m not nearly as much of an athlete as I would like to be. But I do some sports and want to share some of my experiences.

Other free-time activities

Finally, I’ll also add some other free-time activities. I have a couple of very, very easy woodwork things on my list that I may share little stories of. However, this will be (I guess) the smallest part of what I add to the website.

HPC Hallway

As a result of the COVID-19 / Coronavirus situation, an informal weekly meeting, known as the HPC huddle, established itself. It offers a room for discussion around HPC, AI, cloud, and other related topics.

To preserve the shared links, we decided to create a Slack. You can find the join link at www.hpc-hallway.org

During conferences, such as ISC or SC, the meeting frequency may increase, to offer more room for discussion and informed speculation.

Determine max. memory consumption of process

In a recent project, we needed to measure the increase in memory consumption for an application process. How to obtain “the right values” for this depends on the actual scenario and, apparently, is not straightforward in all cases.

Let me first describe the scenario a little more: We want to obtain measurements for both fully serial and MPI-parallel applications. These applications are run in (1) an unchanged version (vanilla), (2) an instrumented version (version 1), and (3) a version that uses LD_PRELOAD to sneak in another library, which overloads MPI functions to do additional work (version 2).

More precisely what we want:

  • A way to obtain reliable measurements for the different configurations, as we are interested in the additional amount of memory we need in version 1 and version 2, when compared to vanilla.
  • The max memory consumption at runtime, not counting potential swap memory.
  • We are only interested in running on a Linux operating system.

Eventually, we used the rusage feature, i.e., getrusage. The returned struct offers different fields related to memory. We found that for our use case, the correct value was the maximum resident set size (max RSS, the ru_maxrss field), which Linux reports in kilobytes. This proved to be reliable and reasonable compared to manual calculations of the memory we assumed we would require. An example code is given below.

#include <sys/resource.h>
#include <stdio.h>
#include "mpi.h"

/* Overrides MPI_Finalize, so it is called at the end of the process */
int MPI_Finalize() {
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  // We assume only MPI root should output memory consumption
  if (rank == 0) {
    struct rusage r;
    // On Linux, ru_maxrss is the peak resident set size in kilobytes
    if (getrusage(RUSAGE_SELF, &r) == 0) {
      printf("MAX RSS: %ld\n", r.ru_maxrss);
    }
  }
  // Forward to the PMPI entry point, which performs the actual MPI_Finalize
  return PMPI_Finalize();
}

PIRA – a framework for iterative instrumentation refinement

The main software project I have been working on over the last weeks and months is PIRA – the Performance Instrumentation Refinement Automation framework. It is available at https://github.com/jplehr/pira. It is the first software I have set up and used continuous integration for. However, for historical reasons, all components are split up into several repositories and the release “process” used for the initial release is a mess.
(Hint: the currently available version doesn’t work, because I missed something when I released it.)

The next release, using a better release process, is scheduled for August 1st.

Anyway – What is PIRA?

The framework can assist performance analysts and computer scientists in discovering performance characteristics of their, or someone else’s, C and C++ software using Score-P. PIRA uses a combination of static and dynamic analysis to iteratively adapt an instrumentation configuration, i.e., which functions should be instrumented for measurement or analysis.

The main driver is written in Python 3. The analysis and instrumentation components are separated into an analysis tool and metric collectors built on top of Clang/LLVM. The final measurements are performed using the Score-P measurement infrastructure.

For those interested, there are two research papers available: (i) about the framework and (ii) a use case, in which we used PIRA to automatically reduce the number of functions passed to the empirical performance modeling tool Extra-P.

What is going to come?

In the next weeks I’ll write some notes about how to use PIRA for your own purposes and what I did when setting up my Gitlab CI instances.

Next Release: August 1st

The next PIRA release is planned for August 1st. It includes new features, such as automatic MPI-function filtering, configurable rebuild intervals, and better-to-use configuration files.

Name mangling in C++ with Clang and GCC

I recently came across the question whether it is possible to use lists of mangled function names (generated with a Clang-based tool) in a GCC compiler plugin. I am aware that name mangling is compiler dependent and not standardized, yet I had hopes that this would be something I can achieve.

I started with a quick web search. That, however, did not lead to satisfying answers, as I was still unsure whether it could actually work. Most of the answers I found were about the status when GCC 5 came out. Now, I am working with GCC 8 and things may have changed.

So, I continued by implementing very basic test cases to start this off experimentally, i.e., to get an idea of whether more reading about all this is worth my time. The code was as simple as the one shown below. The first name in the comment is the GCC mangled name (g++ 8.3) and the second name is the Clang mangled name (clang++ 9.0).

void foo() {} // _Z3foov == _Z3foov
void foo(int a){} // _Z3fooi == _Z3fooi
double foo(double d){return 0;} // _Z3food == _Z3food
void foo(int a, double d) {} // _Z3fooid == _Z3fooid
namespace test {
 void foo(int a) {} // _ZN4test3fooEi == _ZN4test3fooEi
 double foo(double d) {return 0;} //_ZN4test3fooEd == _ZN4test3fooEd
}

So, at least given this small set of samples, there do not seem to be differences. I did similar tiny-scale experiments for classes and templates. All of them were too simple to reveal differences. Eventually, I applied both compilers to a basic C++ implementation of the game of life (sources) and filtered the object code to get a list of all the function names in the resulting binary. I compiled at optimization level 0 to keep the compiler from inlining or doing other optimizations. I’m sure listing all functions in an object can be done much more easily (e.g., using nm), but this is what I did (accordingly for a version of the code compiled with g++):

objdump -d clang++90.GoL | grep ">:" | awk '{ print $2 }' | sed -e "s/://" | sort -u > clang++90_names

Inspecting both lists of generated function names, I found differences, in particular in the mangled name of the constructor of the GameOfLife class.

class GameOfLife {
  public:
    GameOfLife(int numX, int numY) : dimX(numX),
                                     dimY(numY),
                                     gridA(dimX * dimY),
                                     gridB(dimX * dimY) {}
    // other members are omitted
};

The constructor is mangled into _ZN10GameOfLifeC1Eii by GCC and into _ZN10GameOfLifeC2Eii by Clang. The difference is the C1 vs. the C2 in the name.

Now, I wondered: what is encoded by these C1 / C2 parts of the mangled name? I know that Clang mangles the names according to the Itanium C++ ABI specification. A quick web search led me here and so I searched for the respective section of the specification. I found that the specification lists the following in 5.1.4.3 Constructors and Destructors.

  <ctor-dtor-name> ::= C1	# complete object constructor
		   ::= C2	# base object constructor
		   ::= C3	# complete object allocating constructor
		   ::= D0	# deleting destructor
		   ::= D1	# complete object destructor
		   ::= D2	# base object destructor

So, GCC treats the constructor of the GameOfLife class as a complete object constructor, whereas Clang treats it as a base object constructor.

At that point I did not continue digging deeper into why that is the case, i.e., thoroughly reading the definitions in the Itanium C++ ABI specification, as for me it is sufficient to know that the differences in name mangling occur at such fundamental features as constructors. However, in case someone (or a future me) has the same question (again), I thought I would share this as a hint on where to start looking for more detail.

Finally, the overall result of this small research is that I will need to write an LLVM plugin to mimic the functionality of the respective GCC plugin I wanted to use in my toolchain. Nothing too bad, but I would have been happier if I could just use the already available GCC plugin.

Overleaf for collaborative writing

We started to use Overleaf for collaborative writing of our research papers. After a few papers and other documents, I decided to share my experiences with it.

First, what did we do before we started using Overleaf?

Well, we used a git repository for the paper to synchronize the changes between different authors. Everybody used their favorite text editor and we agreed on some code style. My experience was that the most important thing is: write one sentence per line, because it makes merging just so much easier. Then we would send the pdf of the draft to whoever was doing the internal review. We would get back a paper copy with handwritten remarks to be included and iterate the whole process. This isn’t bad, and I totally understand if people prefer to read a printed-out copy.

Now, how did Overleaf change this?

Overleaf is a little bit like multiplayer LaTeX. What we found to be more important: it builds the pdf version of the paper immediately. This is particularly helpful for the internal review – at least in our group. With its comment mechanism, people can simply annotate the respective parts of the document. If they only found a typo, they can also immediately fix it in the document. This makes the review easier and more quickly accessible. I would, however, agree that the introduction of Overleaf is not the most important thing that ever happened.

That’s all great! What can’t it do?

I found it to be somewhat annoying that the editor is an always-on solution! You cannot, at least I did not see how, make the document / editor available offline to, say, work on a document while flying across the Atlantic [Yes, I am assuming you do not want to use the WiFi in the plane]. If you dig a little bit into it, you find that the Overleaf document is actually a git repository behind the scenes. Let’s just clone it, so we can work offline and then push the changes. Unfortunately, you can’t do this. The first part worked smoothly: cloning the project’s git. The second part, pushing changes to it, failed. At least I did not manage to add my credentials in a way that allowed me to push changes to the remote.

So what’s the conclusion?

My conclusion is: Overleaf provides a convenient way to collaborate with authors from other groups / institutions easily. It allows for nice and easy WYSIWYG reviewing and you can export the final document as a git repository to store it on your local git server, if you want to do so, e.g., for archiving purposes. Should you mostly find time to work on documents while you don’t have Internet access, Overleaf may not be the best solution.

All this silence

I just realized how silent I was on my website over the last almost 1.5 years. I decided that I should change that.

As a first follow-up on my article about the Opera tab manager: in some of the Opera versions after my write up, the team actually included a way to search tabs in a much nicer way than the plugin allows.

Given my, maybe weird, control setup for tab handling – Left_Alt + one_of(h,j,k,l) – I was happy that they set the shortcut to Left_Alt + Space by default. This opens the Opera quick search.

The quick search is an overlay that allows you to (1) search Google, and, more importantly for me, (2) search your tabs! This is great news for me, and, I would assume, for everybody who constantly carries around a larger bag full of open tabs. I really enjoy this feature as it nicely blends into my general browser setup. It also feels much more responsive and integrated into the browser compared to the plugin.

On another note: I have been following the Vivaldi development quite closely and think that it is a browser that you should follow and test every now and then. It does lack a little bit of performance from the UI compared to Opera or Chrome, but it has some nice features, like tab hibernation or tab stacks.

I also realize that I should write more articles here about the stuff that I do. And I will!

If you are interested you can also follow me on Twitter: @jplehr