vrijdag 11 november 2016

Templates and binary bloat

At some point in a big enough project, you'll find your output objects have grown immensely in size. Part of this will instill a bit of pride - look at the big thing we've made! - but part of it will instill a bit of fear. Why is it this big? Are our customers going to accept a 90MB executable or app? Why did it end up being this big in the first place? For one specific cause of unnecessary bloat, we need to understand how a linker works.

A linker works by assuming you will want one particular symbol to exist - "main". It then takes each object file or static library in succession, taking from it all sections that you have a reference to (plus a few hardcoded ones if you refer to anything in the file at all). It then adds all the symbols from those sections to your symbol table as available, and takes all new relocations from it as things to be satisfied. It keeps doing this until it is at the end of your input list. At that point it sees if there are still missing symbols - an undefined reference or unsatisfied relocation, so to say.  If there's anything it will complain and refuse to produce an output file, otherwise it merges everything, fills in the addresses of everything and calls it a day.

The main thing to take away from this is that a linker is *stupid*. You can describe its full function (assuming static libraries and no LTO or other fancy optional features) in a single paragraph, without major omissions. It takes full sections and it does not really know about functions, or what your intents were at the time of writing. This has an interaction with templates, inline functions and auto-generated functions - they have to be there at link time, otherwise you'll get a link error if somebody refers to them by symbol, but at the same time they cannot be there because by their nature they'll be in multiple files. Compilers can solve this in two ways - either by identical code folding, or by weak symbols. The net result of both is equivalent for this discussion - you pick one of the copies, and ignore the other.

Therein lies the problem. You can ignore the other, but you may not be able to make it go away. Remember that a linker looks only at *sections*? You can tell your compiler to merge all symbols of the same type into a single section (such as .text, .rodata or .data) or to split each into a separate section. Some sections are used for other purposes too - such as template association information (.group sections in ELF) or the symbol table for your object file. Add to that a strict upper limit of 65535 sections - because in both ELF (Linux) and PE/COFF (Windows) the limit is a 16-bit counter - and you can see where I'm going.

Given enough symbols to create and the intent to keep everything separate, at some point the compiler will no longer be able to do this because it's run out of sections. Pragmatically this appears to be around 32k; most likely because the compiler has to be certain it can still fit in your symbol table, dynamic relocations and so on. When this point is hit, the compiler will start to no longer separate symbols per symbol, but instead merges all your remaining code, template instantiations, implicit functions and so on into one big ".text", one big ".data", one big ".rodata" and one big ".bss" section. So if your code refers to any symbol from those sections, you will be linking in the full section. Chances are something in .text refers to something in .data, so you'll also get the full .data section, .rodata section and .bss section. Weak symbols or not - the space is allocated anyway, your executable will have space for them. It will occupy space on your target, it will worsen your instruction cache performance, it will worsen your page load times...

But of course, you would never make an object file with 32k symbols!  That's a crazy amount of code to put into a single file. If you use templates though, it's easy to accidentally get a lot of them. The reason behind this is that your template code may, in some code path, create your object. If your object is created, it'll need access to all its parents' constructors, all subobject constructors, all functions called from there, and the full vtable for each of those classes. Those vtables need to be filled with all your virtual functions (if it can see them they will also be output), including the destructor. If you don't explicitly say the destructor will be in some implementation file, it will *also* be generated into this object file and so will its direct and indirect references when they are visible. So a simple code file like

    class MyClass {
    public:
        virtual ~MyClass() = default;
    private:
        std::map<std::string, std::set<std::string> > content;
    };

already explodes to expose 230 symbols, if there's any way to access the constructor. They're all weak symbols, each of them can be COMDAT or ICF folded away, so there's no harm done. Just a linker that has to work a bit harder to ignore all the copies. But that only applies if they are in a separate section!


So in a large project, consisting of 7000 object files, we found that 100 or so of them had more than the mentioned 32000 sections. It's difficult to get a good read on it because, while all the tools tell you the number, they exclude some sections from it. The number they report will be around 20000 when your actual section count has already hit 32000 in many cases. For those 100 files, the linker will have no option for a large number of symbols, code, data and so on to just link in everything wholesale. A hundred times over. And remember, these were your biggest files to start with.