vrijdag 3 februari 2017

How to execute data

Somebody recently asked about recursion, and I wanted to show that you can recurse infinitely if you use tail recursion. Since there's no way to make a C++ compiler *have* to apply tail-call optimization, I wrote the example as raw machine code to ensure the recursion really is a tail call.

#include <sys/mman.h>
#include <stdint.h>

int main() {
    unsigned char function[2] = { 0xEB, 0xFE };  // jmp -2: jump to yourself, forever
    uintptr_t page = ((uintptr_t)function | 0xFFF) - 0xFFF;  // round down to the page base
    mprotect((void*)page, 0x1000, PROT_EXEC | PROT_READ | PROT_WRITE);
    ((void(*)())function)();
}

Now the thing you will notice is mprotect. This function tells your operating system that a given range of memory should have its protection changed. In this case the function array is on the stack, and sane operating systems map the stack non-executable, to prevent you from executing your stack contents. The mprotect call circumvents this by explicitly asking for the memory to be executable, and my OS is nice enough to honor such a request. If you omit the call, you will see the program crash.

So then the big question comes with this bit of code:

const unsigned char function[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };  // mov eax, 42; ret
int main() {
  return ((int(*)())function)();
}

As you can see, no mprotect call. The array is not marked in any evil way (such as "__attribute__((section(".text")))") to force it to be executable, and it is not. Yet, this runs just fine!

So what's going on? Let's investigate the binary to see what's up.

$ objdump -d test | grep 2[aA]
$ 


There's definitely no 2A byte in the code. This would show it if it were code - so it is definitely in the data section. In fact, it'll be in the .rodata section:

$ objdump -h test

test:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
 12 .text         000001c2  0000000000000530  0000000000000530  00000530  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 14 .rodata       0000000a  0000000000000700  0000000000000700  00000700  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 16 .eh_frame     000000f4  0000000000000740  0000000000000740  00000740  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 17 .init_array   00000008  0000000000200de0  0000000000200de0  00000de0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 22 .data         00000010  0000000000201000  0000000000201000  00001000  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 

I've omitted the irrelevant sections for readability. Note that all READONLY sections are grouped together and mapped into the first page of the binary, while the writable sections are grouped up and mapped into the second page (at the 2MB mark). Look at the VMA column for the target Virtual Memory Address.

The .text section must be executable - so the page that contains it will be executable. That page extends beyond the end of .rodata, though - so by extension, all of .rodata will be executable! The runtime doesn't even notice that we have data there: the page granularity of the protection gives it no way to tell the OS. If you remove the "const" from the array, it moves into .data - and thus onto a different page - where it is guarded by the non-executability of the .data mapping.
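You can see the flip side by building the non-const variant. A sketch, assuming an NX-capable machine where the toolchain maps .data non-executable; on such a system the call faults and the exit status is nonzero:

```shell
# Same program, but without "const": the array lands in .data, which is
# on a writable, non-executable page, so calling into it faults.
cat > data_exec.cpp <<'EOF'
unsigned char function[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
int main() { return ((int(*)())function)(); }
EOF
g++ data_exec.cpp -o data_exec
./data_exec
echo "exit status: $?"
```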

Note that this assumes you have a processor with NX (No-Execute) or XD (eXecute-Disable) bit. If you do not, the code will work regardless as your processor has no way to stop it.

zondag 11 december 2016

Building your software and unit testing...

When unit testing, you start by adding a new unit test executable target to your project. Then you run this whenever you have time for it. When it's green your code is fine and you commit or refactor, when it's red you fix the problems and get it green again. It's a very simple idea, yet it's wrong.

It's the same problem that you get when your project has warnings. Warnings don't work. They fail because, with an incremental compile - which every nontrivial project uses - you do not get warnings from the files that are not recompiled. The build output is not stable: a warning only shows up when one of those files happens to be recompiled, or when you do a full build. Because of this you rarely see your warnings, even when you have them, so people either ignore warnings altogether or turn them into errors.

The same thing can be done with your unit tests though! There's no reason that your build output should not depend on your unit test. But you actually don't want it to depend on your unit *test* itself, but on the *result*. That can be arranged though - running a unit test (with output redirected into a file) and coalescing the outputs are just two more steps in the build sequence. You can then make your actual output folder depend on the unit test results existing - ie, the build step creating them succeeding - and you will always run your unit tests as part of the build.
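A minimal sketch of such a build step, assuming a hypothetical `unit_tests` binary (stood in for by `true` here so the snippet is self-contained):

```shell
# Run the test binary with its output redirected to a log, and create a
# stamp file only when the run succeeds. Downstream build steps list the
# stamp as a prerequisite, so a red test stops the build right here.
TEST_BIN=true   # stand-in for the real ./unit_tests executable

$TEST_BIN > unit_tests.log 2>&1 && touch unit_tests.passed

# In a Makefile this would look something like:
#   unit_tests.passed: unit_tests
#           ./unit_tests > unit_tests.log 2>&1
#           touch unit_tests.passed
#   output: unit_tests.passed
ls unit_tests.passed
```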

This is awesome in ways that you won't expect. It means that a red test implies a red build. It implicitly interacts with build dashboards: a red test makes a red build, which makes a red icon on the build dashboard. With the appropriate pre-commit hooks, you cannot commit on red. People who only try to "compile" the software will also be testing it, so "it compiles for me" actually *is* good enough to commit with.


And it allows you to only re-run the tests that have changed. After your incremental build, you now have incremental unit testing. And incremental system testing. The better your tests are split up (ie, the fewer dependencies they have), the fewer tests will run for a given change. In many cases you will be testing nearly nothing, as the compile step already showed that the test executable hasn't changed - and if that doesn't change, why would the result? No reason to run it again.

So next time you set up unit tests, make your outputs depend on them being successful.

vrijdag 11 november 2016

Templates and binary bloat

At some point in a big enough project, you'll find your output objects have grown immensely in size. Part of this will instill a bit of pride - look at the big thing we've made! - but part of it will instill a bit of fear. Why is it this big? Are our customers going to accept a 90MB executable or app? Why did it end up being this big in the first place? For one specific cause of unnecessary bloat, we need to understand how a linker works.

A linker works by assuming you want one particular symbol to exist - "main". It then takes each object file or static library in succession, taking from it all sections that you have a reference to (plus a few hardcoded ones if you refer to anything in the file at all). It adds all the symbols from those sections to your symbol table as available, and takes all new relocations from them as things to be satisfied. It keeps doing this until it reaches the end of your input list. At that point it checks whether there are still missing symbols - undefined references or unsatisfied relocations, so to say. If anything is missing it complains and refuses to produce an output file; otherwise it merges everything, fills in all the addresses and calls it a day.

The main thing to take away from this is that a linker is *stupid*. You can describe its full function (assuming static libraries and no LTO or other fancy optional features) in a single paragraph, without major omissions. It takes full sections and it does not really know about functions, or what your intents were at the time of writing. This has an interaction with templates, inline functions and auto-generated functions - they have to be there at link time, otherwise you'll get a link error if somebody refers to them by symbol, but at the same time they cannot be there because by their nature they'll be in multiple files. Compilers can solve this in two ways - either by identical code folding, or by weak symbols. The net result of both is equivalent for this discussion - you pick one of the copies, and ignore the other.

Therein lies the problem. You can ignore the other, but you may not be able to make it go away. Remember that a linker looks only at *sections*? You can tell your compiler to merge all symbols of the same type into a single section (such as .text, .rodata or .data) or to split each into a separate section. Some sections are used for other purposes too - such as template association information (.group sections in ELF) or the symbol table for your object file. Add to that a strict upper limit of 65535 sections - because in both ELF (Linux) and PE/COFF (Windows) the limit is a 16-bit counter - and you can see where I'm going.

Given enough symbols to create and the intent to keep everything separate, at some point the compiler can no longer do this because it has run out of sections. Pragmatically this appears to happen around 32k, most likely because the compiler has to be certain it can still fit your symbol table, dynamic relocations and so on. When this point is hit, the compiler no longer emits one section per symbol, but instead merges all your remaining code, template instantiations, implicit functions and so on into one big ".text", one big ".data", one big ".rodata" and one big ".bss" section. So if your code refers to any symbol from those sections, you will link in the full section. Chances are something in .text refers to something in .data, so you will also get the full .data, .rodata and .bss sections. Weak symbols or not, the space is allocated anyway: your executable will have room for all of it. It will occupy space on your target, it will worsen your instruction cache performance, it will worsen your page load times...

But of course, you would never make an object file with 32k symbols!  That's a crazy amount of code to put into a single file. If you use templates though, it's easy to accidentally get a lot of them. The reason behind this is that your template code may, in some code path, create your object. If your object is created, it'll need access to all its parents' constructors, all subobject constructors, all functions called from there, and the full vtable for each of those classes. Those vtables need to be filled with all your virtual functions (if it can see them they will also be output), including the destructor. If you don't explicitly say the destructor will be in some implementation file, it will *also* be generated into this object file and so will its direct and indirect references when they are visible. So a simple code file like

    #include <map>
    #include <set>
    #include <string>

    class MyClass {
    public:
        virtual ~MyClass() = default;
    private:
        std::map<std::string, std::set<std::string> > content;
    };

already explodes to expose 230 symbols, if there's any way to access the constructor. They're all weak symbols, each of them can be COMDAT or ICF folded away, so there's no harm done. Just a linker that has to work a bit harder to ignore all the copies. But that only applies if they are in a separate section!


So in a large project consisting of 7000 object files, we found that 100 or so of them had more than the mentioned 32000 sections. It's difficult to get a good read on this because, while all the tools report the number, they exclude some sections from it: in many cases the number reported will be around 20000 when the actual section count has already hit 32000. For those 100 files, the linker has no option, for a large number of symbols, but to link in code and data wholesale. A hundred times over. And remember, these were your biggest files to start with.

vrijdag 17 juli 2009

Zone pricing

There's a thing called zone pricing. Basically, it means that where you live decides what you pay. There are loads of examples. For instance, comparing myself (Netherlands, southern city) to somebody from the US (Miami, Florida).

They're both cities; neither is its country's main population hub, and neither is small. No prices were actually checked - the basic idea is the main thing I'm trying to convey.

- Windows 7 costs him 199 dollars. Me it would cost 299 euros, which is about 350 dollars.
- A litre of fuel costs me 1.40 euros. A gallon of fuel costs him 2 dollars 50, or about 55 eurocents per litre.
- A big mac costs me 6 euros after tax. Him it would cost about 4 dollars (for a comparable burger).
- A DVD of a new movie costs him 20 dollars. For me it's impossible to acquire, and if I wait 5 months I could get that very same movie for 20 euros.

The differences between these four items are why I am really annoyed with the price of Windows (even though I'm certainly not going to buy it), moderately miffed about the fuel, quite OK with the burger, and outright angry about the movie.

Each of these items has a reason why it uses zone pricing.

The big mac is less expensive to make in the US because local meat is cheaper. Also, if I wanted, I could just go to Miami and buy the US version of the burger at the US price. It cannot compete with the one at home, though, since it would spoil before I could get it here.

The fuel costs exactly the same to produce, but I couldn't drive to the US to get cheaper fuel, as that would defeat the point - I'd use more fuel getting it here than it would save me.

Windows is the one that annoys me. It's the exact same product, no bit is different (other than IE being optional for its UI - which doesn't change anything, really). I could get one flown in for about 80 eurocents. Why do I have to pay about double to get it, then?

The movie is the worst of all. I can get it from there, cheaper than I could get it here if I wait for a few months, and there's no rational reason why I shouldn't be able to. I can buy the burger, I can buy the fuel, I can sort of buy Windows, but I can't buy (and use) the DVD? Why not, exactly? It won't spoil, the transportation cost isn't too high, I'm quite OK with everything being in English (which Hollywood productions tend to be in any case) so why can't I get it on the day that somebody somewhere can get it?

vrijdag 10 juli 2009

Classicist or Mockist TDD?

You can divide TDDers in two groups - Classicist and mockist TDDers.

The first will write unit tests that are somewhat dependent on each other, leading to tests that rely on other tests working. The order is (usually) not recorded, making the evaluation of a test run somewhat more complicated. The tests are pretty straightforward though: no awkward mocks have to be constructed and little duplicate code is required.

The second will write unit tests that are fully independent, leading to tests that are very quick and reliable. The order of running tests is irrelevant, and the result of a test run points exactly at the code that is failing. The downside is that end-to-end testing falls away, and that it results in much more code for the same result.

Most frameworks target one or the other. Mockist frameworks come with elaborate mock support that basically pretends to be functors in a similar-but-not-quite language - usually badly designed - and very limited test support that just declares a bunch of tests and runs them in some random order. Classicist frameworks are fully-featured, with ways to specify ordering, and loads of long-running tests that scale ever worse.

For either there are *loads* of options for C++. At least 8 mocking frameworks and 35 testing frameworks, no less.

The problem with this separation is that it's pretty stupid. Either option is bad when used as the sole answer to the question of testing. You're much better off using both, when they're at their best. Use mocks for the interconnecting classes that are fairly easy to test if you take out the bottom half, use regular testing for the classes that don't lend themselves to this. For some classical tests, replace the more unusable bottom-end classes with mocks and let the rest be.

None of those 35 testing frameworks targets getting a quickly failing test - or, in fact, the group of people who aren't at either extreme.

You can divide TDDers in two groups - but please don't.