apocryph.org Notes to my future self

1Jun/080

Visual C++ Apps Crashing in _chkstk Under Load

At work one of the devs was running into a weird problem. He could run a group of our unit tests on his dev box without any problem, but when the same tests ran on our build machine the test host process crashed with an unhandled exception. Thankfully we run all of our processes with an unhandled exception trap which generates a minidump before terminating, so we were able to determine the failure was in a C runtime function _chkstk called upon entry into a particular function that allocates a lot of stuff on the stack.

At first, I was thinking stack overflow, but there wasn’t anywhere near 1MB of shit on the stack, which is the default maximum stack size. I thought about stack corruption, but the C-runtime stack checking routines provide an explicit message when stack corruption is detected. We commented out the large stack variables in the function that was being called, and that made the crashes go away, so clearly it was something stack related, but what?

I started Googling about, and first ran into the Microsoft documentation for the _chkstk routine. There’s not much there:

Called by the compiler when you have more than one page of local variables in your function.

Remarks

_chkstk Routine is a helper routine for the C compiler. For x86 compilers, _chkstk Routine is called when the local variables exceed 4096 bytes; for x64 compilers it is 4K and 8K respectively.

Hmm, well, that explains why commenting out the large local variables made the problem go away; _chkstk isn’t even called without at least 4K on the stack. Still, that doesn’t explain why the crash is happening.
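To make that threshold concrete, here is a minimal sketch of the kind of function that would trip it. The name fill_large_buffer and the 16 KB size are made up for illustration, and whether a given compiler actually emits the _chkstk call depends on the target and build settings, so treat this as an assumption about a typical Visual C++ x86 build rather than a guarantee:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical example: 16 KB of locals, well past the 4096-byte x86
// threshold quoted above, so a Visual C++ build would be expected to
// probe the stack (via _chkstk) on entry to this function.
std::size_t fill_large_buffer(char fill)
{
    char buffer[16 * 1024];                    // large local array on the stack
    std::memset(buffer, fill, sizeof buffer);  // touches every page of the array
    buffer[sizeof buffer - 1] = '\0';          // terminate so strlen is safe
    return std::strlen(buffer);
}
```

Shrink the array below a page and the probe should no longer be needed, which matches what we saw when we commented out the big locals.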

Then I ran across an article about debugging a stack overflow with WinDbg, the shitty not-at-all-intuitive debugger that ships with the Debugging Tools for Windows. That wasn’t interesting; what was interesting was the hypothetical stack overflow scenario they presented. It involved a mysterious crash in _chkstk.

In the article’s example, the problem wasn’t too much stuff on the stack; it was a system committed page count very close to the max (that is, the commit limit, which is physical memory plus page file space). You see, _chkstk grows the stack when needed by touching the pages previously reserved for the stack, which causes them to be committed. If the system can’t commit any more pages, _chkstk fails. Interestingly enough, the committed page count on our build machine was very near the max, while the committed page count on the developer’s box was low, which explains why he couldn’t repro it on his box.

The article offers little more than a shrug and a “shit happens”. It does suggest one workaround: increasing the stack commit size (the portion of the reserved memory committed when a thread starts). That causes the necessary memory to be committed at the time a thread starts, so that under low-memory conditions the thread won’t start at all, rather than crashing in some unpredictable spot when the stack grows too much.

As a result, we explicitly set the stack reserve and commit sizes to 1MB in the Linker | System property page for all of our executable projects. That will increase the quantity of committed memory at application and thread startup, but only by 1MB. In return, it will make low-memory conditions cause more obvious thread start errors, which to my mind is worth the up-front memory hit.

Just when you think you have a pretty good handle on Windows development, something like this comes along to remind you how much you don’t know.

13May/086

Worst C/C++ Gotcha Yet

Today I ran smack into what is easily the nastiest C/C++ gotcha of my entire software engineering career. From the early 90s reading Sams Teach Yourself C in 21 Days with a shitty freeware C compiler from a local BBS, through to today, I have been bitten by just about every imaginable C and C++ gotcha, but this one takes the cake.

If you see what’s wrong with this code, you’ve been bitten by this before:

UINT32 x = 0x80000000;
UINT32 y = 2;
UINT64 z = x * y;
std::wcout << L"x*y=" << std::hex << z << std::endl;

If you think this code will output 0x100000000, you’ve not been bitten by this before.

You see, 2 times 0x80000000 is 0x100000000, which is 2^32 in hex. Unfortunately, since both x and y are 32-bit unsigned integers, the result is a 32-bit unsigned integer as well, implicitly converted to a 64-bit unsigned integer only after the computation is performed. And since 0x100000000 is too big to represent as a 32-bit quantity, the low 32 bits of the value (that is, 0x00000000) are all that’s preserved. That’s right; this code prints 0 for the value of z.

What’s worse is I can compile this code using Microsoft Visual C++ 2008 with warnings cranked up to max, and the compiler has absolutely nothing to say. No friendly warnings about possible truncation, nothing. And this from a compiler that won’t let a constant in a conditional slip by without making a snide remark.

In case you’re wondering, the correct way to implement this multiplication is to cast one or both of the 32-bit integers to 64-bit, which will trigger implicit type promotion and treat the whole expression as the product of two 64-bit integers.
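Here is a minimal sketch of the broken and fixed versions side by side. The function names are mine, I have used the standard <cstdint> types in place of UINT32/UINT64, and it assumes the usual platforms where int is no wider than 32 bits:

```cpp
#include <cstdint>

// Reproduces the gotcha: the multiply happens in 32 bits and wraps,
// and only the wrapped result is widened to 64 bits.
std::uint64_t multiply_narrow(std::uint32_t x, std::uint32_t y)
{
    return x * y;
}

// Widening one operand first promotes the other operand too, so the
// whole multiply is done in 64 bits.
std::uint64_t multiply_wide(std::uint32_t x, std::uint32_t y)
{
    return static_cast<std::uint64_t>(x) * y;
}
```

With x = 0x80000000 and y = 2, multiply_narrow gives 0 and multiply_wide gives 0x100000000.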

It’s also worth noting that the equivalent C# code:

uint x = 0x80000000;
uint y = 2;
ulong z = x * y;
Console.Out.WriteLine("x*y={0:x}", z);

produces the exact same result. DOH!

UPDATE: Daren pointed out I had one less zero in the C# example.  Contrary to my previous results, the C# version of the test app behaves the same way.  So I owe C++ an apology; in this regard it only sucks as much as C#, and not more.

6Sep/060

Things I do after programming in a language for too long

I’ve been going back and forth between C++ and C# a lot in my current job. When switching between languages, certain elements bleed together, leading to the same mistakes over and over. Some of mine:

C# to C++

  • Putting array braces ([]) on the type instead of the variable name (eg, int[] whatever in C# vs int whatever[] in C++)
  • Writing string literals with naked double quotes (à la ") instead of the L prefix to denote Unicode
  • Naming methods with PascalCase instead of the camelCase in our C++ convention
  • Throwing exceptions with new, which particularly sucks since it won’t yield a compile error
  • Testing objects for null
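The throw-with-new one deserves a quick illustration, since the compiler stays silent about it. This is a minimal sketch with made-up function names: throw new hands the catch blocks a pointer, so a handler written for the exception object itself never matches.

```cpp
#include <stdexcept>
#include <string>

// C# habit: `throw new` compiles fine in C++, but it throws a pointer.
std::string throw_with_new()
{
    std::string result = "nothing caught";
    try {
        throw new std::runtime_error("oops");
    } catch (const std::exception&) {
        result = "caught exception object";  // skipped: a pointer was thrown
    } catch (std::runtime_error* e) {
        delete e;                            // whoever catches it must free it
        result = "caught pointer";
    }
    return result;
}

// C++ idiom: throw by value, catch by const reference.
std::string throw_by_value()
{
    std::string result = "nothing caught";
    try {
        throw std::runtime_error("oops");
    } catch (const std::exception&) {
        result = "caught exception object";
    }
    return result;
}
```

If the pointer handler were missing, the exception from throw_with_new would sail past the std::exception handler entirely.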

C++ to C#

  • Prefixing string literals with L to denote Unicode
  • Throwing exceptions without new
  • deleteing objects when I’m done with them
  • Setting variables to NULL instead of null
  • Testing a variable for non-zero-ness implicitly, like if (whatever) instead of if (whatever != null)
  • newing types with default ctors without parens, eg new SOME_STRUCT instead of new SOME_STRUCT()

Other

I have similar problems with other languages, but the only one that really jumps out at me now is using/not using semicolons when transitioning to/from VB/VBScript. Thankfully I haven’t had to do that in such a long while I cannot recall what other differences tripped me up.

6Sep/060

The shitty C++ memory leak debugging experience

At my new job I found myself chasing some memory leaks in our rather large C/C++ codebase. Going into the task I was optimistic and just a bit overconfident, knowing as I did that the C Runtime (CRT) has built-in leak-finding goodness.

After the honeymoon, it became clear I was mistaken. Sure, you can set the _CRTDBG_LEAK_CHECK_DF flag with _CrtSetDbgFlag. You can even enable source information with _CRTDBG_MAP_ALLOC. Go ahead. Try it. Use this code:

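    #include "stdafx.h"          // assumed to pull in <tchar.h>

    #define _CRTDBG_MAP_ALLOC    // must be defined before <crtdbg.h> is included
    #include <stdlib.h>
    #include <string.h>
    #include <crtdbg.h>

    int _tmain(int argc, TCHAR *argv[])
    {
            // Dump any still-allocated blocks to the debug output at exit
            _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);

            char* c = new char[2048];   // never deleted, so this leaks
            ::strcpy(c, "You're screwed, pal!");

            return 0;
    }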

What? The line number reported for the leak is some bullshit location deep within the CRT? How can this be? The CRT is better than that!

Not. The CRT provides debug versions of those canonical C-isms malloc and free, but not the more enlightened C++ staples new and delete. For that, you have to roll your own. Here’s how I roll:

#define DEBUG_NEW new(_NORMAL_BLOCK, __FILE__, __LINE__)
#define new DEBUG_NEW

‘Ah’, you’re thinking, ‘that wasn’t so bad’. It’s not over yet. Now add a #include <set> after your DEBUG_NEW cruft; from the uncontrollable sobbing I take it you’ve seen the torrent of inscrutable compile errors you got. That’s because the debug new breaks STL which does its own slightly-off-reservation heap stuff.

The solution? Don’t do that. Include STL headers before your DEBUG_NEW stuff. Can’t do that cleanly? Tough shit. You must not want source information that badly.
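For what it’s worth, the ordering looks something like this. It is only a sketch: make_tracked_int is a made-up name, and I have wrapped the CRT-specific part in a check for a Visual C++ debug build so the file still compiles elsewhere with plain new.

```cpp
// STL headers come first, while `new` still means plain old new.
#include <set>
#include <string>

#if defined(_MSC_VER) && defined(_DEBUG)
// Debug-heap cruft goes last, after every STL header.
#define _CRTDBG_MAP_ALLOC
#include <crtdbg.h>
#define DEBUG_NEW new(_NORMAL_BLOCK, __FILE__, __LINE__)
#define new DEBUG_NEW
#endif

// Hypothetical allocation: in a Visual C++ debug build this goes through
// DEBUG_NEW, so a leak report can point at this file and line.
int* make_tracked_int(int value)
{
    return new int(value);
}
```

Any STL header included below the #define new line, directly or through another header, brings the compile errors right back.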

18Aug/060

Why doesn't va_start work with reference parameters?

I’ve been hacking a lot of C++ lately as a part of my new job. I implemented a simple string formatting function using __vswprintf_s and the C variable argument constructs va_start, va_list, etc.

I was surprised to discover that, when the prev_param argument passed to va_start is a C++ reference parameter (like, say, const std::wstring&, as in my case), the resulting variable arguments are garbage, and of course any printf-like function produces nonsense and/or crashes.

It’s not at all clear to me why this is, though it sucks rather profoundly.
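A likely explanation, which I have not confirmed against the compiler in question: the C++ standard says va_start has undefined behavior when the last named parameter has a reference type. An implementation that locates the variable arguments by taking the address of that parameter ends up with the address of the referred-to object instead of the argument slot. The sketch below sidesteps the problem by taking a plain pointer as the last named parameter. The name format_wide, the fixed 256-character buffer, and the use of the standard std::vswprintf in place of the Microsoft-specific function are all my own choices for illustration.

```cpp
#include <cstdarg>
#include <cwchar>
#include <string>

// Hypothetical helper: the last named parameter is a plain pointer,
// not a reference, so va_start has well-defined behavior.
std::wstring format_wide(const wchar_t* format, ...)
{
    wchar_t buffer[256];  // fixed size keeps the sketch short
    va_list args;
    va_start(args, format);
    int written = std::vswprintf(buffer, sizeof buffer / sizeof buffer[0],
                                 format, args);
    va_end(args);

    // vswprintf returns a negative value on error or if the buffer is too small
    return written < 0 ? std::wstring() : std::wstring(buffer, written);
}
```

A caller holding a std::wstring passes text.c_str() as the format, for example format_wide(text.c_str(), 3).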
