Annoying gotcha w/ __m128
I ran into a nasty gotcha w/ the Visual C++ intrinsic type __m128 today. __m128 is a 128-bit vector type used with Intel's SSE (Streaming SIMD Extensions), which let a single instruction operate on an entire 128-bit XMM register at once. (Strictly speaking, __m128 holds four packed single-precision floats; its integer counterpart is __m128i.) SIMD instructions were created with games and video in mind, but the code I was working with (not written by me) was simply taking advantage of the wider registers to compute an ECC faster.
The first problem is that __m128 values must be aligned on a 16-byte memory boundary. This is intuitively obvious: the x86 architecture prefers (though for ordinary integer loads it doesn't require) that multi-byte types be aligned on a boundary equal to their size. You wouldn't put a 32-bit integer at memory address 0x3, so why would you expect a 128-bit value at address 0x9 to work?
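For power-of-two alignments like these, the check boils down to masking off the low bits of the address. Here's a minimal C sketch; is_aligned is a hypothetical helper I made up for illustration, not part of any Visual C++ API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is a multiple of align.
   align must be a power of two (1, 2, 4, 8, 16, ...), which lets us
   test the low bits with a mask instead of using the % operator. */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

So is_aligned(0x3, 4) and is_aligned(0x9, 16) are both false, while is_aligned(0x10, 16) is true.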
You wouldn’t, and neither would I, but it’s not that simple. The code I was dealing with had __m128 members nested deep within multiple levels of structs. My mistake was creating a BYTE array, reading some data from a file into it, casting it to a pointer to this struct, and then operating on the struct. A BYTE array only guarantees 1-byte alignment, so nothing ensured those nested __m128 members landed on a 16-byte boundary. Ouch.
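Here's a minimal C11 sketch of why the cast is fragile and what a safer alternative looks like. Record is a hypothetical stand-in for the nested structs, and its alignas(16) member stands in for an __m128 field (Visual C++ declares __m128 with 16-byte alignment). Whether a cast of a byte buffer lands on a 16-byte boundary is down to luck; copying the bytes into a properly declared Record is not.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout. The alignas(16) member stands in for an
   __m128 field, so the whole struct requires 16-byte alignment. */
typedef struct {
    uint32_t id;
    alignas(16) unsigned char block[16];
} Record;

/* Risky pattern: (Record *)bytes. Nothing guarantees that a byte buffer
   is 16-byte aligned. This only inspects the address; it never
   dereferences a possibly misaligned Record pointer. */
static int cast_would_be_aligned(const unsigned char *bytes)
{
    return ((uintptr_t)bytes % alignof(Record)) == 0;
}

/* Safer pattern: copy the bytes into a real Record. The compiler aligns
   *out itself, and memcpy has no alignment requirement on its source. */
static void load_record(const unsigned char *bytes, Record *out)
{
    memcpy(out, bytes, sizeof *out);
}
```

The copy costs a few dozen bytes of memcpy per record, which is usually negligible next to the file I/O that produced the buffer.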
What threw me off is the exception I got when accessing the __m128 variables. I would’ve expected an exception code indicating an alignment problem, which would’ve immediately clued me in. What did I get instead? Exception 0xC0000005 (access violation) at memory address 0xffffffff. WTF!? I hunted in vain for a pointer to 0xffffffff for several minutes before I dropped to the disassembly view and saw an SSE equivalent of a mov (an aligned move such as movaps), which is what led me to look up the __m128 type in the first place. As far as I can tell, a misaligned aligned-SSE access raises a general-protection fault, which Windows reports as an access violation at 0xffffffff rather than at the real address.
In the end I was able to add padding to my BYTE array so the read started on the proper memory boundary, but that was crufty, so I just changed my I/O code to read into the struct itself and let the compiler take care of the alignment.
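Both fixes can be sketched in C11 roughly like this. Record, align_up, read_record_padded and read_record are hypothetical names for illustration, and Record's alignas(16) member again stands in for an __m128 field. Option 1 over-allocates the buffer and rounds the start address up to the next 16-byte boundary; option 2 just reads into a Record object whose alignment the compiler already handles.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the nested structs with __m128 members. */
typedef struct {
    uint32_t id;
    alignas(16) unsigned char block[16];
} Record;

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Option 1, the crufty fix: over-allocate by alignof(Record) - 1 bytes
   and start the read at the first suitably aligned address.
   The caller must free *raw_out, not the returned pointer. */
static Record *read_record_padded(FILE *f, unsigned char **raw_out)
{
    unsigned char *raw = malloc(sizeof(Record) + alignof(Record) - 1);
    if (raw == NULL)
        return NULL;
    unsigned char *start =
        (unsigned char *)align_up((uintptr_t)raw, alignof(Record));
    if (fread(start, sizeof(Record), 1, f) != 1) {
        free(raw);
        return NULL;
    }
    *raw_out = raw;
    return (Record *)start;
}

/* Option 2, the cleaner fix: read straight into a Record object,
   which the compiler has already placed on a 16-byte boundary. */
static int read_record(FILE *f, Record *out)
{
    return fread(out, sizeof *out, 1, f) == 1;
}
```

Option 2 also removes the need to remember which pointer to free, which is most of what makes option 1 feel crufty.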
Dammit, why is programming so hard all the time?