I've said it before, and I'll say it again: Side-by-side assemblies are WAY worse than DLL Hell
I’ve bitched at length about Microsoft’s side-by-side scheme for ending the “DLL hell” that used to rule Windows, whereby multiple applications would each write their preferred DLL to Windows\System32 and break other apps that expected a different version. Microsoft set out to fix this problem with what I can only assume was a room full of PhDs and no input from working programmers or sysadmins, since the resulting abomination takes a hard, non-obvious problem and replaces it with a very hard, virtually inscrutable problem.
Twice in the last 24 hours this has bitten me in the ass. First, I was trying to run some code, built on my development machine, on a test VM. Since I wanted to, you know, debug this code, I created a ‘debug’ build, which links the executable with the debug version of the Microsoft C runtimes. Every developer who’s ever tried to do this knows it won’t Just Work, since the debug runtimes won’t be present on the box. What I used to do was manually copy all the DebugCRT and DebugMFC files from my dev machine’s c:\windows\winsxs directory onto the destination machine, which is exactly what Microsoft admonishes us not to do, but they don’t leave me much choice.
However, the VM I’m using this time runs Windows Server 2008, on which even the mighty Administrator doesn’t have write privs to winsxs; that’s limited to the TrustedInstaller account. Sure, I could munge the privs, but this is supposed to be a test box, so I don’t want to munge it too far afield of what our customers’ configs will look like. So I do what the WinSxs (say “win sucks”, because it does) guys suggest: find the Debug_NoRedist folder in the Visual Studio 2008 install directory, copy the folders under your architecture directly into the directory where your application is located, and voilà! Through the magic of WinSxs, it will Just Work.
And it would have, except the Visual Studio team apparently doesn’t really get WinSxs either. After I installed Visual Studio 2008 SP1, it updated the Debug_NoRedist folder with the new SP1 runtimes, which include manifests that reflect the SP1 version of the debug CRT (9.0.30729.1), not the old RTM version (9.0.21022.8). That’s the right call, except that the binaries produced by SP1 still reference CRT version 9.0.21022.8 in their embedded assembly manifests. This works fine on a dev workstation where both the RTM and SP1 versions of the runtimes are installed, since there’s a policy file in the WinSxs directory that redirects requests for 9.0.21022.8 to the 9.0.30729.1 version. However, when you’re doing an isolated application that loads the CRTs from the application’s own directory, there is no such redirect, so the app looks for 9.0.21022.8, can only find 9.0.30729.1, and eats shit and dies.
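To make the failure concrete, here is a toy model in C of the binding decision described above. It is a simplification, not the real side-by-side loader: it assumes an app-local assembly binds only on an exact four-part version match, and that a publisher policy is just a from/to pair. The names `asm_version`, `redirect_policy`, and `binds` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A four-part assembly version such as 9.0.21022.8. */
typedef struct {
    unsigned part[4];
} asm_version;

/* Hypothetical publisher policy: requests for `from` are sent to `to`. */
typedef struct {
    const char *from;
    const char *to;
} redirect_policy;

/* Parse "a.b.c.d"; returns false unless there are exactly four numbers. */
static bool parse_version(const char *s, asm_version *out)
{
    char extra;
    assert(s != NULL && out != NULL);
    int n = sscanf(s, "%u.%u.%u.%u%c", &out->part[0], &out->part[1],
                   &out->part[2], &out->part[3], &extra);
    return n == 4; /* a 5th conversion means trailing junk */
}

/* Toy binding rule: the requested version (after applying the optional
   policy) must match the available version exactly. Pass NULL for
   `policy` to model an app-local assembly with no redirect. */
static bool binds(const char *requested, const char *available,
                  const redirect_policy *policy)
{
    const char *wanted = requested;
    if (policy != NULL && strcmp(policy->from, requested) == 0)
        wanted = policy->to;

    asm_version w, a;
    if (!parse_version(wanted, &w) || !parse_version(available, &a))
        return false;
    for (int i = 0; i < 4; i++)
        if (w.part[i] != a.part[i])
            return false;
    return true;
}
```

In this model an RTM request against SP1 files binds only when a policy is present or when the version numbers are made to match. A compile-time option often cited for VS2008 SP1 is defining `_BIND_TO_CURRENT_VCLIBS_VERSION=1`, so the embedded manifest references 9.0.30729.1 directly. It's worth verifying against your own build before relying on it.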
The fix? You’ll love it. Edit the Microsoft.VC90.DebugCRT.manifest file to change the version number from 9.0.30729.1 to 9.0.21022.8, and QED. Yes, that makes me feel dirty, but at least it works, which is more than I can say for the so-called “correct” way.
To whichever team came up with WinSxs: FUCK YOU ASSHOLES!
The second and more subtle way in which this bit me in the ass had to do with using CreateProcess to spawn an executable with a path like c:\foo\bar\..\boo\baz.exe; that is, with a .. somewhere in the path. That works fine normally, but if we’re using the debug runtimes, and there’s a Microsoft.VC90.DebugCRT directory in both c:\foo\bar and c:\foo\boo, then I get crashes the first time I allocate memory from a DLL and free it from the process. I can verify that only one instance of the CRT DLLs is loaded into the process, and I can verify the DLL is compiled correctly, because when I switch from c:\foo\bar\..\boo\baz.exe to c:\foo\boo\baz.exe with no other changes, it works fine.
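Since the canonical path works and the `..` path doesn’t, one possible workaround is to canonicalize the path before handing it to CreateProcess. On Windows, GetFullPathNameW collapses `.` and `..` components for you. The sketch below is a portable stand-in for that step. `collapse_dots` is a hypothetical name, and the function assumes an absolute, drive-letter, backslash-separated path; it doesn’t handle UNC paths or forward slashes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEGMENTS 64

/* Collapse "." and ".." segments in a path like c:\foo\bar\..\boo\baz.exe.
   Returns 0 on success, or -1 if `out` is too small, the path is too deep,
   or ".." would climb above the drive. */
static int collapse_dots(const char *in, char *out, size_t outsz)
{
    const char *seg[MAX_SEGMENTS];
    size_t len[MAX_SEGMENTS];
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    for (const char *p = in; *p != '\0';) {
        const char *start = p;
        while (*p != '\0' && *p != '\\')
            p++;
        size_t l = (size_t)(p - start);
        if (l == 1 && start[0] == '.') {
            /* "." means "this directory": drop it */
        } else if (l == 2 && start[0] == '.' && start[1] == '.') {
            if (n <= 1)
                return -1;          /* would pop the drive ("c:") */
            n--;                    /* ".." pops the previous segment */
        } else if (l > 0) {
            if (n == MAX_SEGMENTS)
                return -1;
            seg[n] = start;
            len[n] = l;
            n++;
        }
        if (*p == '\\')
            p++;
    }

    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        size_t need = len[i] + (i > 0 ? 1 : 0);
        if (pos + need + 1 > outsz)  /* +1 for the terminator */
            return -1;
        if (i > 0)
            out[pos++] = '\\';
        memcpy(out + pos, seg[i], len[i]);
        pos += len[i];
    }
    out[pos] = '\0';
    return 0;
}
```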
Is this WinSxs’ fault or the CRT’s fault? Does it really matter? No: FUCK YOU ASSHOLES!
The Windows Security Model Is Ridiculous
At work I’ve been writing some code to do some basic interactions with the Windows security subsystem, such as testing if a user has the Log On As a Service privilege. Every time I venture into this area of Windows I cringe, because the security APIs are disjointed, incoherent, and vastly over-complicated. Presumably this is due to the evolution of the Windows security subsystem as it diverged from its VMS roots, but that’s no excuse.
The latest example pertains to the ‘Log on as a service’ privilege, called SeServiceLogonRight, which corresponds to a #define in the Windows SDK called SE_SERVICE_LOGON_NAME. I need to know if a given account has this privilege, because if not I need to grant it before the account can run a service. Easy, right?
You’d think so. There’s an API function, PrivilegeCheck, that will test if a given token has a set of privileges. The only problem is that PrivilegeCheck wants the privileges described as LUIDs (locally unique identifiers), which are the binary identity of each privilege on the local system. All I know is the so-called “programmatic name”, that is, SeServiceLogonRight. No problem; there’s an API for that, called LookupPrivilegeValue, that returns the LUID for a privilege given its name.
So I wrote that up, but something’s wrong: LookupPrivilegeValue is failing with error ERROR_NO_SUCH_PRIVILEGE (that’s error number 1313). The text associated with that error code is “A specified privilege does not exist.” This is, of course, bullshit, since I got the privilege name from the Windows headers.
After a bit of spelunking, though, I found an MSDN document noting that certain privileges, including SeServiceLogonRight as well as other favorites like SeInteractiveLogonRight and SeNetworkLogonRight, are so-called “account rights”. What’s the difference between an “account right” and a normal privilege? Well:
All of the LSA functions mentioned in the introduction above support both account rights and privileges. Unlike privileges, however, account rights are not supported by the LookupPrivilegeValue and LookupPrivilegeName functions.
Nice. So, how the hell do I test for this privilege–I mean–account right?
The LsaEnumerateAccountRights function enumerates the account rights held by a specified account.
Lovely. The reason that sucks is that I can pass a token from OpenProcessToken to PrivilegeCheck, but LsaEnumerateAccountRights wants a SID, plus I have to open a handle to the local security policy. Why the FUCK are account rights and privileges treated as equivalent if they have different semantics and APIs? And don’t get me started on the epic FAIL that is LSA_UNICODE_STRING!
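To make the two code paths concrete, here is a portable C sketch. It assumes the rights array came from LsaEnumerateAccountRights, which you call with a policy handle from LsaOpenPolicy and the account’s SID, and whose result you later release with LsaFreeMemory. `lsa_string_mirror` is a hypothetical stand-in for LSA_UNICODE_STRING so the logic compiles anywhere. The part people trip over is that `Length` counts bytes, not characters, and `Buffer` isn’t guaranteed to be NUL-terminated. `is_account_right` is only a heuristic based on the naming convention, not an API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable mirror of LSA_UNICODE_STRING. In the real struct
   Length/MaximumLength are USHORTs and Buffer is a PWSTR. */
typedef struct {
    uint16_t Length;          /* bytes in use, NOT characters */
    uint16_t MaximumLength;   /* bytes allocated */
    const uint16_t *Buffer;   /* UTF-16, not guaranteed NUL-terminated */
} lsa_string_mirror;

/* Compare a counted UTF-16 string with an ASCII right name. */
static bool lsa_equals_ascii(const lsa_string_mirror *s, const char *name)
{
    size_t chars = s->Length / sizeof(uint16_t);  /* bytes -> code units */
    if (strlen(name) != chars)
        return false;
    for (size_t i = 0; i < chars; i++)
        if (s->Buffer[i] != (unsigned char)name[i])
            return false;
    return true;
}

/* Scan the `count` entries LsaEnumerateAccountRights handed back. */
static bool has_account_right(const lsa_string_mirror *rights, size_t count,
                              const char *name)
{
    assert(rights != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (lsa_equals_ascii(&rights[i], name))
            return true;
    return false;
}

/* Naming-convention heuristic, not an API: logon rights such as
   SeServiceLogonRight end in "Right"; privileges end in "Privilege".
   Rights go through the LSA functions; privileges can go through
   LookupPrivilegeValue + PrivilegeCheck. */
static bool is_account_right(const char *name)
{
    size_t n = strlen(name);
    return n >= 5 && strcmp(name + n - 5, "Right") == 0;
}
```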
More on the bullshit Windows video experience
I’ve written before about the unreasonable difficulty associated with watching DVD-quality video on a Windows machine. Yesterday I finally figured out how to get the x64 version of Vista Media Center to play back HD video without the video lagging significantly behind the audio.
As I’ve noted before, video on Windows is hard because:
- Microsoft does not ship the codecs needed to play the relevant high-resolution video formats
- There is no standard source for such codecs. Multiple codec packs exist, while other people install the individual codecs they need by hand
- 64-bit editions of Vista run the 64-bit Vista Media Center, which can only use 64-bit codecs, but for some reason the world of codec developers has not caught up with the last five years of processor architecture advancements, and ships only 32-bit binaries
None of these problems are insurmountable, but if you’re just a regular user and you’re trying to set up a media center PC with which to play your video collection, it’s unlikely you’ll have a good time of it. It’s also hard to find answers to these questions because it seems everyone posting on TheGreenButton and related forums finds a slightly different way to solve the problem. It also doesn’t help that Microsoft are shipping ‘updates’ to VMC that break it substantively.
So, here’s my secret handshake for HD playback in x64 VMC:
- Install the Vista Codec Package
- Install the Vista Codec x64 Components
- From the Start menu, go to VistaCodecPack, 64-bit Tools, Video Decoder Configuration. Check the check box to enable OSD (on-screen display). You can muck w/ the font settings to give the OSD an alpha channel value of 0 (meaning it’s fully transparent and you can’t see it). This seems absurd, but there really is a bug in the FFDshow stuff such that HD content, at least in MKV files, will play back with the video lagging the audio by several seconds, making the result unwatchable. For some reason, turning on the OSD, even if it’s alpha-channeled into invisibility, makes the problem go away.
- Uninstall the God-forsaken June 2008 Cumulative Update (KB950126). This was a delightful little best-ever update from Microsoft that made Vista Media Center super-awesome, but broke the following things:
- If video playback is paused or stopped, it will resume when the screensaver starts, when the Media Center window is minimized or resized, or in response to other random events
- When you stop playback, you may or may not get the menu with the ‘Done’, ‘Resume’, ‘Delete’ options.
- When you pause video and then resume, you may or may not resume where you left off, or playback may skip an hour ahead
And there you go. With this done, assuming your box has the juice for HD video, VMC will play HD content without the damnable lag. If you’re like me, you’re wondering how the learning-disabled monkeys that run the Vista Media Center QA department could possibly have missed the above bugs, so glaring that the Media Center forum sites are abuzz with wrath for this fucking update. I have no answer for you, except to note that Microsoft continues to not be serious about the Media Center convergence concept, despite BillG’s protestations to the contrary.
Nasty Winsock Overlapped I/O Gotcha
Today at work I ran into a nasty gotcha with network socket I/O in Windows.
To achieve maximal network throughput in Replay, we use I/O Completion Ports (IOCPs), which are kernel objects that collect the completion notifications for many asynchronous I/O operations into a single queue. Due to the way we’ve written our network transfer code, when doing a transfer we don’t know in advance how many read operations will be needed to complete it, so I keep some number n of socket reads outstanding at all times, and once we get the EOF, I use CancelIo on the socket to cancel any reads left outstanding. That way, they still come back from GetQueuedCompletionStatus, so the memory associated with them can be freed, and we maximize throughput by keeping plenty of read buffers available to the TCP/IP stack at all times.
This works fine except in two isolated, non-reproducible cases in which the call to CancelIo resets the TCP socket, resulting in a RST packet being sent back to the source, and both sides failing subsequent send/recv calls with WSAECONNRESET. I can only conjecture that the CancelIo implementation has some sort of failure case wherein it must reset the connection and give up. Highly fucking lame.
As a result, I had to rearchitect our transfer code to make sure we do all the writes we need to do before calling CancelIo, since I now have to assume CancelIo is a death sentence for whatever socket I call it on. Lame.
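The ordering rule can be captured in a little per-socket bookkeeping. This is a hypothetical sketch, not Replay’s actual code; the struct and function names are made up. It relies on the documented behavior that reads cancelled by CancelIo still come back through GetQueuedCompletionStatus, failing with ERROR_OPERATION_ABORTED, so every buffer is accounted for. It refuses to cancel until EOF has arrived and every write has completed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-socket bookkeeping for one transfer. */
typedef struct {
    int reads_outstanding;  /* posted reads not yet dequeued */
    int writes_pending;     /* posted writes not yet dequeued */
    bool eof_seen;          /* a read completed successfully with 0 bytes */
    bool cancel_issued;
} transfer_state;

static void on_read_posted(transfer_state *t)  { t->reads_outstanding++; }
static void on_write_posted(transfer_state *t) { t->writes_pending++; }

/* Call for EVERY dequeued read, including ones aborted by CancelIo, so the
   buffer can be released. A successful 0-byte read means EOF. */
static void on_read_completed(transfer_state *t, unsigned long bytes,
                              bool aborted)
{
    assert(t->reads_outstanding > 0);
    t->reads_outstanding--;
    if (!aborted && bytes == 0)
        t->eof_seen = true;
}

static void on_write_completed(transfer_state *t)
{
    assert(t->writes_pending > 0);
    t->writes_pending--;
}

/* Returns true exactly once, when it is time to call CancelIo: EOF has
   arrived, every write has completed, and some reads are still posted.
   Treat the socket as unusable for further sends after that. */
static bool try_cancel(transfer_state *t)
{
    if (t->cancel_issued || !t->eof_seen || t->writes_pending > 0 ||
        t->reads_outstanding == 0)
        return false;
    t->cancel_issued = true;
    return true;
}

/* All buffers can be freed once nothing is in flight. */
static bool transfer_done(const transfer_state *t)
{
    return t->eof_seen && t->reads_outstanding == 0 && t->writes_pending == 0;
}
```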
Tip: Use cifs instead of smbfs to mount Windows shares on Linux
Now that I have Azureus mostly running on Ubuntu, I’ve been focusing on integrating the machine into my download/watch/archive pipeline.
Back when I ran BT on a Windows XP box, I kept a drive mapped to my media share on nemes, my 2TB NAS box. Whenever a download finished, I’d copy it over to that share. Now that BT is running on Linux, I want much the same functionality.
Initially I went with NFS, since NFS is the closest thing to a native network filesystem technology on Linux, and my NAS box, being a Linux derivative itself, also supports it. However, I had a lot of trouble with the UIDs not matching between nemes and aenea, such that I could copy files to the media share, but then they’d be read-only when I accessed them from Windows. Lame.
Rather than fuck about, I just switched to using Samba on Ubuntu to get at the Windows share. Lame, but I don’t give a shit.
Anyway, since it’s been at least six months since I last had to set up Samba, I forgot how to do it and Googled about. I found a post somewhere about setting up an fstab entry to mount the Windows share with the smbfs file system driver. Something like this:
# Mount nemes via Samba
//nemes/media /usr/local/nemes/media smbfs auto,credentials=/etc/samba/nemes_creds,rw,uid=anelson 0 0
//nemes/warez /usr/local/nemes/warez smbfs auto,credentials=/etc/samba/nemes_creds,rw,uid=anelson 0 0
I tried it, it worked, yay.
But then I started getting really annoying problems. If I kept the box up for very long I’d start to get crap like this in syslog:
syslog.0:Oct 24 22:04:11 aenea kernel: [86867.358441] smb_add_request: request [ffff81000dd4de00,mid=23932] timed out!
syslog.0:Oct 24 22:10:30 aenea kernel: [87245.396073] smb_add_request: request [ffff8100205e0e00,mid=23934] timed out!
syslog.0:Oct 24 22:10:30 aenea kernel: [87245.396095] smb_add_request: request [ffff8100025d2e00,mid=23935] timed out!
I also got timeouts trying to copy files into the mount point for the share, and if I tried to umount the volume, I’d get device busy errors.
Turns out smbfs is deprecated in favor of cifs, so I changed fstab to:
# Mount nemes via Samba
//nemes/media /usr/local/nemes/media cifs auto,credentials=/etc/samba/nemes_creds,rw,uid=anelson 0 0
//nemes/warez /usr/local/nemes/warez cifs auto,credentials=/etc/samba/nemes_creds,rw,uid=anelson 0 0
That still wasn’t working, though: cifs complained about error -13 (that’s -EACCES, permission denied). Turns out whitespace is now significant in the credentials file, so I edited /etc/samba/nemes_creds, removed the whitespace to the left and right of the =, and that took care of it.
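If you have more than a couple of credential files to clean up, the fix is mechanical. A mount.cifs credentials file is a list of `username=...` and `password=...` lines, optionally with `domain=...`. The sketch below uses a hypothetical `normalize_cred_line` helper that rewrites a single `key = value` line as `key=value`. One caveat: it also strips leading and trailing whitespace from the value, which would break a password that really does begin or end with a space.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rewrite one "key = value" line as "key=value", trimming whitespace
   (including a trailing newline) around both key and value.
   Returns 0 on success, -1 if there is no '=' or `out` is too small. */
static int normalize_cred_line(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL);
    const char *eq = strchr(line, '=');
    if (eq == NULL)
        return -1;

    /* Key: [ks, ke) with surrounding whitespace removed. */
    const char *ks = line, *ke = eq;
    while (ks < ke && isspace((unsigned char)*ks)) ks++;
    while (ke > ks && isspace((unsigned char)ke[-1])) ke--;

    /* Value: [vs, ve) with surrounding whitespace removed. */
    const char *vs = eq + 1, *ve = eq + strlen(eq);
    while (vs < ve && isspace((unsigned char)*vs)) vs++;
    while (ve > vs && isspace((unsigned char)ve[-1])) ve--;

    size_t klen = (size_t)(ke - ks), vlen = (size_t)(ve - vs);
    if (klen + 1 + vlen + 1 > outsz)
        return -1;
    memcpy(out, ks, klen);
    out[klen] = '=';
    memcpy(out + klen + 1, vs, vlen);
    out[klen + 1 + vlen] = '\0';
    return 0;
}
```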
Now I have stable Windows share mounts, with decent transfer performance.
What a Clusterfcuk! Running MS Cluster Services
I’m working on adding support for Microsoft Exchange clusters to the next version of my company’s product. In order to do that, I need to have an Exchange cluster to develop/test on. Ironically, assembling Windows machines into a cluster seems to decrease their stability at an exponential rate.
Microsoft’s clustering solution is pretty lame. Basically you set up two or more cluster nodes with access to some sort of shared disk bus, like SCSI, iSCSI, or Fibre Channel. The nodes talk amongst themselves and decide who will own the shared disk resources and run the clustered apps. Effectively, it’s an active/passive configuration.
Anyway, for our tests, I threw together a VM running Ubuntu Server 7.04, and put the iSCSI Enterprise Target on it. I added three 5GB virtual disks and exposed them as targets with IET.
I then installed the Microsoft iSCSI Initiator 2.04 on my two cluster nodes, and verified it could attach to the iSCSI targets. I formatted the drives and felt smugly self-satisfied.
However, the problem starts after I install Exchange 2003. Suddenly, stopping the clussvc (The Cluster Service) hangs the machine. Consistently. Hard. Wedged.
I tried using the Rocket StarWind iSCSI target instead; same problem. WTF?
I’ve spent days now trying to get this cluster stood up. I’m beginning to think the whole cluster thing is some sort of right-wing Zionist corporate conspiracy to sell high-price ‘cluster certified’ hardware to clueless IT boffins.
Building Ruby on Windows, and performance
Last time, I encountered horrifying performance with my Ruby extension, and had two action items:
- Build Ruby from sources so I’d have debug information
- Profile my extension using Intel VTune
I was actually shocked how easy it was to build Ruby from sources. Under Windows it’s literally just:
win32\configure
nmake
nmake test
nmake DESTDIR=foo install
Seriously. I did have to change win32\Makefile.sub to add /fixed:no to the linker command, since VTune won’t work with modules that are not relocatable, but other than that it was a no-brainer. All this makes me wonder why the official Windows builds of Ruby aren’t built with VC2k5 when it’s so superior. In fact, my test ran on the VC2k5 version of Ruby nearly twenty seconds faster, 75 seconds instead of 94!
Anyway, with that done I adjusted my VC2k5 extension project to copy the DLLs to the new ruby path, and got underway.
Let me now digress for a moment and point out just how horrifyingly bad Intel VTune is. I use VTune at work for performance-tuning our server apps, and whenever I encounter a performance problem I exhaust all other alternatives before I bring VTune to bear; it’s that bad.
First off, one gets the feeling that, despite being in version 9.x, VTune is written and maintained by interns. Its GUI is clunky, its installer is temperamental, it crashes for no discernible reason, it won’t run at all without admin privs (seriously, not at all; won’t even start) and the support board is full of questions in broken English and answers to the effect of ‘is it plugged in? did you turn it on? try calling the support line’.
VTune is unique among profilers in that it has two ways of profiling. What it calls ‘call graph profiling’ is the typical profiler functionality, which instruments all your code, makes it run 100 times slower, then when it’s done running, shows you the complete call graph with time spent in each function, helping you see where the slow spots in your app are.
VTune’s other profiling solution, and that which sets it apart, is based on taking snapshots of the processor state based on triggers like n instructions retired. In each snapshot, VTune notes where execution is at that instant. This snapshot approach doesn’t require instrumenting code, and it doesn’t slow it down that much, but there is one HUGE downside: no call graph. It can tell you your app spent all its time in malloc, but it can’t tell you who called the mallocs that it spent all its time in.
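The difference between the two approaches comes down to what each one records. The toy below keeps a flat histogram, which is all a sampling profiler of this kind has, next to the caller-to-callee edge counts an instrumenting profiler collects. The names are hypothetical, and this isn’t a description of VTune internals.

```c
#include <assert.h>

/* Hypothetical functions in the program being profiled. */
enum { FN_MAIN, FN_PARSE, FN_DISSECT, FN_MALLOC, FN_COUNT };

/* Sampling: one counter per function that was executing when a sample fired. */
typedef struct {
    unsigned hits[FN_COUNT];
} flat_profile;

/* Instrumentation: one counter per caller -> callee edge. */
typedef struct {
    unsigned edges[FN_COUNT][FN_COUNT];
} call_graph;

static void record_sample(flat_profile *p, int executing)
{
    assert(executing >= 0 && executing < FN_COUNT);
    p->hits[executing]++;
}

static void record_call(call_graph *g, int caller, int callee)
{
    assert(caller >= 0 && caller < FN_COUNT);
    assert(callee >= 0 && callee < FN_COUNT);
    g->edges[caller][callee]++;
}

/* Answerable only from the call graph: who calls `callee` the most?
   Returns -1 if nobody called it. */
static int top_caller(const call_graph *g, int callee)
{
    int best = -1;
    unsigned best_count = 0;
    for (int c = 0; c < FN_COUNT; c++) {
        if (g->edges[c][callee] > best_count) {
            best_count = g->edges[c][callee];
            best = c;
        }
    }
    return best;
}
```

Suppose the same ten malloc calls, three from a parse step and seven from a dissect step, are fed into both. The flat profile can only say that malloc got 10 hits, while the edge table points at the dissect step.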
Not surprisingly, Intel extols this snapshot-based profiling as though it’s actually usable, but I’ve never run into a situation in which I didn’t end up using the call graph profiling to get the info I want. This would just be an annoyance, except call graph profiling crashes almost every time I try to use it.
Back to my current problem: I was using call graph profiling, and sure enough, the app I was profiling would crash on startup. Intel says this happens if you’re using modules that aren’t relocatable, but it’s also supposed to tell you which modules aren’t relocatable. It didn’t; then I fiddled around with some settings and suddenly it complained that about half the DLLs I was profiling weren’t relocatable. I removed them from the list of DLLs to keep track of, and I was off.
I ran my performance test that processes a capture file with 36k packets in it, and got the results. They were surprising to say the least.
According to VTune, my whole test run took 22 million “msc”, whatever that is (not milliseconds, since 22 million of those is over six hours and the app took way less than that; nor microseconds, since that would be 22 seconds and it took longer; whatever). Of those 22M, 11M were spent in either malloc or free. I’m not calling either of those directly, and in fact the biggest offender in terms of calling malloc and free really calls but one method: rb_class_new_instance.
My takeaway from this is that object creation in Ruby is expensive enough that creating three million objects (my rough count; one for each packet, and one for each field within each packet) is slow. This rather confirms my suspicions that I should create one Ruby object for the packet, and wrap it around a C++ associative container to store the fields. Since object creation in C++ is pretty fast (and lightweight), this should improve performance quite a bit.
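The plan is a C++ associative container behind one Ruby object. To keep all the examples here in C, the sketch below substitutes a growable array of name/value pairs with linear lookup; a hash table or std::map would scale better. In a Ruby 1.8 extension, one `packet_fields` per packet could be attached to a single Ruby object with `Data_Wrap_Struct`. Its free callback would call `pf_clear` and then free the struct. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One dissected field, e.g. name "tcp.port", value "80". */
typedef struct {
    char *name;
    char *value;
} pf_field;

/* All fields of one packet, owned by a single wrapper object. */
typedef struct {
    pf_field *fields;
    size_t count;
    size_t cap;
} packet_fields;

/* Portable strdup (strdup itself is POSIX, not ISO C). */
static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Append a field; returns 0 on success, -1 on allocation failure. */
static int pf_add(packet_fields *pf, const char *name, const char *value)
{
    assert(pf != NULL && name != NULL && value != NULL);
    if (pf->count == pf->cap) {
        size_t ncap = pf->cap ? pf->cap * 2 : 16;
        pf_field *nf = realloc(pf->fields, ncap * sizeof *nf);
        if (nf == NULL)
            return -1;
        pf->fields = nf;
        pf->cap = ncap;
    }
    char *n = dup_str(name);
    char *v = dup_str(value);
    if (n == NULL || v == NULL) {
        free(n);
        free(v);
        return -1;
    }
    pf->fields[pf->count].name = n;
    pf->fields[pf->count].value = v;
    pf->count++;
    return 0;
}

/* Linear lookup of the first field with this name; NULL if absent. */
static const char *pf_get(const packet_fields *pf, const char *name)
{
    for (size_t i = 0; i < pf->count; i++)
        if (strcmp(pf->fields[i].name, name) == 0)
            return pf->fields[i].value;
    return NULL;
}

/* Release all fields and reset to empty. */
static void pf_clear(packet_fields *pf)
{
    for (size_t i = 0; i < pf->count; i++) {
        free(pf->fields[i].name);
        free(pf->fields[i].value);
    }
    free(pf->fields);
    pf->fields = NULL;
    pf->count = pf->cap = 0;
}
```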
Ruby Extension on Windows Hell – The Next Chapter
In the previous episode, I was struggling with the Ruby extension build environment on Windows. I finally gave up and created a Visual C++ 2005 project that built the extension, and wrote a post-build step to copy the files into the Ruby install directory. Obviously this is a short-term hack; I’ll need to get something that will build on *NIX, but I don’t want to spend any more time on the fucking build environment right now.
The next clusterfuck was not unlike the previous one. See, I’m wrapping some Wireshark functionality in a Ruby extension so I can extract dissected packets from a capture file as Ruby objects for my own nefarious purposes. To make this work, I take a Wireshark source tree, build it, then link to some of the resulting Wireshark DLLs in order to implement packet dissection. I built Wireshark with VC2k5 because I refuse to build anything with Visual C++ 6.0.
Unfortunately, I’m using the Ruby one-click installer for Windows, which is built with Visual C++ 6.0 (don’t even get me started!) This presents a problem, as the following dependency chain will (hopefully) illustrate:
my stuff => wiretap.dll => zlib1.dll
ruby => zlib.so => zlib1.dll
Now, my stuff, wiretap.dll, and the zlib1.dll that wiretap.dll depends on were all built with VC2k5, and use its runtimes. Ruby, zlib.so, and the zlib1.dll that zlib.so depends on were all built with VC6 and use its runtimes. Do you see the problem yet?
My stuff runs within Ruby’s process as a DLL, which means that the zlib1.dll that I want and the zlib1.dll that Ruby wants can’t both be loaded; it’s one or the other. I was running Ruby’s version, but either one has the same problem as illustrated below.
Here’s a bit of code from a file within wiretap.dll, snipped and macros expanded for brevity:
wth->fd = _wopen(filename, O_RDONLY|O_BINARY, 0000);
wth->fh = gzdopen(wth->fd, "rb");
Here’s what’s happening: The C runtime function _wopen is being called to open a file, and it returns a file descriptor, which is just an integer that identifies that open file. Then the zlib1 function gzdopen is called, passing in the file descriptor that it is to operate on. Remember, wiretap.dll is linked against the VC2k5 runtimes, so the integer returned by _wopen identifies the file to the VC2k5 runtime library functions. Then, this descriptor is passed into a zlib1 function, which is linked against the VC6 runtime, which keeps a separate list of file descriptors. Depending on what’s happened up to this point, the FD returned by _wopen might be a valid FD to VC6’s runtimes, but it certainly won’t refer to filename as we expect. Thus, gzdopen fails strangely.
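Here is a toy model of why the descriptor goes astray. Each CRT DLL keeps its own private table of open descriptors, so an integer from one runtime’s `_wopen` means nothing, or something else entirely, to another runtime. The `toy_crt` struct and its functions are hypothetical stand-ins for those internal tables. Starting descriptors at 3 mimics the three standard streams.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FDS 16

/* Toy model of one C runtime's private descriptor table. */
typedef struct {
    const char *dll;                      /* e.g. "msvcrt.dll" (VC6) */
    const char *open_files[TOY_MAX_FDS];  /* NULL means the slot is free */
} toy_crt;

/* Like _open/_wopen: hands out the lowest free descriptor in THIS table.
   Slots 0-2 are skipped to mimic stdin, stdout, and stderr. */
static int toy_open(toy_crt *rt, const char *path)
{
    assert(path != NULL);
    for (int fd = 3; fd < TOY_MAX_FDS; fd++) {
        if (rt->open_files[fd] == NULL) {
            rt->open_files[fd] = path;
            return fd;
        }
    }
    return -1;
}

/* What a descriptor means to a given runtime: a path, or NULL for nothing. */
static const char *toy_lookup(const toy_crt *rt, int fd)
{
    if (fd < 0 || fd >= TOY_MAX_FDS)
        return NULL;
    return rt->open_files[fd];
}
```

Suppose Ruby’s VC6 runtime already has one file open, and wiretap then opens the capture through the VC2005 runtime. Both tables hand out descriptor 3. The VC6 side resolves wiretap’s descriptor to Ruby’s file: a valid FD, just not the file we asked for.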
Sure, I could use the zlib1 that is built with vc2k5 and get around this problem, but what happens when ruby, built with vc6, passes one of its FDs to zlib1? That’s right, the same damn thing.
What to do? If I controlled all the sources involved, I’d just build them all with VC2k5 and be done with it. Alas, I do not. Another option is to capitulate and use VC6, but I’d sooner port my code to Visual Basic than take that giant leap backward. I could build Wireshark without the HAVE_LIBZ define, but that would require users to do a custom build of Wireshark, plus it removes compression functionality. I could build Ruby with VC2k5 (assuming it even has an option for that), but then the one-click installer version of Ruby won’t work.
To be honest, I don’t know the solution yet. I just know if I meet the Ruby build engineer who thought it would be a good idea to use an ancient compiler to build the latest Ruby, one of us will be walking away with a brutal wedgie.
The Totally Bullshit Ruby Extension Experience on Windows
In my quest to wrap Wireshark to dissect packet captures into something Ruby can handle, I’ve eliminated the PDML export option, and am now trying to write a Ruby C extension to wrap the Wireshark libraries.
As a Windows user of open source software, I’m used to being a second class citizen, worth little more than some grudging attention from a UNIX programmer who can barely be troubled to scrape together a shitty nmake makefile for a twelve-year-old version of Visual C++. It’s understood that without installing Cygwin and accepting a lot of UNIX-on-Windows kludginess, you’ve basically a snowball’s chance in hell of building any moderately complex open source tool under Windows.
I finally managed to build Wireshark from sources using the instructions on the Wireshark dev site, but that’s only because Wireshark is one of the most Windows-friendly open source projects in the history of mankind. Ruby extensions, on the other hand, exemplify perfectly the my-way-or-the-highway conceit of the UNIX developer community. Allow me to explain:
If all you know about Ruby extensions you know from reading the Dave Thomas book Programming Ruby, then you can’t possibly understand what I’m saying. You know that building a Ruby extension in C is as easy as whipping up a few lines of extconf.rb, running ruby to generate the Makefile, then make and make install. Like so much else in Ruby, it Just Works.
Uh, yeah. If you’re running an operating system that the Ruby digerati have blessed as worthy, that’s probably how it works. Certainly under FreeBSD, most Linux distros, and Mac OS X, it’s that easy. I can only assume that we Windows users are such fuckwit n00bs we deserve to suffer as second class citizens; it’s what we get for betraying Le Resistance.
You see, in most UNIX environments, you can make a few simple assumptions. You can assume the system has a C compiler installed, that there’s only one compiler that could be used to build system binaries, that make and cc are in the path, and that you can let the compiler more or less decide what runtime libraries to link with, knowing that whatever it chooses will run anywhere on the machine. You can further assume that you’re building binaries for your machine alone, since everyone knows how to do builds so why bother with binary tarballs?
Under Windows, each of these assumptions is wrong. Windows doesn’t ship with a compiler, and even if one is installed, there are several flavors of Visual C++ ranging from Visual C++ 6.0 circa 1999 through Visual C++ 2005 SP1 circa 2006. Though each version is similar, most have different runtime libraries that do not interoperate, and later versions have a bizarre manifest system whereby runtime libraries are linked at runtime. If you write an executable to use Compiler Platform A, and link with a DLL built with Compiler Platform B, any attempts to use runtime objects like FILE pointers or memory management functions between the EXE and the DLL will end in tears.
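The memory-management half of this problem is usually handled with a simple rule: whichever module allocates a block also provides the function that frees it. The toy below shows the rule, with a tag standing in for each runtime’s separate heap; the names are hypothetical. Here, freeing through the wrong heap is refused. A real CRT has no such check, and the usual result is heap corruption or a crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for one runtime's heap. */
typedef struct {
    int id;
    int live_blocks;
} toy_heap;

/* Header placed before each block; the union keeps the payload aligned. */
typedef union {
    int owner;
    long double align_ld;
    void *align_ptr;
    long long align_ll;
} block_header;

static void *heap_alloc(toy_heap *h, size_t n)
{
    block_header *b = malloc(sizeof *b + n);
    if (b == NULL)
        return NULL;
    b->owner = h->id;
    h->live_blocks++;
    return b + 1;
}

/* Refuses (returns false) to free a block owned by another heap. */
static bool heap_free(toy_heap *h, void *p)
{
    if (p == NULL)
        return true;
    block_header *b = (block_header *)p - 1;
    if (b->owner != h->id)
        return false;
    h->live_blocks--;
    free(b);
    return true;
}

/* The DLL side: it allocates from its heap and exports the matching free,
   so the EXE never hands DLL memory to its own runtime. */
static toy_heap dll_heap = { 2, 0 };

static void *dll_make_buffer(size_t n) { return heap_alloc(&dll_heap, n); }

static void dll_free_buffer(void *p)
{
    bool ok = heap_free(&dll_heap, p);
    assert(ok);
    (void)ok;
}
```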
This brings me back to the Ruby extension mechanism. The idea is simple enough: use a Ruby script to describe the headers, libraries, and various other bits an extension takes as input, and let Ruby itself generate a makefile custom tailored to your environment. This sucks for a couple of reasons: compiler version and runtime library.
It seems the one-click installer of Ruby 1.8.4 for Windows was built using Visual C++ 6.0, since c:\ruby\lib\ruby\1.8\i386-mswin32\config.h has this little nugget:
#if _MSC_VER != 1200
#error MSC version unmatch
#endif
For those of you born after 1990, 1200 is the value for _MSC_VER used to indicate code is being compiled under Visual C++ 6.0. So, right off the bat, any code that #includes config.h (that is to say, all Ruby extensions) will fail loudly when built with anything other than Visual C++ 6.0. Why, you might ask yourself, would the developer responsible for this code limit it to the oldest version of Visual C++ still in use today? I’ve nfi, especially since C/C++ purists have no better friend on Windows than Visual C++ 2005, and no worse enemy than Visual C++ 6.0. But, be that as it may, the decision was made, and we suffer for it.
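For reference, here are the `_MSC_VER` values for Visual C++ 6.0 through 2008. The block also includes a gentler alternative to the hard `#error`: a `#pragma message` that warns when config.h is compiled by a different compiler than the one Ruby was built with. `msc_product` is a hypothetical helper. The warning only makes the mismatch visible; it does nothing about mixed runtimes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* _MSC_VER values for Visual C++ 6.0 through 2008. */
static const char *msc_product(int msc_ver)
{
    switch (msc_ver) {
    case 1200: return "Visual C++ 6.0";
    case 1300: return "Visual C++ .NET 2002";
    case 1310: return "Visual C++ .NET 2003";
    case 1400: return "Visual C++ 2005";
    case 1500: return "Visual C++ 2008";
    default:   return NULL;
    }
}

/* A softer check than config.h's "#error MSC version unmatch": warn at
   compile time instead of refusing to build. Runtime mixing is still
   your problem. Non-MSVC compilers skip this entirely. */
#if defined(_MSC_VER) && _MSC_VER != 1200
#pragma message("Ruby's config.h targets Visual C++ 6.0 (_MSC_VER 1200); watch for CRT mismatches")
#endif
```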
You might be wondering ‘what would happen if I remove that code and build something with Visual C++ 2005?’. Great question; it turns out it can be made to work. First you must comment out those lines in config.h. After you do that, and use ruby extconf.rb to generate a Makefile for your extension, you have to run nmake (Microsoft’s version of make). You get this not-at-all-reassuring output:
Microsoft (R) Program Maintenance Utility Version 8.00.50727.762
Copyright (C) Microsoft Corporation. All rights reserved.
cl -nologo -I. -Ic:/ruby/lib/ruby/1.8/i386-mswin32 -Ic:/ruby/lib/ruby/1.8/i386-mswin32 -I. -MD -Zi -O2b2xg- -G6 -c -Tcrcapdissector.c
cl : Command line warning D9035 : option 'Og-' has been deprecated and will be removed in a future release
cl : Command line warning D9002 : ignoring unknown option '-G6'
rcapdissector.c
cl -nologo -LD -Fercapdissector.so rcapdissector.obj msvcrt-ruby18.lib oldnames.lib user32.lib advapi32.lib ws2_32.lib -link -incremental:no -debug -opt:ref -opt:icf -dll -libpath:"c:/ruby/lib" -def:rcapdissector-i386-mswin32.def -implib:rcapdissector-i386-mswin32.lib -pdb:rcapdissector-i386-mswin32.pdb
Creating library rcapdissector-i386-mswin32.lib and object rcapdissector-i386-mswin32.exp
I like to think most competent programmers are conditioned to feel aversion and discontent when a build generates warnings, particularly warnings like “‘-G6’? What the fuck does that mean? Screw it, I’m ignoring it” and “Uh, nobody uses ‘Og-’ anymore, what the hell is wrong with you?”. Apparently, I’m a naive little fuckwit, because you’re just expected to suck up those warnings if you want to build Ruby extensions under Visual C++ 2005.
Think you’re done yet? Fuck no. If you try to pull an nmake install now, it’ll look like it worked:
Microsoft (R) Program Maintenance Utility Version 8.00.50727.762
Copyright (C) Microsoft Corporation. All rights reserved.
install -c -p -m 0755 rcapdissector.so c:\ruby\lib\ruby\site_ruby\1.8\i386-msvcrt\rcapdissector
install -c -p -m 0644 .\lib\field.rb c:\ruby\lib\ruby\site_ruby\1.8\rcapdissector
install -c -p -m 0644 .\lib\packet.rb c:\ruby\lib\ruby\site_ruby\1.8\rcapdissector
install -c -p -m 0644 .\lib\packet_element.rb c:\ruby\lib\ruby\site_ruby\1.8\rcapdissector
install -c -p -m 0644 .\lib\protocol.rb c:\ruby\lib\ruby\site_ruby\1.8\rcapdissector
Hooray! But wait. Let’s try it out with irb first:
irb
irb(main):001:0> require 'rcapdissector/rcapdissector'
LoadError: 126: The specified module could not be found. - c:/ruby/lib/ruby/site_ruby/1.8/i386-msvcrt/rcapdissector/rcapdissector.so
from c:/ruby/lib/ruby/site_ruby/1.8/i386-msvcrt/rcapdissector/rcapdissector.so
from c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
from (irb):1
Between the require and the LoadError is a stern message box saying:
---------------------------
ruby.exe - Unable To Locate Component
---------------------------
This application has failed to start because MSVCR80.dll was not found. Re-installing the application may fix this problem.
---------------------------
OK
---------------------------
Bollox! How can it be missing msvcr80.dll? That’s the Visual C++ 2005 runtime DLL, which is installed when Visual C++ is.
If the extent of your Visual C++ 2005 experience is with using the IDE to build stuff, you probably have no fucking idea what’s wrong. If, like me, you’ve had to deal with nmake makefiles written by old make curmudgeons who refuse to use a GUI, then you totally feel my pain.
You see, Microsoft thought it was bad that it was so easy to install the wrong version of a DLL and break a bunch of apps (so-called DLL Hell). To fix it, they borrowed an idea from the .NET world: binding to a DLL by its full assembly identity (name, version, processor architecture, and a token derived from the publisher’s public key), not just its file name. This way, you can have multiple versions of fuckit.dll installed side-by-side, and apps automatically load the right one. Brilliant. We’ll call it…side-by-side installation. Brilliant.
The catch is that apps that link to DLLs need to have some additional metadata descriptors to point to these side-by-side DLLs. This metadata is called a manifest, and it’s embedded in each executable as a resource. When you do a build with the Visual C++ IDE, it generates a manifest to link the C++ runtime DLL, and embeds it in the executable automatically. When you build something with nmake, well, you’re on your own.
The Ruby extension build tools don’t know/care about this, because they’re written by a guy who sees no problem building software in 2007 using a compiler that was EOLed last century. You, however, do care about this, because you insist on building Ruby extensions with a compiler that was released after you were born. So, after you do your little nmake, and before nmake install, you need to do this:
mt.exe -manifest rcapdissector.so.manifest -outputresource:rcapdissector.so;2
(Ignore the .so extension; the Ruby developer responsible for the Ruby extension build system is a cultural imperialist who seems intent upon forcing his vision of the world upon users of other operating systems, such as operating systems that use .dll to denote dynamically linked libraries).
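The `;2` at the end is a resource ID, and it matters. The loader looks for an EXE’s manifest at resource ID 1 (CREATEPROCESS_MANIFEST_RESOURCE_ID) and a DLL’s at ID 2 (ISOLATIONAWARE_MANIFEST_RESOURCE_ID). A Ruby extension is a DLL whatever its file extension, so it gets 2. If you script this step, a sketch like the following picks the ID from the file name. `mt_command` is a hypothetical helper, and it assumes the linker-generated manifest sits next to the target as `<target>.manifest`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Manifest resource IDs from winuser.h. */
enum {
    MANIFEST_ID_EXE = 1,  /* CREATEPROCESS_MANIFEST_RESOURCE_ID */
    MANIFEST_ID_DLL = 2   /* ISOLATIONAWARE_MANIFEST_RESOURCE_ID */
};

/* Case-insensitive "does s end with suffix?" */
static int ends_with_ci(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    if (lx > ls)
        return 0;
    for (size_t i = 0; i < lx; i++)
        if (tolower((unsigned char)s[ls - lx + i]) !=
            tolower((unsigned char)suffix[i]))
            return 0;
    return 1;
}

/* Build the mt.exe command that embeds "<target>.manifest" into target.
   Anything that isn't an .exe (including a Ruby *.so) is treated as a DLL.
   Needs a C99 snprintf (older MSVC only has _snprintf).
   Returns 0 on success, -1 if `out` is too small. */
static int mt_command(const char *target, char *out, size_t outsz)
{
    assert(target != NULL && out != NULL);
    int id = ends_with_ci(target, ".exe") ? MANIFEST_ID_EXE : MANIFEST_ID_DLL;
    int n = snprintf(out, outsz,
                     "mt.exe -manifest %s.manifest -outputresource:%s;%d",
                     target, target, id);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```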
Now, do an nmake install and try the shit out in irb:
irb
irb(main):001:0> require 'rcapdissector/rcapdissector'
=> true
irb(main):002:0> shitoutofluck=CapDissector::CapFile.new('like i give a fuck')
=> #<CapDissector::CapFile:0x2e7ca50>
irb(main):003:0>
Fuckin’ A. There you have it. How’s that for ‘Just Works’.
NOTE: I’d probably still be scratching my head and kicking my cat over this if it weren’t for Al Hoang’s two posts on the subject. Thanks Al.