Abandon all hope ye who enter here, or: Writing Ruby extensions for Windows

Through diligence and ingenuity, I overcame the DLL Hell problem from the last episode, and am now moving into a new circle of hell.

Recall the conundrum: Ruby wanted the VC6 build of zlib1, Wireshark wanted the VC2005 build, and both had to live in the same process. The fix, ironically, was to create a side-by-side private assembly containing all of the Wireshark DLLs my extension depends on. The assembly is just a folder holding the DLLs themselves plus a special XML manifest file, and it sits in the same directory as the capdissector.so extension file. I then added a manifest fragment to the VC++ project that builds the extension, declaring a dependency on my new wireshark assembly. The Windows loader took care of the rest.

This is a particularly delicious solution because it isolates the Wireshark DLLs to my extension alone. There’s no risk of any of them (many are open source staples like pcre and iconv) interfering with other Ruby extensions, since the wireshark assembly can only be loaded through the special manifest directive. And we know from experience that open source build engineers will do whatever they must, including using ten-year-old compilers, to avoid side-by-side drama.

However, now I’ve encountered a better-known Ruby bugaboo: performance.

You may recall I had previously tried to avoid coding against Wireshark directly, and instead had tshark dump a capture file as PDML, which I would then parse with Ruby’s XML parser. The devastating performance consequences are chronicled here, but the gist is:

 NullParser ran against normal_dump for 3.234; got 0 packets
 NullParser ran against huge_dump for 67.36; got 0 packets
 ExpatParser ran against normal_dump for 5.484; got 1574 packets
 ExpatParser ran against huge_dump for 124.891; got 36158 packets
 REXMLParser ran against normal_dump for 28.188; got 1574 packets
 REXMLParser ran against huge_dump for 640.875; got 36158 packets

In other words (all times in seconds), tshark itself took 67 seconds to generate PDML for 36k packets, Ruby’s wrapper around expat took 124 seconds, and REXML, that Nazgul of Ruby performance, took 640 seconds. Surely coding directly to the Wireshark APIs, without pushing all that PDML through stdout, should finish in well under 67 seconds, right?

Initially, it seemed so. I wrote a test derived from the tshark code but without any console IO: it opens the capture file, decodes all the packets, and walks the protocol tree, but prints nothing. This is the theoretical best-case performance. That tool comes in at 10 seconds on my laptop for the same 36k-packet capture. That’s roughly 3,600 packets/second; damn good.

Sadly, when I ran my unit test that loads the same capture file with my Ruby extension and walks the protocol tree, I got 95 seconds. Yes, that’s right, worse than tshark printing PDML, and almost as bad as expat! Now I know what all those mailing list performance gripes were about.
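To put those numbers side by side, here is a quick back-of-the-envelope sketch in Ruby that turns each run time into packets per second. The packet count and timings are the ones quoted above (the 10 and 95 second figures are rounded, so treat the results as approximate); the labels and the helper name are just mine for illustration.

```ruby
# Packet count and timings are the figures quoted in this post.
# The 10 s and 95 s values are rounded, so the results are approximate.
PACKETS = 36_158

TIMINGS = {
  "native test tool (no console IO)" => 10.0,
  "tshark writing PDML"              => 67.36,
  "Ruby extension"                   => 95.0,
  "expat via Ruby"                   => 124.891,
  "REXML"                            => 640.875
}

# Hypothetical helper: throughput in packets per second.
def packets_per_second(packets, seconds)
  packets / seconds
end

THROUGHPUT = TIMINGS.transform_values { |secs| packets_per_second(PACKETS, secs) }

THROUGHPUT.each do |label, pps|
  puts format("%-34s %8.1f packets/s", label, pps)
end
```

By this arithmetic the extension manages only about a tenth of the native tool’s throughput, which is the gap the profiling needs to explain.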

I can’t really profile this using the one-click installer version of Ruby: it doesn’t ship with symbols, so my profiling tools will just say ‘ruby.exe sucks wind’; not helpful. Now I have to fetch the source tree, build it myself (with symbols retained) and point VTune at it. I suspect I’ll have to give up my initial design of building a Ruby object for every protocol, and every field within every protocol, and instead write a lightweight Ruby wrapper around a high-performance C++ associative container. Yay.
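Roughly what I have in mind, as a pure-Ruby sketch rather than the real thing: the wrapper holds on to the native container and only builds a Ruby value when somebody actually asks for a field. Everything here is hypothetical. LazyPacket and its methods are names I made up, and a plain Hash stands in for the C++ associative container the extension would own.

```ruby
# Hypothetical sketch: LazyPacket and its methods are illustrative names only.
# A plain Hash stands in for the native C++ associative container.
class LazyPacket
  attr_reader :materialized

  def initialize(native_fields)
    @native = native_fields  # stand-in for a handle to the native container
    @cache = {}              # Ruby values built so far
    @materialized = 0        # how many Ruby values were created on demand
  end

  # Look up one field by name (e.g. "ip.src"); build the Ruby value lazily.
  def [](name)
    return @cache[name] if @cache.key?(name)
    return nil unless @native.key?(name)

    @materialized += 1
    @cache[name] = @native[name].dup.freeze
  end

  # Number of fields held natively, whether or not Ruby has touched them.
  def field_count
    @native.size
  end
end

packet = LazyPacket.new(
  "ip.src"   => "10.0.0.1",
  "ip.dst"   => "10.0.0.2",
  "tcp.port" => "80"
)
packet["ip.src"]
packet["ip.src"] # second lookup is served from the cache
puts "#{packet.materialized} of #{packet.field_count} fields materialized"
```

The point of the design is that a test which only looks at a couple of fields per packet never pays for the Ruby objects it doesn’t touch. Whether that is where the 95 seconds actually go is exactly what the profiler will have to tell me.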

In the next episode, building Ruby with Visual C++ 2005, or ‘Goddammit what the FUCK were you thinking!?’
