Back from the abyss: Ruby extension performance tuning on Windows

It’s hard to believe the previous episode was less than 24 hours ago. So much has changed.

As promised, I abandoned the wind-sucking everything-is-an-object approach in favor of something more lightweight. Each packet is still its own object; however, instead of creating a Ruby object graph for the protocol tree right away, I now keep the lightweight C protocol objects in an STL multimap and only construct Ruby Field objects for fields as they are requested. I take it one step further: within each Field object, I don’t create the Ruby wrappers for the name, value, display name, and display value until each is first referenced.
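To make the lazy-construction idea concrete, here is a minimal sketch in plain C++ with no Ruby headers involved. Every name in it (NativeField, FieldWrapper, Packet, find_field) is hypothetical and stands in for the real extension’s types; FieldWrapper plays the role of the expensive Ruby Field object and is only built the first time a field is asked for.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a lightweight native protocol field.
struct NativeField {
    std::string name;
    std::string value;
};

// Hypothetical stand-in for the expensive wrapper (a Ruby Field object in the post).
struct FieldWrapper {
    explicit FieldWrapper(const NativeField& f) : native(&f) { ++created; }
    const NativeField* native;
    static int created;  // counts how many wrappers were actually built
};
int FieldWrapper::created = 0;

class Packet {
public:
    // Storing a field is cheap: no wrapper is built here.
    void add_field(const std::string& name, const std::string& value) {
        fields_.emplace(name, Entry{NativeField{name, value}, nullptr});
    }

    // Build the wrapper on first request, then hand back the cached one.
    FieldWrapper* find_field(const std::string& name) {
        auto it = fields_.find(name);
        if (it == fields_.end()) return nullptr;
        Entry& e = it->second;
        if (!e.wrapper) e.wrapper.reset(new FieldWrapper(e.native));
        return e.wrapper.get();
    }

    std::size_t field_count() const { return fields_.size(); }

private:
    struct Entry {
        NativeField native;
        std::unique_ptr<FieldWrapper> wrapper;  // null until first requested
    };
    // multimap because a protocol tree can contain the same field name more than once
    std::multimap<std::string, Entry> fields_;
};
```

A packet with dozens of fields pays for a wrapper only on the fields a script actually touches, which is roughly why walking every packet without asking for fields stays close to the Wireshark baseline.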

As a result, my previous test, which was simply to run through all the packets as fast as possible, runs in 12 seconds. That’s around 10 seconds of pure Wireshark overhead, plus a couple of seconds to create 36k Ruby packet objects and build a multimap of, on average, 83 fields for each of them. I’m pretty pleased with that.

But it gets better. If within each packet I call each_field, which forces creation of a Ruby Field object for each field in each packet, runtime is still only 12.9 seconds. If I add a reference to each field’s name, forcing the creation of a Ruby String for each field, runtime stays under 15 seconds. That’s for a total of roughly 3 million fields across those 36k packets, so considering the level of shit performance I was seeing before, I think that’s pretty impressive. It’s an order of magnitude improvement over the Expat XML solution, and is only a few seconds slower than the minimum runtime imposed by Wireshark.

Next up: explore Wireshark’s ability to limit the ‘columns’ it dissects, to further increase performance.

I would like to take a moment to note some surprises I encountered with Ruby’s garbage collector. I (mistakenly) treated it like the .NET garbage collector, which it turns out is so much more sophisticated than Ruby’s that it’s hard to believe Ruby’s works at all.

First, Ruby’s GC is not asynchronous; in other words, my extension can’t be in the middle of a C function while the GC starts up in another thread and collects an object I’m working on.

Second, Ruby’s GC is mark-and-sweep: all root-level objects and all objects in the current scope are ‘marked’, then every object they reference is marked, and so on. Once the entire object graph has been explored, Ruby sweeps any objects that aren’t marked. Easy enough.

The problem is that extension developers need to be aware of this if they are to hold any VALUE references within a native data structure. Ruby can’t see inside your native struct, so an object referenced only from there looks unreachable and gets swept. As it happens, I had to do this in a couple of places. The trick is to give Ruby a function to call when your object is marked, and in that function call rb_gc_mark on any VALUE references you’re holding on to. The mark argument to Data_Wrap_Struct is just such a function pointer. It’s passed the void* you gave to Data_Wrap_Struct, which you obviously cast appropriately. Once you get that figured out, there’s really nothing to it.
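To show why the mark callback matters, here is a toy mark-and-sweep collector in self-contained C++. It is a simplified model, not Ruby’s implementation, and none of the names (Heap, Ref, MarkFn, NativePacket, packet_mark) come from Ruby’s C API. Ref plays the role of a VALUE, Heap::mark plays the role of rb_gc_mark, and the MarkFn pointer plays the role of the mark argument to Data_Wrap_Struct.

```cpp
#include <cstddef>
#include <vector>

// Toy model only: none of these names come from Ruby's C API.
using Ref = std::size_t;  // plays the role of a VALUE
class Heap;
using MarkFn = void (*)(Heap&, void*);  // plays the role of the mark function pointer

struct Object {
    std::vector<Ref> refs;   // references the collector can see on its own
    void* native = nullptr;  // opaque native data the collector cannot inspect
    MarkFn mark = nullptr;   // reports references hidden inside `native`
    bool marked = false;
    bool alive = true;
};

class Heap {
public:
    Ref alloc(void* native = nullptr, MarkFn mark = nullptr) {
        Object o;
        o.native = native;
        o.mark = mark;
        objects_.push_back(o);
        return objects_.size() - 1;
    }

    Object& at(Ref r) { return objects_[r]; }
    bool alive(Ref r) const { return objects_[r].alive; }

    // Mark one object and everything reachable from it.
    void mark(Ref r) {
        Object& o = objects_[r];
        if (!o.alive || o.marked) return;
        o.marked = true;
        for (Ref child : o.refs) mark(child);
        if (o.mark) o.mark(*this, o.native);  // let native code report its hidden references
    }

    // Mark phase from the roots, then sweep whatever was not reached.
    void collect(const std::vector<Ref>& roots) {
        for (Ref r : roots) mark(r);
        for (Object& o : objects_) {
            if (!o.marked) o.alive = false;
            o.marked = false;  // reset for the next collection
        }
    }

private:
    std::vector<Object> objects_;
};

// A native struct holding a reference the collector cannot see by itself.
struct NativePacket {
    Ref cached_name;
};

// The mark callback: cast the void* back and mark what the struct holds.
static void packet_mark(Heap& heap, void* data) {
    NativePacket* p = static_cast<NativePacket*>(data);
    heap.mark(p->cached_name);
}
```

In this model, an object allocated with packet_mark keeps its cached_name alive through a collection, while an identical object allocated without a callback loses it, even though the native struct still holds the reference. That is the failure mode the mark function exists to prevent.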
