apocryph.org Notes to my future self

30Jun/074

May have identified FiOS router slowdown problem

Last time, I noted how my Verizon-supplied Actiontec router seemed to flake out after a week or so of heavy use, such that its DNS requests started to fail. I switched my internal router to use OpenDNS instead of the router’s own internal DNS, thinking that would solve the problem.

Perhaps not surprisingly, it didn’t. However, when I awoke this morning to find my router performance sucking again, this time I poked around the logs on the Actiontec router a bit more. I ran across this gem in the security log:

Jun 30 11:48:52 2007    Firewall Error  Firewall internal   NAT Error : connection pool is full. No connection created

Aha. That does make sense. If the NAT table is full, new connections will come at the expense of older ones.

So, how can I increase the size of the NAT table, or somehow otherwise resolve this issue?

Well, poking around the GUI I see no options to control the size of the NAT table, so I’ll have to find a way to get it to not use a dynamic NAT table. Fortunately the router has a ‘Static NAT’ option, which allows you to configure an IP on the internal LAN, an IP on the external WAN, and instruct the NAT subsystem to map ports directly from one to the other, avoiding the need for a NAT table.
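To make the difference concrete, here is a toy C++ model of a dynamic NAT table with a fixed-size connection pool, next to a static port-for-port mapping. It follows the log line above, where a full pool means no connection is created. Every name in it (`DynamicNat`, `static_nat_port`), the capacity, and the starting port are made up for illustration; none of it reflects what the router's firmware actually does.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical flow key: (LAN IP, LAN port).
typedef std::pair<std::uint32_t, std::uint16_t> Flow;

// Toy dynamic NAT: every new flow consumes one slot in a fixed-size pool.
class DynamicNat {
public:
    explicit DynamicNat(std::size_t capacity)
        : capacity_(capacity), next_port_(40000) {}

    // Writes the WAN port for the flow and returns true, or returns false
    // when the pool is full (the "No connection created" case).
    bool open(const Flow& f, std::uint16_t& wan_port) {
        std::map<Flow, std::uint16_t>::iterator it = table_.find(f);
        if (it != table_.end()) {          // flow already tracked
            wan_port = it->second;
            return true;
        }
        if (table_.size() >= capacity_) {  // pool exhausted
            return false;
        }
        wan_port = next_port_++;
        table_[f] = wan_port;
        return true;
    }

    void close(const Flow& f) { table_.erase(f); }
    std::size_t size() const { return table_.size(); }

private:
    std::size_t capacity_;
    std::uint16_t next_port_;
    std::map<Flow, std::uint16_t> table_;
};

// Toy static NAT: one LAN host is mapped port-for-port, so there is no
// per-connection state that could fill up.
inline std::uint16_t static_nat_port(std::uint16_t lan_port) {
    return lan_port;
}
```

In this model a `DynamicNat` with capacity 2 refuses a third flow until one of the first two is closed, while `static_nat_port` never fails because it keeps no table at all.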

Sadly, this option doesn’t let me specify ‘whatever the current WAN IP address is’, so depending upon how often Verizon expires my public IP, I may have to fiddle with this setting. I knew FiOS was too good to be true.

UPDATE: Thanks to a pointer from Christian in the comments, I got the Actiontec router into bridge mode, and my NAT problems are over.

29Jun/070

Took my new FAL to the range last night

Last night I made time to take my new FAL to the range. The range was really hot and my glasses kept fogging over, which is the excuse I will choose for my poor accuracy.

Initially my shots were grouping under 2″ at 25 meters, but way low from the point of aim. I had to turn the front sight three full turns (clockwise, which is opposite the AR-15 adjustment direction for raising the point of impact) but finally started hitting roughly the point of aim. My accuracy was still pretty bad, with a 3″ group at 25 meters firing from standing, and 2″ from benchrest, but the shooting conditions were far from ideal. Now that I’ve got it zeroed I’ll go back again and see how it performs.

I put 100 rounds through it in about 40 minutes, and man did the barrel get hot. By the last magazine, the fore grip was too hot to touch, and I had to hold the gun by the mag well. Hopefully six weeks from now DSA will finish the quad rail handguard, so I can attach a vertical fore grip and avoid the whole mess.

Finally I would just note that I never understood before the M1A and Garand guys who looked upon the AR-15 with such disdain, calling them ‘poodle-shooters’ and such. I still think you have to be a real prick fanboy to make fun of guns other than the ones you like, but I do finally understand the feeling of superiority that comes from firing a heavier cartridge. There’s something about knowing your gun turns cinder blocks and huge tree stumps from cover to concealment that makes you want to ridicule the guy next to you shooting an AR-15.

29Jun/070

CHP Denial Appeal Denied on a technicality

I had my Ore Tenus hearing today in Fairfax County Circuit Court appealing the county’s denial of my concealed handgun permit on the basis of my refusal to obey their extra-legal requirements for documented proof of residency. The judge was in a dreadful mood, and was not at all swayed by my attorney’s argument, but the judge’s opinion on the matter turned out to be moot, as the county clerk didn’t enter my request for an Ore Tenus hearing until 36 days after the original denial was entered, while the law provides only a 35 day window in which to appeal.

Apparently, I was naive in assuming that ‘entered’ meant ‘received in the mail’, when in fact it means ‘received in the mail, left to sit in a pile for days on end, and finally typed into a computer somewhere’. I posted the request for the hearing at the 30 day mark, figuring five days would be plenty of time for a letter that needed to travel a total of 20 miles from Herndon to Fairfax.

So, now I have to start again, submit another application, wait for another denial, schedule another Ore Tenus hearing, etc. Hopefully next time I get a judge who’s in a better mood. I can only imagine how much it would suck to be in court on something serious like a major civil or criminal matter only to have one’s fate decided by the fickle emotional state of an all-powerful judge.

Anyway, we’re now at least 45 days away from the next permit denial. I’ll close with a distasteful paraphrase of John Quincy Adams: Posterity, you will never know how much it cost me to secure your freedom from providing proof of residency on your concealed handgun permit application. I hope you will make good use of it.

28Jun/072

Tip: Engenius EUB362-EXT doesn't work with OpenBSD 4.1

For reasons I wrote extensively about then lost when I accidentally navigated away from my blog posting form, I’m trying to get a USB wlan adapter going with an OpenBSD VM running kismet. I thought the Engenius EUB-362 EXT with its Atheros USB chipset would be just the ticket; after all, the new [uath(4)](http://www.openbsd.org/cgi-bin/man.cgi?query=uath&sektion=4) driver says it supports such chipsets, and the EUB-362 has an RP-SMA connector and 200mw of transmit power!

Sadly, it doesn’t work. Badly. kismet fails to start with Cannot set ifmedia: Device not configured because the SIOCSIFMEDIA ioctl fails for the device. If I modify pcapsource.cc line 2876 so kismet ignores that failure, my efforts are rewarded with a kernel panic in the uath driver. Upgrading to the -current branch as of 28 June 2007 didn’t help.

I should’ve heeded the admonitions about the uath driver being a work in progress, but I really wanted that 200mw transmit power. Stupid me.

I’ve now pinned my hopes on the Alfa AWUS036S, which is based on the more stable Ralink rt2500 chipset. I run three rt2500-based PCI cards in an OpenBSD box at home, and while it does panic from time to time, it’s usually good for a day or so at least. What’s more, I have proof that the AWUS036S works with aircrack-ng on Linux, so even if the ural OpenBSD driver is unstable, I can switch to Ubuntu.

Sure, I could post a bug to the bugs list and go back and forth, but it’s not worth it to me.

27Jun/070

My FAL is here at last

After thirteen long weeks of waiting, my FAL from DSA is finally here! I ordered it back in March, and it’ll be another six weeks before the rail handguards and scope mount are ready, but the gun itself is waiting at my FFL dealer’s house for me to pick it up later tonight.

I ordered the SA-58 Para Carbine with the Type II receiver, but I opted for an 18″ barrel at no extra charge, coz let’s face it, if you want a 16″ bbl carbine, why the HELL are you looking at a FAL?

I’ll post pics and a range report later on.

23Jun/070

More trouble with FiOS Router

The problem I reported earlier wherein my Verizon-supplied router stops responding to DNS requests has come up again, only this time it’s intermittent failure, not an outage. A couple days ago my roommate bitched that no DNS queries were resolving; at the time I wrote it off as end-user delusion, but I am beginning to suspect it was another instance of this router problem.

My hope is that the router’s DNS server is just flaky and/or doesn’t run well under heavy load, so I’ve adjusted my OpenBSD router’s BIND configuration to forward DNS requests to OpenDNS instead of the FiOS router. Hopefully that fixes it; I’d hate to have to engage Verizon’s dreaded Endless Loop Support Process(tm).

23Jun/070

Back from the abyss: Ruby extension performance tuning on Windows

It’s hard to believe the previous episode was less than 24 hours ago. So much has changed.

As promised, I abandoned the wind-sucking everything-is-an-object approach in favor of something more lightweight. Each packet is still its own object; however, instead of creating a Ruby object graph for the protocol tree right away, I now keep the lightweight C protocol objects in an STL multimap, and only construct Ruby Field objects for fields as they are requested. I take it one step further: within each Field object I don’t create the Ruby wrappers for the name, value, display name, and display value until each is first referenced.
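The shape of that design can be sketched in plain C++ without any Ruby headers. This is not the extension's actual code: `RawField`, `FieldWrapper`, and `Packet` are hypothetical stand-ins, with `FieldWrapper` playing the part of the expensive Ruby Field object and a counter showing how many were really built. It also assumes all fields are added before the first lookup.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the cheap C-side record of one dissected field.
struct RawField {
    std::string value;
};

// Stand-in for the expensive wrapper object; `created` counts constructions.
struct FieldWrapper {
    static int created;
    std::string name;
    const RawField* raw;
    FieldWrapper(const std::string& n, const RawField* r) : name(n), raw(r) {
        ++created;
    }
};
int FieldWrapper::created = 0;

class Packet {
public:
    typedef std::vector<std::shared_ptr<FieldWrapper> > WrapperList;

    // Cheap: only stores the raw field in the multimap.
    void add_field(const std::string& name, const std::string& value) {
        RawField raw;
        raw.value = value;
        fields_.insert(std::make_pair(name, raw));
    }

    std::size_t field_count() const { return fields_.size(); }

    // Expensive wrappers are built on first request for a name, then cached.
    const WrapperList& fields_named(const std::string& name) {
        std::map<std::string, WrapperList>::iterator cached = wrappers_.find(name);
        if (cached != wrappers_.end()) return cached->second;

        WrapperList built;
        typedef std::multimap<std::string, RawField>::iterator It;
        std::pair<It, It> range = fields_.equal_range(name);
        for (It it = range.first; it != range.second; ++it) {
            built.push_back(std::make_shared<FieldWrapper>(name, &it->second));
        }
        return wrappers_.emplace(name, std::move(built)).first->second;
    }

private:
    std::multimap<std::string, RawField> fields_;   // one entry per field
    std::map<std::string, WrapperList> wrappers_;   // lazily filled cache
};
```

A packet that is only iterated, or only asked for one field name, never pays for wrappers it does not use, which is where the savings come from.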

As a result, my previous test, which was simply to run through all the packets as fast as possible, runs in 12 seconds. That’s around 10 seconds of pure Wireshark overhead, and a couple seconds for 36k Ruby objects, and building the multimap of an average of 83 fields, 36k times. I’m pretty pleased with that.

But it gets better. If within each packet I call each_field, which forces creation of a Ruby Field object for each field in each packet, runtime is still only 12.9 seconds. If I add a reference to each field’s name, forcing the creation of a Ruby String for each field, runtime is < 15 seconds. That’s for a total of roughly 3 million fields across those 36k packets, so considering the level of shit performance I was seeing before, I think that’s pretty impressive. It’s an order of magnitude improvement over the Expat XML solution, and is only a few seconds slower than the minimum runtime imposed by Wireshark.

Next up: explore Wireshark’s ability to limit the ‘columns’ it dissects, to further increase performance.

I would like to take a moment to note some surprises I encountered with Ruby’s garbage collector. I (mistakenly) treated it like the .NET garbage collector, which it turns out is so much more sophisticated than Ruby’s that it’s hard to believe Ruby’s works at all.

First, Ruby’s GC is not asynchronous; in other words, it’s not possible that my extension could be in the middle of a C function, and the GC starts in another thread and collects an object I was working on.

Second, Ruby’s GC is a mark-and-sweep, wherein all root level objects, and all objects in the current scope, are ‘marked’, then they mark all objects they reference, and so on. Once the entire object graph is explored, Ruby sweeps any objects that aren’t marked. Easy enough.

The problem is that extension developers need to be aware of this if they are to hold any VALUE references within a native data structure. As it happens I had to do this in a couple places. The trick is to give Ruby a method to call when your object is marked, and in this method call rb_gc_mark on any VALUE references you’re holding on to. The mark argument to Data_Wrap_Struct is just such a function pointer. It’s passed in the void* you passed to Data_Wrap_Struct, which you obviously cast appropriately. Once you get that figured out, there’s really nothing to it.
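For anyone who wants to see why the mark callback matters, here is a toy mark-and-sweep heap in plain C++. It is a sketch of the idea only, not Ruby's collector and not the real extension API: `Heap`, `Obj`, `NativeState`, and `wrap` are invented names. `Heap::mark` stands in for rb_gc_mark, and the `mark_hook` stands in for the mark function handed to Data_Wrap_Struct.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

class Heap;

// A collectable object in the toy heap.
struct Obj {
    bool marked;
    std::vector<Obj*> children;            // references the collector can see
    std::function<void(Heap&)> mark_hook;  // reports references it cannot see
    Obj() : marked(false) {}
};

class Heap {
public:
    Obj* alloc() {
        objects_.push_back(std::unique_ptr<Obj>(new Obj()));
        return objects_.back().get();
    }

    // Mark phase for one object: flag it, then follow everything it references.
    void mark(Obj* o) {
        if (o == nullptr || o->marked) return;
        o->marked = true;
        for (std::size_t i = 0; i < o->children.size(); ++i) mark(o->children[i]);
        if (o->mark_hook) o->mark_hook(*this);
    }

    // Marks from the roots, then sweeps unmarked objects. Returns how many were freed.
    std::size_t collect(const std::vector<Obj*>& roots) {
        for (std::size_t i = 0; i < roots.size(); ++i) mark(roots[i]);
        std::size_t freed = 0;
        std::vector<std::unique_ptr<Obj> > live;
        for (std::size_t i = 0; i < objects_.size(); ++i) {
            if (objects_[i]->marked) {
                objects_[i]->marked = false;   // reset for the next cycle
                live.push_back(std::move(objects_[i]));
            } else {
                ++freed;                       // destroyed when the old vector dies
            }
        }
        objects_.swap(live);
        return freed;
    }

    std::size_t size() const { return objects_.size(); }

private:
    std::vector<std::unique_ptr<Obj> > objects_;
};

// Native data holding a reference the collector knows nothing about.
struct NativeState {
    Obj* cached;
    NativeState() : cached(nullptr) {}
};

// Creates a wrapper object whose mark hook reports the hidden reference.
inline Obj* wrap(Heap& heap, NativeState* state) {
    Obj* wrapper = heap.alloc();
    wrapper->mark_hook = [state](Heap& h) { h.mark(state->cached); };
    return wrapper;
}
```

An object reachable only through a `NativeState` survives a collection if its owner was created with `wrap`, and is swept otherwise, leaving the native structure with a dangling pointer. That is the failure the mark function exists to prevent.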

23Jun/070

Building Ruby on Windows, and performance

Last time, I encountered horrifying performance with my Ruby extension, and had two action items:

  • Build Ruby from sources so I’d have debug information
  • Profile my extension using Intel VTune

I was actually shocked how easy it was to build Ruby from sources. Under Windows it’s literally just:

 win32\configure
 nmake
 nmake test
 nmake DESTDIR=foo install

Seriously. I did have to change win32\Makefile.sub to add /fixed:no to the linker command, since VTune won’t work with modules that are not relocatable, but other than that it was a no-brainer. All this makes me wonder why the official Windows builds of Ruby aren’t built with VC2k5 when it’s so superior. In fact, my test ran on the VC2k5 version of Ruby nearly twenty seconds faster, 75 seconds instead of 94!

Anyway, with that done I adjusted my VC2k5 extension project to copy the DLLs to the new ruby path, and got underway.

Let me now digress for a moment and point out just how horrifyingly bad Intel VTune is. I use VTune at work for performance-tuning our server apps, and whenever I encounter a performance problem I exhaust all other alternatives before I bring VTune to bear; it’s that bad.

First off, one gets the feeling that, despite being in version 9.x, VTune is written and maintained by interns. Its GUI is clunky, its installer is temperamental, it crashes for no discernible reason, it won’t run at all without admin privs (seriously, not at all; won’t even start) and the support board is full of questions in broken English and answers to the effect of ‘is it plugged in? did you turn it on? try calling the support line’.

VTune is unique among profilers in that it has two ways of profiling. What it calls ‘call graph profiling’ is the typical profiler functionality, which instruments all your code, makes it run 100 times slower, then when it’s done running, shows you the complete call graph with time spent in each function, helping you see where the slow spots in your app are.

VTune’s other profiling solution, and that which sets it apart, is based on taking snapshots of the processor state based on triggers like n instructions retired. In each snapshot, VTune notes where execution is at that instant. This snapshot approach doesn’t require instrumenting code, and it doesn’t slow it down that much, but there is one HUGE downside: no call graph. It can tell you your app spent all its time in malloc, but it can’t tell you who called the mallocs that it spent all its time in.
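The missing information is easy to show with a toy. The sketch below does not model VTune at all (its call graph mode works by instrumenting code, as described above); it just takes a hypothetical list of recorded call stacks and compares what a flat, snapshot-style tally can say with what caller attribution adds. `Stack`, `flat_profile`, and `callers_of` are names invented for the example.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One observation: outermost caller first, currently executing function last.
typedef std::vector<std::string> Stack;

// Snapshot-style view: only the function executing at each sample is counted.
inline std::map<std::string, int> flat_profile(const std::vector<Stack>& samples) {
    std::map<std::string, int> hits;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        if (!samples[i].empty()) ++hits[samples[i].back()];
    }
    return hits;
}

// Call-graph-style view: how often each function was the direct caller of `callee`.
inline std::map<std::string, int> callers_of(const std::vector<Stack>& samples,
                                             const std::string& callee) {
    std::map<std::string, int> callers;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const Stack& s = samples[i];
        for (std::size_t j = 1; j < s.size(); ++j) {
            if (s[j] == callee) ++callers[s[j - 1]];
        }
    }
    return callers;
}
```

With only `flat_profile` you learn that malloc is hot; it takes `callers_of(samples, "malloc")` to learn which caller is responsible.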

Not surprisingly, Intel extols this snapshot-based profiling as though it’s actually usable, but I’ve never run into a situation in which I didn’t end up using the call graph profiling to get the info I want. This would just be an annoyance, except call graph profiling crashes almost every time I try to use it.

Back to my current problem: I was using call graph profiling, and sure enough, the app I was profiling would crash on startup. Intel says this happens if you’re using modules that aren’t relocatable, but it’s also supposed to tell you which modules aren’t relocatable. At first it didn’t; then I fiddled around with some settings and suddenly it complained that about half the DLLs I was profiling weren’t relocatable. I removed them from the list of DLLs to keep track of, and I was off.

I ran my performance test that processes a capture file with 36k packets in it, and got the results. They were surprising to say the least.

According to VTune, my whole test ran for 22 million msc, whatever that is (not milliseconds; the app took way less than 6 hours to run. Nor microseconds; it took more than 22 seconds to run; whatever). Of those 22M, 11M were spent in either malloc or free. I’m not calling either of those directly, and in fact the biggest offender in terms of calling malloc and free really calls but one method: rb_class_new_instance.

My takeaway from this is that object creation in Ruby is expensive enough that creating three million objects (my rough count; one for each packet, and one for each field within each packet) is slow. This rather confirms my suspicions that I should create one Ruby object for the packet, and wrap it around a C++ associative container to store the fields. Since object creation in C++ is pretty fast (and lightweight), this should improve performance quite a bit.

22Jun/070

Abandon all hope ye who enter here, or: Writing Ruby extensions for Windows

Through diligence and ingenuity, I overcame the DLL Hell problem from the last episode, and am now moving into a new circle of hell.

Recall that I encountered a conundrum, in which ruby wanted to use the VC6 version of zlib1, and wireshark wanted to use the VC2k5 version, and they were both in the same process. The fix, ironically, was to create a side-by-side private assembly consisting of all of the Wireshark DLLs upon which my extension depends, and put the entire assembly (contained within a folder consisting of the DLLs themselves and a special XML manifest file) in the same directory as the capdissector.so extension file. I then added a manifest fragment to my VC++ project that built the extension, to specify a dependency on my new wireshark assembly. The Windows loader took care of the rest.

This is a particularly delicious solution because it isolates all the Wireshark DLLs to my extension only; there’s no risk of any of the DLLs (many of which are open source staples like pcre and iconv) interfering with any other Ruby extensions, since only through the use of the special manifest directive can the wireshark assembly be loaded, and we know from experience that open source build engineers will do whatever they must, including use ten year old compilers, to avoid side-by-side drama.

However, now I’ve encountered a better-known Ruby bugaboo: performance.

You may recall I had previously attempted to avoid coding to wireshark directly, and instead used tshark to output a capture file in PDML, which I would then parse with Ruby’s XML parser. The devastating performance consequences are chronicled here, but the gist is:

 NullParser ran against normal_dump for 3.234; got 0 packets
 NullParser ran against huge_dump for 67.36; got 0 packets
 ExpatParser ran against normal_dump for 5.484; got 1574 packets
 ExpatParser ran against huge_dump for 124.891; got 36158 packets
 REXMLParser ran against normal_dump for 28.188; got 1574 packets
 REXMLParser ran against huge_dump for 640.875; got 36158 packets

Meaning tshark itself took 67 seconds to generate PDML for 36k packets, Ruby’s wrapper around expat took 124 seconds, and REXML, that Nazgul of Ruby performance, took 640 seconds. Surely, coding directly to Wireshark APIs and not running all that PDML to stdout should result in performance way below 67 seconds, right?

Initially, it seemed so. I wrote a test derived from the tshark code but without any console IO, which opens the capture file, decodes all the packets, and walks the protocol tree, but does not print anything out. This is the theoretical best-case performance. That tool comes in at 10 seconds on my laptop. That’s 3600 packets/second; damn good.

Sadly, when I ran my unit test that loads the same capture file with my Ruby extension and walks the protocol tree, I got 95 seconds. Yes, that’s right, worse than tshark printing PDML, and almost as bad as expat! Now I know what all those mailing list performance gripes were about.

I can’t really profile this using the one-click installer version of Ruby, since it doesn’t come with symbols so my profiling tools will just say ‘ruby.exe sucks wind’; not helpful. Now I have to fetch the source tree, build it myself (with symbols retained) and point vtune at it. I suspect I’ll have to give up my initial design of building a Ruby object for every protocol, and every field within every protocol, and instead write a lightweight Ruby wrapper around a high-performance C++ associative container. Yay.

In the next episode, building Ruby with Visual C++ 2005, or ‘Goddammit what the FUCK were you thinking!?’

21Jun/071

Ruby Extension on Windows Hell – The Next Chapter

In the previous episode, I was struggling with the Ruby extension build environment on Windows. I finally gave up and created a Visual C++ 2005 project that built the extension, and wrote a post-build step to copy the files into the Ruby install directory. Obviously this is a short-term hack; I’ll need to get something that will build on *NIX, but I don’t want to spend any more time on the fucking build environment right now.

The next clusterfuck was not unlike the previous one. See, I’m wrapping some Wireshark functionality in a Ruby extension so I can extract dissected packets from a capture file as Ruby objects for my own nefarious purposes. To make this work, I take a Wireshark source tree, build it, then link to some of the resulting Wireshark DLLs in order to implement packet dissection. I built Wireshark with VC2k5 because I refuse to build anything with Visual C++ 6.0.

Unfortunately, I’m using the Ruby one-click installer for Windows, which is built with Visual C++ 6.0 (don’t even get me started!) This presents a problem, as the following dependency chain will (hopefully) illustrate:

 my stuff => wiretap.dll => zlib1.dll
 ruby => zlib.so => zlib1.dll

Now, my stuff, wiretap.dll, and the zlib1.dll that wiretap.dll depends on were all built with VC2k5, and use its runtimes. Ruby, zlib.so, and the zlib1.dll that zlib.so depends on were all built with VC6 and use its runtimes. Do you see the problem yet?

My stuff runs within Ruby’s process as a DLL, which means that the zlib1.dll that I want and the zlib1.dll that Ruby wants can’t both be loaded; it’s one or the other. I was running Ruby’s version, but either one has the same problem as illustrated below.

Here’s a bit of code from a file within wiretap.dll, snipped and macros expanded for brevity:

 wth->fd = _wopen(filename, O_RDONLY|O_BINARY, 0000);
 wth->fh = gzdopen(wth->fd, "rb");

Here’s what’s happening: The C runtime function _wopen is being called to open a file, and it returns a file descriptor, which is just an integer that identifies that open file. Then the zlib1 function gzdopen is called, passing in the file descriptor that it is to operate on. Remember, wiretap.dll is linked against the VC2k5 runtimes, so the integer returned by _wopen identifies the file to the vc2k5 runtime library functions. Then, this descriptor is passed into a zlib1 function, which is linked against the vc6 runtime, which keeps a separate list of file descriptors. Depending on what’s happened up to this point, the FD returned by _wopen might be a valid FD to vc6’s runtimes, but it certainly won’t refer to filename as we expect. Thus, gzdopen fails strangely.
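A toy model makes the mismatch easier to see. In the C++ sketch below, two instances of a hypothetical `Runtime` class stand in for the VC6 and VC2k5 C runtimes, each with its own private descriptor table. It is a deliberate simplification: real runtimes reserve the first few descriptors for standard I/O and track far more than a path, and the file names are made up.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy C runtime: a descriptor is just an index into this instance's own table.
class Runtime {
public:
    // "Opens" a file and returns the descriptor this runtime assigned to it.
    int open(const std::string& path) {
        table_.push_back(path);
        return static_cast<int>(table_.size()) - 1;
    }

    // Returns the path behind fd, or an empty string if this runtime
    // never issued that descriptor.
    std::string path_for(int fd) const {
        if (fd < 0 || static_cast<std::size_t>(fd) >= table_.size()) return "";
        return table_[static_cast<std::size_t>(fd)];
    }

private:
    std::vector<std::string> table_;  // private to each runtime instance
};
```

If one `Runtime` has already opened a script and the other then opens the capture file, both hand back descriptor 0. Passing that 0 to the first runtime yields the script, not the capture file, which is the gzdopen situation in miniature.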

Sure, I could use the zlib1 that is built with vc2k5 and get around this problem, but what happens when ruby, built with vc6, passes one of its FDs to zlib1? That’s right, the same damn thing.

What to do? If I controlled all the sources involved, I’d just build them all with vc2k5 and be done with it. Alas, I do not. Another option is to capitulate and use VC6; however, I’d sooner port my code to Visual Basic than take that giant leap backward. I could build Wireshark without the HAVE_LIBZ define, but that would require users to do a custom build of Wireshark, plus it removes compression functionality. I could build Ruby with VC2k5 (assuming it even has an option for that), but then the one-click installer version of Ruby won’t work.

To be honest, I don’t know the solution yet. I just know if I meet the Ruby build engineer who thought it would be a good idea to use an ancient compiler to build the latest Ruby, one of us will be walking away with a brutal wedgie.
