apocryph.org Notes to my future self

22Feb/09

named on OpenBSD sometimes logs ‘error sending response: not enough free resources’ under load

I recently repaved my OpenBSD firewall/router to upgrade to OpenBSD 4.4 and more importantly to load the OS and config files onto a CompactFlash drive, after I started noticing the telltale ‘clunk’ sound coming from its hard drive.  Not wanting to lose Internet access at an inopportune time, I switched to 4GB of cheap, solid-state storage.

However, during the reconfiguration I started to get a lot of messages like this, particularly under heavy network load:

Feb 21 23:54:33 boromir named[11546]: client 192.168.1.127#50805: error sending response: not enough free resources

I googled around, and noticed a number of people reporting this problem with named, on OpenBSD, FreeBSD, and some Linux flavors.  For me, I can make it happen by downloading a well-seeded BitTorrent and thereby saturating my network pipe.  Others also reported the issue being correlated with heavy network loads of one sort or another.

The usual suspects have already been eliminated.  Here’s what top says:

load averages: 0.08, 0.08, 0.08
28 processes: 27 idle, 1 on processor
CPU states: 0.2% user, 0.0% nice, 0.2% system, 22.9% interrupt, 76.8% idle
Memory: Real: 26M/55M act/tot Free: 95M Swap: 0K/516M used/tot

As you can see, it’s not simply a problem of low memory. I’ve got plenty of physical free, and haven’t even touched swap.

So, maybe mbufs, right?  No:

# netstat -m
105 mbufs in use:
97 mbufs allocated to data
2 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
96/376/6144 mbuf clusters in use (current/peak/max)
852 Kbytes allocated to network (25% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

I also tried mucking about with some network-related sysctls. I found a list here that I tried (only the ‘net’ stuff), to no avail.

Then I pulled a copy of the source code for OpenBSD’s named implementation.  If you’re interested, it’s on any OpenBSD AnonCVS mirror under src/usr.sbin/bind. The WebCVS interface is here. In client.c, under bin/named, is this:

static void
client_senddone(isc_task_t *task, isc_event_t *event) {
	ns_client_t *client;
	isc_socketevent_t *sevent = (isc_socketevent_t *) event;

	REQUIRE(sevent != NULL);
	REQUIRE(sevent->ev_type == ISC_SOCKEVENT_SENDDONE);
	client = sevent->ev_arg;
	REQUIRE(NS_CLIENT_VALID(client));
	REQUIRE(task == client->task);
	REQUIRE(sevent == client->sendevent);

	UNUSED(task);

	CTRACE("senddone");

	if (sevent->result != ISC_R_SUCCESS)
		ns_client_log(client, NS_LOGCATEGORY_CLIENT,
			      NS_LOGMODULE_CLIENT, ISC_LOG_WARNING,
			      "error sending response: %s",
			      isc_result_totext(sevent->result));

	INSIST(client->nsends > 0);
	client->nsends--;

	if (client->tcpbuf != NULL) {
		INSIST(TCP_CLIENT(client));
		isc_mem_put(client->mctx, client->tcpbuf, TCP_BUFFER_SIZE);
		client->tcpbuf = NULL;
	}

	if (exit_check(client))
		return;

	ns_client_next(client, ISC_R_SUCCESS);
}

Note the ns_client_log call in the middle. So this is where the “error sending response” bit comes from. I’m no expert in BIND code, but I’ve done a good bit of socket programming, and this routine appears to handle the asynchronous (or, in socket terms, ‘non-blocking’) completion of a send call, writing a response back to the DNS client. The send has failed, so it’s writing out this message to the log. However, what of the other part? What of not enough free resources?

Well, notice what is provided for the %s placeholder: the results of isc_result_totext(sevent->result). So isc_result_totext is getting some sort of error code and converting it into the “not enough free resources” message. But what code?

I then grepped the whole bind tree for the text “not enough free resources”. I found this line in lib/isc/include/isc/result.h:

#define ISC_R_NORESOURCES 13 /*%< not enough free resources */

There’s also a corresponding result.c that implements the isc_result_totext function. So, what causes the ISC_R_NORESOURCES error?

I did some more grep work for that error code, and found lots of instances, mostly in lib/isc/unix/socket.c. Upon reviewing all the instances, it appears that error is almost always the result of an ENOBUFS errno from a socket operation.

So off we go to the send manpage. According to that, an errno of ENOBUFS denotes one of two things:

  • “The system was unable to allocate an internal buffer. The operation may succeed when buffers become available.”
  • “The output queue for a network interface was full. This generally indicates that the interface has stopped sending, but may be caused by transient congestion.”

If ‘internal buffer’ means ‘mbuf’, then I doubt that’s the problem, as I’ve got plenty of room there.  It was the output queue that struck me.  This is happening during heavy load, when the internal network interface would be getting a lot of traffic.  But what determines the size of its output queue, and how do you grow it?

I rummaged around a lot on this, and I could not find an answer.  I looked for driver configuration options for the fxp driver, and found nothing.  So then I started poking around the source code for the fxp driver, and found this:

IFQ_SET_MAXLEN(&ifp->if_snd, FXP_NTXCB - 1);

FXP_NTXCB is defined in the header file, and is hard-coded to 128:

/*
 * Number of transmit control blocks. This determines the number
 * of transmit buffers that can be chained in the CB list.
 * This must be a power of two.
 */
#define FXP_NTXCB	128

It appears from this reading that the interface’s send queue is hard-coded. In order to lift this limit I would either have to do a custom kernel build, or find a network adapter with a larger and/or configurable send queue. That just doesn’t make sense, as OpenBSD isn’t supposed to be that lame. It’s entirely possible I’m misunderstanding the cause of the problem, especially since users have reported this under FreeBSD and Linux as well, but damned if I know what to do about it.

Ultimately this isn’t a huge issue, apart from my aversion to a bunch of errors in my syslog. DNS over UDP copes with dropped responses because the client simply retransmits its request. It does, however, result in a perceptible lag during DNS resolution, which I’d really like to fix.

26Nov/07

Upgraded to OpenBSD 4.2 today

This past weekend I upgraded my home firewall, wintermute, and one of my internal servers, aragorn, to OpenBSD 4.2. aragorn was running 4.1, and wintermute was kicking ass on 3.6!

wintermute is the first computer I ever owned; a Sony VAIO PCV-90. It’s a 90MHz Pentium with 64MB of RAM and an (upgraded) 3600 RPM 8GB PATA drive. aragorn is an ancient PowerEdge 1300 I bought for a contract many years ago; it’s a two-way Pentium II 400MHz box with something like 128MB of RAM and a couple of SCSI disks.

The upgrade went fine, though I did get tripped up by the fact that the ftp-proxy in 4.2 is a total rewrite from 3.6, so I had to adjust my pf.conf with three special anchors and enable the ftp-proxy service. I also screwed up during the disk labeling and created a swap partition that was 500 sectors, not 500MB, so I keep running out of swap space, doh!

I use aragorn to monitor my wireless network traffic using an EDIMAX PCI card with a Ralink rt63 chipset. Let me just say that the rt63 support in 4.2 is just as unreliable as it was in 4.1 and 4.0; kismet lasts between 20 minutes and a day before a kernel panic. I could get a card with decent support, but I don’t want to accept defeat just yet.

Overall the upgrade was painless. I strongly recommend it. It’s practically the last credible OS that’s actively hostile to those trying to use it. Now that my grandmother can get Ubuntu going, a retarded ten year old can figure out Slackware, and a booze-addled pop star can probably get FreeBSD to boot, OpenBSD is the only niche OS that a n00b lamer can’t get to work. Whenever some little shit starts to get a little too full of himself coz he’s so over Micro$loth and does all his 1337 hax0ring on Ubuntu Gutsy, I find an OpenBSD install CD and a dare are all that’s required to reduce him to tears. Puffy p0wnz Tux every time.

5Aug/07

Finally got FiOS Router into Bridge Mode

I’ve been lamenting the too-small NAT table on my FiOS router for a while now. Fortunately, a comment posted to that article by ‘Christian’ pointed me to this article, which walks through the process of converting the expensive, powerful, feature-rich Actiontec router into a dumb Ethernet-to-Coax bridge, which is exactly what I want.

I went through the steps, and had almost no problems. I suggest you back up your router configuration with the Save Configuration function before you start. Also, the article didn’t mention that you need to disable DHCP on the router in order for Verizon to answer your DHCP request.

I did have to tweak my OpenBSD 3.7 router a bit, though.

First, I had configured it so its WAN interface had a static IP in the subnet of the Actiontec router. The router was 192.168.2.1 and my OBSD box was 192.168.2.2. After this change, the Actiontec box still has an IP for accessing its web-based admin tool, but it is no longer a router, so I had to configure my OBSD WAN port for DHCP.

First I ran dhclient xl0 to verify that it was able to get a DHCP lease from Verizon, and sure enough, it did.

Next I edited /etc/hostname.xl0 (where xl0 is the interface name of my WAN NIC), replacing all of the previous content with dhcp. After that I rebooted to see how it worked.

On the face of it, it worked fine, until I happened to read in the OBSD Handbook that dhclient rewrites your resolv.conf file with the DNS server info that comes back from the DHCP request. In this case, I trust Verizon’s DNS like I trust Mahmoud Ahmadinejad with a nuke, so whatever they’re selling I’m not interested. Unfortunately, dhclient had already clobbered my old resolv.conf file and replaced it with one pointing to Verizon’s DNS. Ick.

So I edited /etc/dhclient.conf per the handbook’s instructions, uncommenting the request lines and removing domain-name-servers from the list. (NB: I also removed domain-name since I didn’t think I wanted Verizon’s DHCP overriding my hostname. At first I thought that change prevented the creation of a default route on the WAN link, but in fact I had forgotten to delete /etc/mygate now that I’m on DHCP, so no default route was being created unless I ran dhclient after network startup.) Then I rewrote /etc/resolv.conf to point to my local DNS server (which forwards to OpenDNS):

 nameserver 127.0.0.1
 domain ho.apocryph.org
 lookup file bind

Now everything is working fine, I’m not running out of NAT entries, and I’m completely off Verizon’s flaky DNS. Thanks for the tip, Christian.

UPDATE: Turns out the domain DHCP param had nothing to do with the missing default route. Update in place.

11Jul/07

Found: A USB WLAN adapter for OpenBSD wardriving

I’ve previously lamented the flakiness of the uath USB Atheros WLAN driver in OpenBSD 4.1. I’m happy to report that I’ve found a suitable alternative: the Ralink 2500-based Alfa AWUS036S.

Its footprint is a dongle, but it comes with a threaded RP-SMA connector with a lame-ish 2dBi antenna that can be easily swapped out for that 25dBi monster you won’t admit to the FCC that you have. It’s also cheap; I got mine on eBay from Data Alliance, for a mere $36.

As per my requirements, I can boot up an OpenBSD 4.1 VMware virtual machine, give the VM focus, insert the USB wlan adapter (it seems you have to do this while logged in as admin; it never works as a non-privileged user with VMware Workstation 6.0), and VMware automatically exposes the adapter to the VM, which picks it up as rum0. From there, Kismet works fine.

If you’re a lamer and prefer the dick-holding comfort of Linux, that’s known to work too.

6Jul/07

Building aircrack-ng on OpenBSD 4.1 i386

This is a post to remind myself how to build aircrack-ng on OpenBSD, since I always seem to forget.

The standard OpenBSD make tool doesn’t support $(shell ...) commands, thus the REVISION variable gets set to an empty string instead of 0. As a result, all the code that references the _REVISION macro doesn’t compile right. The failure looks like this:

# make
gcc -O2 -pipe  -D_FILE_OFFSET_BITS=64 -D_REVISION= src/aircrack-ng.c src/crypto.c src/sha1-mmx.S src/common.c src/aircrack-ptw-lib.c -o aircrack-ng -lpthread
src/aircrack-ng.c: In function `main':
src/aircrack-ng.c:2996: error: syntax error before ')' token
*** Error code 1

Stop in /home/anelson/aircrack-ng-0.9.1 (line 27 of Makefile).

The solution is to use GNU make, which under OpenBSD must be installed using the gmake port, and run using gmake instead of make.

When using gmake install to install, I always have to comment out the Makefile line that installs the stuff into SBINDIR, since there are no sbin files; otherwise the install fails thusly:

install -m 755  /sbin
usage: install [-bCcdpSs] [-B suffix] [-f flags] [-g group] [-m mode] [-o owner]
               source [...] target [...]
gmake: *** [install] Error 64

28Jun/07

Tip: Engenius EUB362-EXT doesn't work with OpenBSD 4.1

For reasons I wrote extensively about then lost when I accidentally navigated away from my blog posting form, I’m trying to get a USB wlan adapter going with an OpenBSD VM running kismet. I thought the Engenius EUB-362 EXT with its Atheros USB chipset would be just the ticket; after all, the new [uath(4)](http://www.openbsd.org/cgi-bin/man.cgi?query=uath&sektion=4) driver says it supports such chipsets, and the EUB-362 has an RP-SMA connector and 200mW of transmit power!

Sadly, it doesn’t work. Badly. kismet fails to start with Cannot set ifmedia: Device not configured because the SIOCSIFMEDIA ioctl fails for the device. If I modify pcapsource.cc line 2876 so kismet ignores that failure, my efforts are rewarded with a kernel panic in the uath driver. Upgrading to the -current branch as of 28 June 2007 didn’t help.

I should’ve heeded the admonitions about the uath driver being a work in progress, but I really wanted that 200mW transmit power. Stupid me.

I’ve now pinned my hopes on the Alfa AWUS036S, which is based on the more stable Ralink rt2500 chipset. I run three rt2500-based PCI cards in an OpenBSD box at home, and while it does panic from time to time, it’s usually good for a day or so at least. What’s more, I have proof that the AWUS036S works with aircrack-ng on Linux, so even if the ural OpenBSD driver is unstable, I can switch to Ubuntu.

Sure, I could post a bug to the bugs list and go back and forth, but it’s not worth it to me.

27Jan/07

Nasty IRQ conflict under OpenBSD

I’m trying to get three Edimax EW-7128g PCI wlan cards going under OpenBSD, since they work very poorly under Linux. OBSD detects them right away, but I’ve a problem: when I run kismet, my machine panics and has to be rebooted. A look at dmesg output offers a hint as to the problem:

uhci0 at pci0 dev 7 function 2 "Intel 82371AB USB" rev 0x01: apic 2 int 19 (irq 14)
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
piixpm0 at pci0 dev 7 function 3 "Intel 82371AB Power" rev 0x02: SMBus disabled
em0 at pci0 dev 13 function 0 "Intel PRO/1000MT (82540EM)" rev 0x02: apic 2 int 17 (irq 11), address 00:07:e9:0f:4f:09
ral0 at pci0 dev 16 function 0 "Ralink RT2561S" rev 0x00: apic 2 int 19 (irq 14), address 00:0e:2e:b2:f1:23
ral0: MAC/BBP RT2561C, RF RT2527
ral1 at pci0 dev 17 function 0 "Ralink RT2561S" rev 0x00: apic 2 int 20 (irq 10), address 00:0e:2e:b2:f1:28
ral1: MAC/BBP RT2561C, RF RT2527

Notice anything? What IRQ is ral0 on? What IRQ is uhci0 on? Yeah. IRQ 14, both.

I learned about the OBSD vmstat command today. It says:

# vmstat -zi
interrupt                       total     rate
irq0/clock                      41574      200
irq0/ipi                        46323      223
irq66/ahc0                      25815      124
irq65/pciide0                       0        0
irq67/uhci0                         0        0
irq80/em0                         504        2
irq67/ral0                          0        0
irq81/ral1                          0        0
irq112/pckbc0                       0        0
irq176/pccom0                       0        0
irq64/fdc0                          0        0
Total                          114216      551

Confirming my suspicions. I thought IRQs were assigned based on PCI slot, so I don’t see how there could be a conflict. I’ll shuffle the cards around in the hopes of getting a different assignment.

Hmm, I yanked one of the cards, and now vmstat doesn’t show any overlapping IRQs, and I’m still getting a panic when starting kismet. The error is page fault trap, code = 0 at rt2661_set_chan

Hmm, apparently OpenBSD kernel bug 5313 describes this exact problem. It looks like the patch has been committed. Obviously I want this fix, but it seems to be only in 4.0-CURRENT, which is the development branch. The errata don’t mention this bug, so it’s not in 4.0-STABLE. Shit.

Perhaps there’s a workaround. The bug suggests there’s a problem setting the channel of the device after placing it in monitor mode, if it’s not ‘running’ yet. The patch simply doesn’t attempt to set the channel unless the device is in the running state. Can I make the device be in the running state manually? Perhaps with ifconfig up?

Yeah, that took care of it. ifconfig up before starting kismet was all it took.

I’ll put back in the other two cards (which will reintroduce the IRQ conflict) and see if there are any other problems.

I’ve reproduced the IRQ conflict, but both cards seem to work fine with kismet, provided I remember to do an ifconfig up on each of them.

I can’t get the third card to be detected. I’m beginning to suspect the card is bad; no matter which slot I put it in, it doesn’t get detected by OBSD. It was working earlier today; I can’t imagine what happened.

I figured it out (sort of). Since my machine has dual PII 400MHz processors, I switched to the bsd.mp kernel. I switched back to the uniprocessor kernel, and all three cards are recognized. I don’t know why this would make a difference, but one reason that jumps out at me immediately is that the bsd.mp kernel uses the APIC, while the plain bsd kernel does not.

While all three are detected now, the IRQ conflict is back, this time with ral1 and uhci0. This is preventing me from bringing up ral1. Time for more card juggling.

That didn’t help. I disabled the FDD controller, PS/2 mouse, both serial ports, and the parallel port, and now we’re all green. Too easy.

However, I keep forgetting to bring up the ral* interfaces before running kismet. I’m putting the following in /etc/rc.local:

ifconfig ral0 up
ifconfig ral1 up
ifconfig ral2 up

Never again will I panic my system.

27Jan/07

Making an OpenBSD 4.0 install cd

As a part of the ongoing wireless saga, I’ve grown weary of the difficulty of running the early-beta rt2x00 drivers, so I thought I’d chance OpenBSD 4.0 on hera. Though OpenBSD 4 doesn’t support WPA (a huge shortcoming IMHO) it does have famously good wireless support. Since Ubuntu is basically useless on hera, I wanted to repave her with OpenBSD 4.

Problem: hera has no LAN connectivity. She was made (c. 1997) before on-board Ethernet was invented, and the aging Intel cardbus adapter I have is very flaky.

Solution: build a complete OpenBSD install CD. I followed the instructions in this article, though I had to use my FreeBSD box to build the ISO since I didn’t have/want to fuck with getting mkisofs going on Windows. If using FreeBSD, make sure you install the sysutils/cdrtools package.

After that, it was a breeze. OBSD detected it was an install CD, found all the pieces, and installed them fine. Oh, and support for my Ralink card? Automatic.

15Jan/06

I can't build OpenBSD 3.8-STABLE on ender

Lately ender has taken to freezing hard every few days, requiring a reboot of the VM in which it resides. This sucks as ender is my mail server.

I hoped maybe the problem was some issue fixed in the latest -STABLE, so I updated to the latest -STABLE sources and rebuilt the kernel and userland. The kernel built fine, but userland failed the same way it did last time I tried:

/usr/src/gnu/usr.bin/binutils/gdb/infrun.c: In function `normal_stop':
/usr/src/gnu/usr.bin/binutils/gdb/infrun.c:3046: error: too many arguments to function `observer_notify_normal_stop'

Google doesn’t have anything to offer, nor do the OBSD mailing list archives. WTF is wrong and why am I the only one w/ the problem!?

26Nov/05

Upgrading Ender to OpenBSD 3.8

I just upgraded ender to OpenBSD 3.8 using the same process I used in Upgrading Jane to OpenBSD 3.8. It was uneventful.

The next step, however, will be more complicated than it was on jane, because ender runs all my mail systems, including postfix, spamassassin, etc. I’ll need to update their ports packages accordingly.

First, I need to advance to -Release, again with the same steps I used to upgrade jane to 3.8-Release.

That went well; I was also able to build a -Stable kernel without difficulty. However, I can’t build userland; instead I get this:

/usr/src/gnu/usr.bin/binutils/gdb/infrun.c:3046: error: too many arguments to function `observer_notify_normal_stop'

Hmm. Surprisingly, Google yields nothing useful. It’s quite surprising that gdb should fail to compile; I wonder if the -Release branch is really broken, or if I’m missing something.
