<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>apocryph.org &#187; named</title>
	<atom:link href="http://apocryph.org/tag/named/feed/" rel="self" type="application/rss+xml" />
	<link>http://apocryph.org</link>
	<description>Notes to my future self</description>
	<lastBuildDate>Mon, 09 Aug 2010 16:59:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>named on OpenBSD sometimes logs &#8216;error sending response: not enough free resources&#8217; under load</title>
		<link>http://apocryph.org/2009/02/22/named-on-openbsd-sometimes-logs-error-sending-response-not-enough-free-resources-under-load/</link>
		<comments>http://apocryph.org/2009/02/22/named-on-openbsd-sometimes-logs-error-sending-response-not-enough-free-resources-under-load/#comments</comments>
		<pubDate>Sun, 22 Feb 2009 18:29:51 +0000</pubDate>
		<dc:creator>anelson</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ENOBUFS]]></category>
		<category><![CDATA[error]]></category>
		<category><![CDATA[named]]></category>
		<category><![CDATA[openbsd]]></category>

		<guid isPermaLink="false">http://apocryph.org/?p=659</guid>
		<description><![CDATA[I recently repaved my OpenBSD firewall/router to upgrade to OpenBSD 4.4 and more importantly to load the OS and config files onto a CompactFlash drive, after I started noticing the telltale &#8216;clunk&#8217; sound coming from its hard drive.  Not wanting to lose Internet access at an inopportune time, I switched to 4GB of cheap, solid-state [...]]]></description>
			<content:encoded><![CDATA[<p>I recently repaved my OpenBSD firewall/router to upgrade to OpenBSD 4.4 and more importantly to load the OS and config files onto a CompactFlash drive, after I started noticing the telltale &#8216;clunk&#8217; sound coming from its hard drive.  Not wanting to lose Internet access at an inopportune time, I switched to 4GB of cheap, solid-state storage.</p>
<p>However, during the reconfiguration I started to get a lot of messages like this, particularly under heavy network load:</p>
<p><code>Feb 21 23:54:33 boromir named[11546]: client 192.168.1.127#50805: error sending response: not enough free resources<br />
</code></p>
<p>I googled around, and noticed a number of people reporting this problem with named, on OpenBSD, FreeBSD, and some Linux flavors.  For me, I can make it happen by downloading a well-seeded BitTorrent and thereby saturating my network pipe.  Others also reported the issue being correlated with heavy network loads of one sort or another.</p>
<p>The usual suspects have already been eliminated.  Here&#8217;s what <code>top</code> says:</p>
<p><code>load averages:  0.08,  0.08,  0.08<br />
28 processes:  27 idle, 1 on processor<br />
CPU states:  0.2% user,  0.0% nice,  0.2% system, 22.9% interrupt, 76.8% idle<br />
Memory: Real: 26M/55M act/tot  Free: 95M  Swap: 0K/516M used/tot</code></p>
<p>As you can see, it&#8217;s not simply a problem of low memory.  I&#8217;ve got plenty of physical memory free, and haven&#8217;t even touched swap.</p>
<p>So, maybe mbufs, right?  No:</p>
<p><code># netstat -m<br />
105 mbufs in use:<br />
97 mbufs allocated to data<br />
2 mbufs allocated to packet headers<br />
6 mbufs allocated to socket names and addresses<br />
96/376/6144 mbuf clusters in use (current/peak/max)<br />
852 Kbytes allocated to network (25% in use)<br />
0 requests for memory denied<br />
0 requests for memory delayed<br />
0 calls to protocol drain routines<br />
</code></p>
<p>I also tried mucking about with some network-related sysctls.  I found a list <a href="http://nsmwiki.org/OpenBSD_Performance">here</a> that I tried (only the &#8216;net&#8217; stuff), to no avail.</p>
<p>Then I pulled a copy of the source code for OpenBSD&#8217;s named implementation.  If you&#8217;re interested, it&#8217;s on any OpenBSD AnonCVS mirror under <code>src/usr.sbin/bind</code>.  The WebCVS interface is <a href="http://www.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/bind/">here</a>.  Under <code>bin/named</code>, <code><a href="http://www.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/bind/bin/named/client.c?rev=1.10;content-type=text%2Fplain">client.c</a></code> contains this:</p>
<pre>static void
client_senddone(isc_task_t *task, isc_event_t *event) {
	ns_client_t *client;
	isc_socketevent_t *sevent = (isc_socketevent_t *) event;

	REQUIRE(sevent != NULL);
	REQUIRE(sevent-&gt;ev_type == ISC_SOCKEVENT_SENDDONE);
	client = sevent-&gt;ev_arg;
	REQUIRE(NS_CLIENT_VALID(client));
	REQUIRE(task == client-&gt;task);
	REQUIRE(sevent == client-&gt;sendevent);

	UNUSED(task);

	CTRACE("senddone");

	if (sevent-&gt;result != ISC_R_SUCCESS)
		ns_client_log(client, NS_LOGCATEGORY_CLIENT,
			      NS_LOGMODULE_CLIENT, ISC_LOG_WARNING,
			      <strong>"error sending response: %s",</strong>
			      isc_result_totext(sevent-&gt;result));

	INSIST(client-&gt;nsends &gt; 0);
	client-&gt;nsends--;

	if (client-&gt;tcpbuf != NULL) {
		INSIST(TCP_CLIENT(client));
		isc_mem_put(client-&gt;mctx, client-&gt;tcpbuf, TCP_BUFFER_SIZE);
		client-&gt;tcpbuf = NULL;
	}

	if (exit_check(client))
		return;

	ns_client_next(client, ISC_R_SUCCESS);
}</pre>
<p>Note the part in bold.  So this is where the &#8220;error sending response&#8221; bit comes from.  I&#8217;m no expert in BIND code, but I&#8217;ve done a good bit of socket programming, and this routine appears to handle the asynchronous (or, in socket terms, &#8216;non-blocking&#8217;) completion of a <code>send</code> call, writing a response back to the DNS client.  The send has failed, so it&#8217;s writing out this message to the log.  However, what of the other part?  What of <code>not enough free resources</code>?</p>
<p>Well, notice what is provided for the <code>%s</code> placeholder: the results of <code>isc_result_totext(sevent-&gt;result)</code>.  So <code>isc_result_totext</code> is getting some sort of error code and converting it into the &#8220;not enough free resources&#8221; message.  But what code?</p>
<p>I then <code>grep</code>ed the whole <code>bind</code> tree for the text &#8220;not enough free resources&#8221;.  I found this line in <code>lib/isc/include/isc/result.h</code>:</p>
<p><code>#define ISC_R_NORESOURCES          13      /*%&lt; not enough free resources */</code></p>
<p>There&#8217;s also a corresponding <code>result.c</code> that implements the <code>isc_result_totext</code> function.  So, what causes the <code>ISC_R_NORESOURCES</code> error?</p>
<p>I did some more <code>grep</code> work for that error code, and found lots of instances, mostly in <code><a href="http://www.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/bind/lib/isc/unix/socket.c?rev=1.14;content-type=text%2Fplain">lib/isc/unix/socket.c</a></code>.  Upon reviewing all the instances, it appears that the error is almost always the result of an <code>ENOBUFS</code> errno from a socket operation.</p>
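<p>To make that chain concrete, here is a minimal sketch of how an errno could travel from a failed socket call to the text in the log.  This is not BIND code: every <code>sketch_</code> name is hypothetical, and only the value 13 and its message are taken from the <code>result.h</code> line quoted above.</p>
<pre>#include &lt;errno.h&gt;

/* Hypothetical result codes; only 13 mirrors ISC_R_NORESOURCES. */
#define SKETCH_R_SUCCESS	0
#define SKETCH_R_NORESOURCES	13
#define SKETCH_R_OTHER		(-1)

/* Translate an errno from a socket call into a result code. */
int
sketch_errno_to_result(int err)
{
	switch (err) {
	case 0:
		return SKETCH_R_SUCCESS;
	case ENOBUFS:
		return SKETCH_R_NORESOURCES;
	default:
		return SKETCH_R_OTHER;
	}
}

/* Translate a result code into the text that ends up in the log. */
const char *
sketch_result_totext(int result)
{
	switch (result) {
	case SKETCH_R_SUCCESS:
		return "success";
	case SKETCH_R_NORESOURCES:
		return "not enough free resources";
	default:
		return "unexpected error";
	}
}</pre>
<p>In this sketch, <code>sketch_result_totext(sketch_errno_to_result(ENOBUFS))</code> yields the same &#8220;not enough free resources&#8221; string that shows up in syslog.</p>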
<p>So off we go to the <code><a href="http://www.openbsd.org/cgi-bin/man.cgi?query=send&amp;apropos=0&amp;sektion=2&amp;manpath=OpenBSD+4.4&amp;arch=i386&amp;format=html">send</a></code> manpage.  According to that, an <code>errno</code> of <code>ENOBUFS</code> denotes one of two things:</p>
<ul>
<li>&#8220;The system was unable to allocate an internal buffer.  The operation may succeed when buffers become available.&#8221;</li>
<li>&#8220;The output queue for a network interface was full.  This generally indicates that the interface has stopped sending, but may be caused by transient congestion.&#8221;</li>
</ul>
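<p>Both descriptions say the condition may be transient, so a sender that cares could pause briefly and try again.  The following is a rough sketch of that idea, not what named actually does; the helper name <code>send_with_enobufs_retry</code>, the 1 ms pause, and the retry limit are all assumptions made for illustration.</p>
<pre>#include &lt;errno.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/socket.h&gt;
#include &lt;time.h&gt;

/*
 * Hypothetical helper: send one datagram, retrying a bounded number of
 * times when the kernel reports ENOBUFS.  Any other error is returned
 * to the caller immediately with errno left intact.
 */
ssize_t
send_with_enobufs_retry(int fd, const void *buf, size_t len,
    const struct sockaddr *to, socklen_t tolen, int max_retries)
{
	struct timespec pause = { 0, 1000000 };	/* 1 ms between attempts */
	int attempt;

	for (attempt = 0; ; attempt++) {
		ssize_t n = sendto(fd, buf, len, 0, to, tolen);

		if (n &gt;= 0)
			return n;
		if (errno != ENOBUFS || attempt &gt;= max_retries)
			return -1;
		nanosleep(&amp;pause, NULL);
	}
}</pre>
<p>Whether a retry like this is worthwhile depends on the protocol.  For DNS over UDP the client will ask again anyway, so dropping the response and logging it is a defensible choice.</p>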
<p>If &#8216;internal buffer&#8217; means &#8216;mbuf&#8217;, then I doubt that&#8217;s the problem, as I&#8217;ve got plenty of room there.  It was the output queue that struck me.  This is happening during heavy load, when the internal network interface would be getting a lot of traffic.  But what determines the size of its output queue, and how do you grow it?</p>
<p>I rummaged around a lot on this, and I could not find an answer.  I looked for driver configuration options for the <code>fxp</code> driver, and found nothing.  So then I started poking around the <a href="http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/ic/fxp.c?rev=1.94;content-type=text%2Fplain">source code for the fxp driver</a>, and found this:</p>
<p><code>IFQ_SET_MAXLEN(&amp;ifp-&gt;if_snd, FXP_NTXCB - 1);</code></p>
<p><code>FXP_NTXCB</code> is defined in the header file, and is hard-coded to 128:</p>
<pre>/*
 * Number of transmit control blocks. This determines the number
 * of transmit buffers that can be chained in the CB list.
 * This must be a power of two.
 */
#define FXP_NTXCB	128</pre>
<p>It appears from this reading that the interface&#8217;s send queue is hard-coded.  In order to lift this limit I would either have to do a custom kernel build, or find a network adapter with a larger and/or configurable send queue.  That just doesn&#8217;t make sense, as OpenBSD isn&#8217;t supposed to be that lame.  It&#8217;s entirely possible I&#8217;m misunderstanding the cause of the problem, especially since users have reported this under FreeBSD and Linux as well, but damned if I know what to do about it.</p>
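<p>As a way of picturing why a fixed-length send queue would surface as <code>ENOBUFS</code>, here is a toy user-space model.  It is not kernel code: the <code>sketch_</code> names are hypothetical, and the drop-when-full behaviour is my assumption about how a bounded interface queue works.  Only the limit of <code>FXP_NTXCB - 1</code> comes from the driver source quoted above.</p>
<pre>#include &lt;errno.h&gt;

#define SKETCH_NTXCB	128	/* mirrors FXP_NTXCB from the driver header */

/* Toy stand-in for an interface send queue; it only counts packets. */
struct sketch_ifqueue {
	int len;	/* packets currently queued */
	int maxlen;	/* hard limit set at attach time */
	int drops;	/* packets rejected because the queue was full */
};

void
sketch_ifq_init(struct sketch_ifqueue *q)
{
	q-&gt;len = 0;
	q-&gt;maxlen = SKETCH_NTXCB - 1;
	q-&gt;drops = 0;
}

/* Queue one packet, or reject it with ENOBUFS when the queue is full. */
int
sketch_ifq_enqueue(struct sketch_ifqueue *q)
{
	if (q-&gt;len &gt;= q-&gt;maxlen) {
		q-&gt;drops++;
		return ENOBUFS;
	}
	q-&gt;len++;
	return 0;
}

/* The driver handing one packet to the hardware frees a slot. */
void
sketch_ifq_dequeue(struct sketch_ifqueue *q)
{
	if (q-&gt;len &gt; 0)
		q-&gt;len--;
}</pre>
<p>In this model the first 127 enqueues succeed and the 128th returns <code>ENOBUFS</code> until a dequeue frees a slot, which would fit with the error only appearing while the link is saturated.</p>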
<p>Ultimately this isn&#8217;t a huge issue.  UDP makes no delivery guarantees, and DNS resolvers are designed to cope with a dropped response by retransmitting the request.  Still, I have an aversion to a bunch of errors in my syslog, and the retries cause a perceptible lag during DNS resolution which I&#8217;d really like to fix.</p>
]]></content:encoded>
			<wfw:commentRss>http://apocryph.org/2009/02/22/named-on-openbsd-sometimes-logs-error-sending-response-not-enough-free-resources-under-load/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
