<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>apocryph.org &#187; libcurl</title>
	<atom:link href="http://apocryph.org/tag/libcurl/feed/" rel="self" type="application/rss+xml" />
	<link>http://apocryph.org</link>
	<description>Notes to my future self</description>
	<lastBuildDate>Mon, 21 Jun 2010 15:09:58 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>A more in-depth analysis of Ruby HTTP client performance</title>
		<link>http://apocryph.org/2008/11/09/more_indepth_analysis_ruby_http_client_performance/</link>
		<comments>http://apocryph.org/2008/11/09/more_indepth_analysis_ruby_http_client_performance/#comments</comments>
		<pubDate>Sun, 09 Nov 2008 21:25:00 +0000</pubDate>
		<dc:creator>anelson</dc:creator>
				<category><![CDATA[Migrated from Drupal]]></category>
		<category><![CDATA[eventmachine]]></category>
		<category><![CDATA[http]]></category>
		<category><![CDATA[libcurl]]></category>
		<category><![CDATA[net_http]]></category>
		<category><![CDATA[rfuzz]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[tech diary]]></category>

		<guid isPermaLink="false">http://apocryph.org/?p=593</guid>
		<description><![CDATA[As a follow-up to my previous article on Ruby HTTP client performance I&#8217;ve revamped my test rig and revised my tests to cover more variables, more implementations, and more Ruby versions.
This time I compare Ruby 1.8.6, 1.8.7, and 1.9.0, exclusively on CentOS 5 Linux.  I compare the stock Net::HTTP, rfuzz, libcurl, eventmachine, right_http, and [...]]]></description>
			<content:encoded><![CDATA[<p>As a follow-up to my <a href="analysis_ruby_18x_http_client_performance">previous article on Ruby HTTP client performance</a> I&#8217;ve revamped my test rig and revised my tests to cover more variables, more implementations, and more Ruby versions.</p>
<p>This time I compare Ruby 1.8.6, 1.8.7, and 1.9.0, exclusively on CentOS 5 Linux.  I compare the stock <code>Net::HTTP</code>, <code>rfuzz</code>, <code>libcurl</code>, <code>eventmachine</code>, <code>right_http</code>, and a number of <code>Net::HTTP</code> variations with slight performance tweaks.  I evaluate clock time, CPU time, and CPU time over clock time, for five sites with varying network characteristics.  As before, my test rig and results are available on my SVN repository for further experimentation.</p>
<p>For the conclusion and pretty pictures skip to the bottom.  For the gory details read on.</p>
<h2>Test Methodology</h2>
<p>I have a simple HTTP client task, downloading a 10MB zip file from each of five data centers around the world (Seattle, Dallas, Chicago, Washington DC, and London).  I&#8217;ve implemented that task using each of the HTTP client implementations I&#8217;m testing.  I then use each implementation to download the file from each of the data centers using my CentOS 5 VPS box located in Future Hosting&#8217;s Dallas data center.  I keep track of how much wall clock and CPU time is consumed by each implementation with the help of the Ruby <code>benchmark</code> library, and log that information to a CSV file.</p>
<p>I repeat this for Ruby 1.8.6, 1.8.7, and 1.9.0.  I then aggregate the results and generate pretty graphs.</p>
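<p>The measurement step can be sketched roughly as follows.  This is a minimal illustration of timing a block with the <code>benchmark</code> library and producing one CSV-style row, not the actual test rig; the helper name <code>measure_download</code>, the column order, and the <code>sleep</code> stand-in for the download are all assumptions.</p>

```ruby
require 'benchmark'

# Hypothetical helper: time the given block and return one CSV-ready row.
def measure_download(impl_name, site)
  tms = Benchmark.measure { yield }
  cpu = tms.utime + tms.stime # user + system CPU seconds used by this process
  [impl_name, site, tms.real, cpu]
end

# Stand-in workload; a real run would download the 10MB file here.
row = measure_download('stock_net_http', 'dallas') { sleep 0.05 }

# The values contain no commas, so a plain join is enough for this sketch.
puts row.join(',')
```

<p>A sleeping block shows the distinction that matters later: wall clock time advances while CPU time stays near zero, much like a network-bound download.</p>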
<h2>Test implementations</h2>
<p>I have tested the following HTTP client implementations:</p>
<ul>
<li><code>stock_net_http</code> &#8211; The <code>Net::HTTP</code> library that ships with Ruby.  In 1.8.7 and beyond this library is a bit improved by a larger 16K read buffer size, but is otherwise unchanged between revisions</li>
<li><code>net_http_notimeout</code> &#8211; A subclass of <code>Net::HTTP</code> that overrides the timeout logic to eliminate the timeout feature, which is the cause of some shitty performance.  This implementation also forces a 16K buffer size even under Ruby 1.8.6</li>
<li><code>net_http_select</code> &#8211; A subclass of <code>Net::HTTP</code> that uses <code>select()</code> to implement timeouts instead of the rather inefficient stock timeout implementation.  This, like all my custom HTTP impls, forces a 16K buffer size.</li>
<li><code>net_http_zerocopy</code> &#8211; Another <code>Net::HTTP</code> subclass that has a modified read loop which uses the same pre-allocated String buffer for each read.  This implementation also uses <code>select()</code> for timeout, and a hard-coded 16K buffer size.</li>
<li><code>net_http_zerocopy_sysread</code> &#8211; A variation of <code>net_http_zerocopy</code> that uses <code>readpartial</code> with no timeout for socket reads, along with the existing preallocated buffer optimizations of its parent.</li>
<li><code>rfuzz</code> &#8211; Uses a slightly modified version of the lightweight HTTP client in the <code>rfuzz</code> library.  The <code>rfuzz</code> base implementation as well as this tweaked one do not implement timeouts.  The <code>rfuzz</code> library is required or this implementation will be skipped.</li>
<li><code>right_http_connection</code> &#8211; Uses the <code>right_http_connection</code> HTTP client implementation from the <code>rightaws</code> library.  Annoyingly, <code>right_http_connection</code> works by monkey-patching <code>Net::HTTP</code>, which is why I had to modify my test rig to run each implementation test in a new instance of the Ruby interpreter.  Bad form.</li>
<li><code>eventmachine</code> &#8211; Uses the EventMachine HTTP client unmodified</li>
<li><code>libcurl</code> &#8211; Uses the Ruby bindings for the <code>curl</code> native HTTP library</li>
</ul>
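<p>To make the <code>select()</code> timeout and preallocated-buffer ideas above concrete, here is a minimal standalone sketch.  It is not the code from the <code>Net::HTTP</code> subclasses in the repository; the helper name <code>read_all_with_select</code> is hypothetical, and a local pipe stands in for the HTTP socket so the example runs without a network.</p>

```ruby
require 'timeout'

CHUNK = 16 * 1024 # 16K, matching the buffer size the custom impls force

# Hypothetical helper: read until EOF, waiting on select() before each read
# and reusing one String buffer instead of allocating a new one per read.
def read_all_with_select(io, timeout_secs)
  buf = String.new(capacity: CHUNK) # allocated once, reused for every read
  total = 0
  loop do
    # Wait until the IO is readable, or give up after timeout_secs.
    ready = IO.select([io], nil, nil, timeout_secs)
    raise Timeout::Error, 'read timed out' if ready.nil?
    begin
      io.readpartial(CHUNK, buf) # overwrites buf in place
    rescue EOFError
      break
    end
    total += buf.bytesize
  end
  total
end

# Demo: a writer thread feeds a pipe, the reader drains it.
reader, writer = IO.pipe
producer = Thread.new do
  writer.write('x' * 40_000)
  writer.close
end
total = read_all_with_select(reader, 5)
producer.join
puts total
```

<p>The timeout here applies to each individual wait rather than to the whole transfer, which is one possible design; a rig that needs an overall deadline would have to track elapsed time itself.</p>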
<p>Unfortunately, I could not get <code>rev</code> or <code>revactor</code> working on any of my Ruby versions, so I was unable to evaluate those implementations.</p>
<h2>Try this at home</h2>
<p>To reproduce my results, do the following:</p>
<ul>
<li>Check out my SVN repository at <code>http://svn.apocryph.org/svn/projects/rubyhttp/trunk</code> revision 145.</li>
<li>Run the tests with Ruby 1.8.6.  On my machine that&#8217;s the version in the path:
<pre>
  ruby -v
  ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-linux]
</pre>
<p>which means the following command runs all available impls:</p>
<pre>
  ruby -w -rubygems test_all_impls.rb
</pre>
<p>That will run all the available impls (some, like <code>rfuzz</code>, aren&#8217;t available if you haven&#8217;t installed the necessary gem), and log the results to <code>./results/(date)</code>, where <code>(date)</code> is the YYYY-MM-DD date.</li>
<li>Run the tests with Ruby 1.8.7.  On my machine I had to build that from source:
<pre>
   ~/ruby18/bin/ruby -v
   ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
</pre>
<p>To run the tests you have to pass the Ruby command on the command line, since I couldn&#8217;t figure out how to programmatically determine the path to the Ruby interpreter.  On my system that&#8217;s:</p>
<pre>
  ~/ruby18/bin/ruby -w -rubygems test_all_impls.rb "~/ruby18/bin/ruby -w -rubygems"
</pre>
<p>Again, gems are required for some impls.</li>
<li>Run the tests with Ruby 1.9.0.  Same deal as 1.8.7.
<pre>
  ~/ruby19/bin/ruby -v
  ruby 1.9.0 (2008-10-06 revision 19702) [i686-linux]
</pre>
<p>The command is similar:</p>
<pre>
  ~/ruby19/bin/ruby -w -rubygems test_all_impls.rb "~/ruby19/bin/ruby -w -rubygems"
</pre>
</li>
<li>After running all three, you&#8217;ll have a bunch of CSV files in the <code>results</code> subdirectory for today&#8217;s date.  Here&#8217;s what I have:
<pre>
results/2008-11-09/ruby-1.8.6-i686-linux-eventmachine.csv
results/2008-11-09/ruby-1.8.6-i686-linux-libcurl.csv
results/2008-11-09/ruby-1.8.6-i686-linux-net_http_notimeout.csv
results/2008-11-09/ruby-1.8.6-i686-linux-net_http_select.csv
results/2008-11-09/ruby-1.8.6-i686-linux-net_http_zerocopy.csv
results/2008-11-09/ruby-1.8.6-i686-linux-net_http_zerocopy_sysread.csv
results/2008-11-09/ruby-1.8.6-i686-linux-rfuzz.csv
results/2008-11-09/ruby-1.8.6-i686-linux-right_http_connection.csv
results/2008-11-09/ruby-1.8.6-i686-linux-stock_net_http.csv
results/2008-11-09/ruby-1.8.7-i686-linux-eventmachine.csv
results/2008-11-09/ruby-1.8.7-i686-linux-libcurl.csv
results/2008-11-09/ruby-1.8.7-i686-linux-net_http_notimeout.csv
results/2008-11-09/ruby-1.8.7-i686-linux-net_http_select.csv
results/2008-11-09/ruby-1.8.7-i686-linux-net_http_zerocopy.csv
results/2008-11-09/ruby-1.8.7-i686-linux-net_http_zerocopy_sysread.csv
results/2008-11-09/ruby-1.8.7-i686-linux-rfuzz.csv
results/2008-11-09/ruby-1.8.7-i686-linux-right_http_connection.csv
results/2008-11-09/ruby-1.8.7-i686-linux-stock_net_http.csv
results/2008-11-09/ruby-1.9.0-i686-linux-eventmachine.csv
results/2008-11-09/ruby-1.9.0-i686-linux-net_http_notimeout.csv
results/2008-11-09/ruby-1.9.0-i686-linux-net_http_select.csv
results/2008-11-09/ruby-1.9.0-i686-linux-net_http_zerocopy.csv
results/2008-11-09/ruby-1.9.0-i686-linux-net_http_zerocopy_sysread.csv
results/2008-11-09/ruby-1.9.0-i686-linux-right_http_connection.csv
results/2008-11-09/ruby-1.9.0-i686-linux-stock_net_http.csv
</pre>
</li>
<li>To generate aggregate data files suitable for analysis, <code>cd</code> into the results directory and run:
<pre>
  ruby -w -rubygems ../../combine_csv.rb
</pre>
<p>This will aggregate all the .csv files in the directory and format them into three aggregate files: <code>clock_time.txt</code>, <code>cpu_percentage.txt</code>, and <code>total_cpu_time.txt</code>.  These files are formatted with columns for the server locations, rows for the HTTP implementation names, and values corresponding to the wall clock time, CPU time over wall clock time, and CPU time for each implementation and site.  These are ready-made for generating the bar charts below in Excel.  Note that you&#8217;ll need the FasterCSV library in order for this to work.</li>
</ul>
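<p>On the problem of finding the interpreter path mentioned in the steps above: one possible approach, assuming a Ruby recent enough to provide <code>RbConfig.ruby</code> (I have not checked whether the 1.8.x and 1.9.0 builds tested here do), is to ask the running interpreter for its own path and launch each child from that.  This sketch is not what the test rig does, and the <code>-e</code> snippet is only a stand-in for a per-implementation test script.</p>

```ruby
require 'rbconfig'

# Full path of the interpreter that is running this script.
ruby_path = RbConfig.ruby

# Run a snippet in a fresh interpreter and capture what it prints.
# A real rig would pass a test script path here instead of -e.
child_version = IO.popen([ruby_path, '-e', 'print RUBY_VERSION'], &:read)

puts "#{ruby_path} reports #{child_version}"
```

<p>Launching each implementation in its own child process this way would also keep monkey-patching libraries from leaking between tests.</p>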
<h2>Results</h2>
<p>Running all the tests is a pain in the ass.  If you fetch my SVN repository, you&#8217;ll find the raw data files that I got from my tests under <code>results/2008-11-09</code>.  Or, you can just read my analysis below.</p>
<h3>Clock time</h3>
<p><img src="wp-content/uploads/drupal/clock%20time%20graph.png" alt="Wall clock time" /></p>
<p>As you can see above, each implementation takes more or less the same amount of wall clock time to download from a given site, with significant variations between sites.  This is expected, as downloading a file over the Internet is a mostly network-bound operation.  We don&#8217;t care so much how long it takes, as how much the CPU has to work while it&#8217;s happening.  Which brings us to&#8230;</p>
<h3>CPU Time</h3>
<p><img src="wp-content/uploads/drupal/total%20cpu%20time%20graph.png" alt="CPU time" /></p>
<p>Wow, stock 1.8.6 <code>Net::HTTP</code> really is teh suck!  At least twice as much CPU usage as the nearest competitor.  Going from a 1K read buffer to 16K in 1.8.7 made a big difference.</p>
<p>Going further down the list, you can see the Ruby 1.9.0 <code>Net::HTTP</code> implementations with zero copy reads and <code>readpartial</code>, and the <code>notimeout</code> variant, are the best performers, with <code>rfuzz</code>, <code>libcurl</code>, and <code>eventmachine</code> close behind.  It&#8217;s encouraging that a pure-Ruby impl like <code>rfuzz</code> can compete with a mostly native impl like <code>libcurl</code>.</p>
<p>It&#8217;s also important to note that each of the downloads, be it the super-fast Dallas or the slow London, hit the CPU the same way.  This really jumps out when you look at CPU time over wall clock time:</p>
<h3>Percent of wall clock time spent using the CPU</h3>
<p><img src="wp-content/uploads/drupal/cpu%20percentage%20graph.png" alt="CPU time over wall clock time" /></p>
<p>Here you see the various transfer times for each site, but you can also see the widely ranging performance of the various HTTP implementations.  No real surprises here; <code>rfuzz</code>, <code>libcurl</code>, and <code>eventmachine</code> are doing very well, while the 1.8.6 stock <code>Net::HTTP</code> continues to blow.</p>
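<p>For reference, the percentage metric is just CPU time divided by wall clock time.  The sketch below pivots a few rows into an implementation-by-site table of that ratio.  The implementation names and all numbers are made up for illustration and are not taken from my results, and the row layout is an assumption rather than the actual CSV format.</p>

```ruby
# Hypothetical rows: [implementation, site, wall clock seconds, CPU seconds].
# These values are invented for illustration only.
rows = [
  ['impl_a', 'dallas', 2.0, 0.5],
  ['impl_a', 'london', 20.0, 0.5],
  ['impl_b', 'dallas', 2.0, 0.1],
  ['impl_b', 'london', 20.0, 0.1]
]

# Pivot into implementation => { site => percent of wall clock spent on CPU }.
table = Hash.new { |hash, key| hash[key] = {} }
rows.each do |impl, site, clock, cpu|
  table[impl][site] = (100.0 * cpu / clock).round(2)
end

# Print one column per site and one row per implementation.
sites = rows.map { |r| r[1] }.uniq
puts((['impl'] + sites).join("\t"))
table.each do |impl, by_site|
  puts(([impl] + sites.map { |s| by_site[s] }).join("\t"))
end
```

<p>With the CPU cost held constant per implementation, the percentage falls as the transfer gets slower, which is why the fast and slow sites look so different on this graph.</p>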
<h2>Conclusion</h2>
<p>If you need an HTTP client in Ruby, <em>DO NOT</em> use the 1.8.6 <code>Net::HTTP</code>.  The 1.8.7 version is considerably better, but <code>libcurl</code>, <code>rfuzz</code>, or <code>eventmachine</code> are all better still.</p>
<p>Within the <code>Net::HTTP</code> family, dropping the inefficient timeout implementation and optimizing the read code to reuse the same buffer are both pretty low-hanging optimizations which should be considered for a future Ruby release.  For now, I&#8217;d recommend <code>libcurl</code> if you&#8217;re on Linux, or 1.8.7 <code>Net::HTTP</code> on Windows (since <code>rfuzz</code> doesn&#8217;t have timeouts, and <code>eventmachine</code> is hard to get going under Ruby on Windows).</p>
<p><strong>UPDATE</strong>: It turns out there is a binary gem release of eventmachine 0.12.0 for Windows, so if you&#8217;re doing Windows development and need a performant HTTP client implementation, you should definitely look into eventmachine.  Thanks to Abdul-Rahman Advany for the tip.</p>
]]></content:encoded>
			<wfw:commentRss>http://apocryph.org/2008/11/09/more_indepth_analysis_ruby_http_client_performance/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
