A more in-depth analysis of Ruby HTTP client performance
As a follow-up to my previous article on Ruby HTTP client performance, I’ve revamped my test rig and revised my tests to cover more variables, more implementations, and more Ruby versions.
This time I compare Ruby 1.8.6, 1.8.7, and 1.9.0, exclusively on CentOS 5 Linux. I compare the stock Net::HTTP, rfuzz, libcurl, eventmachine, right_http, and a number of Net::HTTP variations with slight performance tweaks. I evaluate clock time, CPU time, and CPU time over clock time, for five sites with varying network characteristics. As before, my test rig and results are available on my SVN repository for further experimentation.
For the conclusion and pretty pictures skip to the bottom. For the gory details read on.
Test Methodology
I have a simple HTTP client task, downloading a 10MB zip file from each of five data centers around the world (Seattle, Dallas, Chicago, Washington DC, and London). I’ve implemented that task using each of the HTTP client implementations I’m testing. I then use each implementation to download the file from each of the data centers using my CentOS 5 VPS box located in Future Hosting’s Dallas data center. I keep track of how much wall clock and CPU time is consumed by each implementation with the help of the Ruby benchmark library, and log that information to a CSV file.
I repeat this for Ruby 1.8.6, 1.8.7, and 1.9.0. I then aggregate the results and generate pretty graphs.
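For a rough idea of what the timing step looks like, here is a minimal sketch using the standard Benchmark library. It is not the code from the test rig: the helper names `time_download` and `csv_line` are made up for illustration, and the "download" is a stand-in loop so the example runs without a network.

```ruby
require 'benchmark'

# Times one run of the given block and returns a row of
# [implementation, site, wall clock seconds, CPU seconds].
# (Hypothetical helper, not part of the test rig.)
def time_download(impl_name, site)
  tms = Benchmark.measure { yield }
  cpu = tms.utime + tms.stime   # user + system CPU time of this process
  [impl_name, site, tms.real, cpu]
end

# Formats a row as one comma-separated line (no quoting; fine for these values).
def csv_line(row)
  row.join(',')
end

# Stand-in for a real download: burn a little CPU instead of hitting the network.
row = time_download('stock_net_http', 'dallas') do
  200_000.times { |i| i * i }
end

puts csv_line(row)
```

`Benchmark.measure` returns a `Benchmark::Tms` whose `real` field is wall clock time and whose `utime` and `stime` fields are user and system CPU time, which is all the data the charts below need.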
Test implementations
I have tested the following HTTP client implementations:
- stock_net_http – The Net::HTTP library that ships with Ruby. In 1.8.7 and beyond this library is a bit improved by a larger 16K read buffer size, but is otherwise unchanged between revisions.
- net_http_notimeout – A subclass of Net::HTTP that overrides the timeout logic to eliminate the timeout feature, which is cause for some shitty performance. This implementation also forces a 16K buffer size even under Ruby 1.8.6.
- net_http_select – A subclass of Net::HTTP that uses select() to implement timeouts instead of the rather inefficient stock timeout implementation. This, like all my custom HTTP impls, forces a 16K buffer size.
- net_http_zerocopy – Another Net::HTTP subclass that has a modified read loop which uses the same pre-allocated String buffer for each read. This implementation also uses select() for timeout, and a hard-coded 16K buffer size.
- net_http_zerocopy_sysread – A variation of net_http_zerocopy that uses readpartial with no timeout for socket reads, along with the existing preallocated buffer optimizations of its parent.
- rfuzz – Uses a slightly modified version of the lightweight HTTP client in the rfuzz library. The rfuzz base implementation as well as this tweaked one do not implement timeouts. The rfuzz library is required or this implementation will be skipped.
- right_http_connection – Uses the right_http_connection HTTP client implementation from the rightaws library. Annoyingly, right_http_connection works by monkey-patching Net::HTTP, which is why I had to modify my test rig to run each implementation test in a new instance of the Ruby interpreter. Bad form.
- eventmachine – Uses the EventMachine HTTP client unmodified.
- libcurl – Uses the Ruby bindings for the curl native HTTP library.
Unfortunately, I could not get rev or revactor working on any of my Ruby versions, so I was unable to evaluate those implementations.
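To make the two Net::HTTP tweaks concrete, here is a minimal sketch of a read loop that waits with select() before each read and refills one preallocated String. It illustrates the ideas only and is not the code from the net_http_select or net_http_zerocopy subclasses. The `read_all` name is hypothetical, and a pipe stands in for the socket so the example runs offline.

```ruby
BUFFER_SIZE = 16 * 1024

# Reads io to EOF, waiting with select() before each read and reusing one
# buffer. Yields each chunk to the caller's block and returns the byte count.
# (Hypothetical helper for illustration.)
def read_all(io, timeout)
  buf = String.new   # allocated once, refilled by every readpartial call
  total = 0
  loop do
    # IO.select returns nil if nothing becomes readable within timeout seconds.
    raise "read timed out after #{timeout}s" unless IO.select([io], nil, nil, timeout)
    begin
      io.readpartial(BUFFER_SIZE, buf)
    rescue EOFError
      break
    end
    total += buf.bytesize
    yield buf if block_given?
  end
  total
end

# A pipe stands in for the HTTP socket; a thread plays the server.
reader, writer = IO.pipe
producer = Thread.new do
  writer.write('x' * 40_000)
  writer.close
end

bytes_read = read_all(reader, 5)
producer.join
puts bytes_read
```

Because the same String is overwritten on every pass, the block has to consume or copy each chunk before the next read.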
Try this at home
To reproduce my results, do the following:
- Check out my SVN repository at http://svn.apocryph.org/svn/projects/rubyhttp/trunk revision 145.
- Run the tests with Ruby 1.8.6. On my machine that’s the version in the path:
ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-linux]
which means the following command runs all available impls:
ruby -w -rubygems test_all_impls.rb
That will run all the available impls (some, like rfuzz, aren’t available if you haven’t installed the necessary gem), and log the results to ./results/(date), where (date) is the YYYY-MM-DD date.
- Run the tests with Ruby 1.8.7. On my machine I had to build that from source:
~/ruby18/bin/ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
To run the tests you have to pass the Ruby command on the command line, since I couldn’t figure out how to programmatically determine the path to the Ruby interpreter. On my system that’s:
~/ruby18/bin/ruby -w -rubygems test_all_impls.rb "~/ruby18/bin/ruby -w -rubygems"
Again, gems are required for some impls.
- Run the tests with Ruby 1.9.0. Same deal as 1.8.7.
~/ruby19/bin/ruby -v
ruby 1.9.0 (2008-10-06 revision 19702) [i686-linux]
The command is similar:
~/ruby19/bin/ruby -w -rubygems test_all_impls.rb "~/ruby19/bin/ruby -w -rubygems"
- After running all three, you’ll have a bunch of CSV files in the results subdirectory for today’s date. Here’s what I have:
results/2008-11-09/ruby-1.8.6-i686-linux-eventmachine.csv
results/2008-11-09/ruby-1.8.6-i686-linux-libcurl.csv
results/2008-11-09/ruby-1.8.6-i686-linux-net_http_notimeout.csv
results/2008-11-09/ruby-1.8.6-i686-linux-net_http_select.csv
results/2008-11-09/ruby-1.8.6-i686-linux-net_http_zerocopy.csv
results/2008-11-09/ruby-1.8.6-i686-linux-net_http_zerocopy_sysread.csv
results/2008-11-09/ruby-1.8.6-i686-linux-rfuzz.csv
results/2008-11-09/ruby-1.8.6-i686-linux-right_http_connection.csv
results/2008-11-09/ruby-1.8.6-i686-linux-stock_net_http.csv
results/2008-11-09/ruby-1.8.7-i686-linux-eventmachine.csv
results/2008-11-09/ruby-1.8.7-i686-linux-libcurl.csv
results/2008-11-09/ruby-1.8.7-i686-linux-net_http_notimeout.csv
results/2008-11-09/ruby-1.8.7-i686-linux-net_http_select.csv
results/2008-11-09/ruby-1.8.7-i686-linux-net_http_zerocopy.csv
results/2008-11-09/ruby-1.8.7-i686-linux-net_http_zerocopy_sysread.csv
results/2008-11-09/ruby-1.8.7-i686-linux-rfuzz.csv
results/2008-11-09/ruby-1.8.7-i686-linux-right_http_connection.csv
results/2008-11-09/ruby-1.8.7-i686-linux-stock_net_http.csv
results/2008-11-09/ruby-1.9.0-i686-linux-eventmachine.csv
results/2008-11-09/ruby-1.9.0-i686-linux-net_http_notimeout.csv
results/2008-11-09/ruby-1.9.0-i686-linux-net_http_select.csv
results/2008-11-09/ruby-1.9.0-i686-linux-net_http_zerocopy.csv
results/2008-11-09/ruby-1.9.0-i686-linux-net_http_zerocopy_sysread.csv
results/2008-11-09/ruby-1.9.0-i686-linux-right_http_connection.csv
results/2008-11-09/ruby-1.9.0-i686-linux-stock_net_http.csv
- To generate aggregate data files suitable for analysis, cd into the results directory and run:
ruby -w -rubygems ../../combine_csv.rb
This will aggregate all the .csv files in the directory and format them into three aggregate files: clock_time.txt, cpu_percentage.txt, and total_cpu_time.txt. These files are formatted with columns for the server locations, rows for the HTTP implementation names, and values corresponding to the wall clock time, CPU time over clock time, and CPU time for each implementation and site. These are ready-made for generating the bar charts below in Excel. Note that you’ll need the FasterCSV library in order for this to work.
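On the question from the 1.8.7 step of finding the interpreter path programmatically: one approach that might remove the need to pass it by hand is to read the build settings in the standard rbconfig library. This is a sketch, not something the test rig does. The `current_ruby_path` name is hypothetical, and I have not checked it against the 1.8.x or 1.9.0 builds used here (on the 1.8 series the same table is, as far as I know, also reachable as Config::CONFIG).

```ruby
require 'rbconfig'

# Builds the full path of the interpreter running this script from the
# values recorded when that Ruby was built. (Hypothetical helper.)
def current_ruby_path
  config = RbConfig::CONFIG
  # EXEEXT is empty on Linux and ".exe" on Windows builds.
  File.join(config['bindir'], config['ruby_install_name'] + config['EXEEXT'].to_s)
end

puts current_ruby_path
```

A driver script could then spawn each implementation test with something like `system(current_ruby_path, '-w', '-rubygems', script)`, where `script` is whatever file it wants to run in a fresh interpreter.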
Results
Running all the tests is a pain in the ass. If you fetch my SVN repository, you’ll find the raw data files that I got from my tests under results/2008-11-09. Or, you can just read my analysis below.
Clock time

As you can see above, each implementation takes more or less the same amount of wall clock time to download from a given site, with significant variations between sites. This is expected, as downloading a file over the Internet is a mostly network-bound operation. We don’t care so much how long it takes, as how much the CPU has to work while it’s happening. Which brings us to…
CPU Time

Wow, stock 1.8.6 Net::HTTP really is teh suck! At least twice as much CPU usage as the nearest competitor. Going from a 1K read buffer to 16K in 1.8.7 made a big difference.
Going further down the list, you can see the Ruby 1.9.0 Net::HTTP implementations with zero copy reads and readpartial, and the notimeout variant, are the best performers, with rfuzz, libcurl, and eventmachine close behind. It’s encouraging that a pure-ruby impl like rfuzz can compete with a mostly native impl like libcurl.
It’s also important to note that each of the downloads, be it the super-fast Dallas or the slow London, hit the CPU the same way. This really jumps out when you look at CPU time over wall clock time:
Percent of wall clock time spent using the CPU

Here you see the various transfer times for each site, but you can also see the widely ranging performance of the various HTTP implementations. No real surprises here; rfuzz, libcurl, and eventmachine are doing very well, while the 1.8.6 stock Net::HTTP continues to blow.
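The number plotted in this chart is just CPU time divided by wall clock time. Here is a small sketch of that calculation, arranged the same way as the aggregate files (rows for implementations, columns for sites). The sample values are invented for illustration and are not measurements, and `cpu_percentages` is a hypothetical helper, not the code in combine_csv.rb.

```ruby
# Each sample: [implementation, site, wall clock seconds, CPU seconds].
# The numbers are invented for illustration; they are not measurements.
SAMPLES = [
  ['stock_net_http', 'dallas', 2.0, 1.0],
  ['stock_net_http', 'london', 20.0, 1.0],
  ['libcurl',        'dallas', 2.0, 0.1],
  ['libcurl',        'london', 20.0, 0.1],
]

# Returns { implementation => { site => CPU time as a percentage of clock time } }.
def cpu_percentages(samples)
  table = Hash.new { |hash, impl| hash[impl] = {} }
  samples.each do |impl, site, clock, cpu|
    table[impl][site] = 100.0 * cpu / clock
  end
  table
end

table = cpu_percentages(SAMPLES)
sites = SAMPLES.map { |_, site, _, _| site }.uniq

# Print a tab-separated table: one header row, then one row per implementation.
puts((['impl'] + sites).join("\t"))
table.each do |impl, by_site|
  puts(([impl] + sites.map { |s| '%.1f' % by_site[s] }).join("\t"))
end
```

With the CPU cost held constant per implementation, the slower site shows a lower percentage simply because the same work is spread over more wall clock time.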
Conclusion
If you need an HTTP client in Ruby, DO NOT use the 1.8.6 Net::HTTP. The 1.8.7 version is considerably better, but libcurl, rfuzz, or eventmachine are all better still.
Within the Net::HTTP family, dropping the inefficient timeout implementation and optimizing the read code to reuse the same buffer are both pretty low-hanging optimizations which should be considered for a future Ruby release. For now, I’d recommend libcurl if you’re on Linux, or 1.8.7 Net::HTTP on Windows (since rfuzz doesn’t have timeouts, and eventmachine is hard to get going under Ruby on Windows).
UPDATE: It turns out there is a binary gem release of eventmachine 0.12.0 for Windows, so if you’re doing Windows development and need a performant HTTP client implementation, you should definitely look into eventmachine. Thanks to Abdul-Rahman Advany for the tip.