An analysis of Ruby 1.8.x HTTP client performance

Not too long ago I bitched about the performance of Ruby’s HTTP client. Some of the comments to that post prompted me to investigate this further, in the hopes of finding a more performant implementation.

The results of my analysis are in, and they’re…interesting, to say the least.

Summary

Ruby 1.8.6 (which still seems the dominant version among both Linux binary packages and the Windows One-Click Installer) uses a hard-coded 1K buffer size for HTTP reads, which leads to a ton of CPU usage during large HTTP downloads, even though the operation should be I/O bound and barely touch the CPU.

Ruby 1.8.7 includes a change described by the following entry in the changelog:

Mon Mar 19 11:39:29 2007  Minero Aoki  <aamine@loveruby.net>

    * lib/net/protocol.rb (rbuf_read): extend buffer size for speed.

After this change, Ruby’s HTTP implementation now uses a hard-coded 16K buffer, in the hopes of improving performance. Whether or not this actually improves things will become clear in my analysis later on.
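To get a feel for why the buffer size matters, here is a back-of-the-envelope sketch of how many reads a 10MB download (the size of the test files used in this analysis) needs at each buffer size. It assumes the best case, where every read fills the buffer completely; `min_read_calls` is a hypothetical helper, not something from Net::HTTP.

```ruby
# Hypothetical helper: the minimum number of fixed-size reads needed to pull
# down total_bytes, assuming every read fills the buffer (a best case).
def min_read_calls(total_bytes, buffer_bytes)
  (total_bytes + buffer_bytes - 1) / buffer_bytes  # integer ceiling division
end

FILE_BYTES = 10 * 1024 * 1024  # a 10MB test file

[1024, 16 * 1024, 64 * 1024].each do |size|
  puts "#{size / 1024}K buffer: at least #{min_read_calls(FILE_BYTES, size)} reads"
end
```

Real reads often return less than a full buffer, so the actual counts will be higher. Each read also carries whatever per-call overhead the implementation adds on top.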

In addition to Ruby’s built-in Net::HTTP client, I evaluated two alternatives: a version of the rfuzz HTTP client modified to support streaming GETs, and curb, the Ruby bindings for the native libcurl HTTP client library. My goal was to determine the best-case Ruby HTTP client performance as indicated by the performance of these two implementations, then munge Ruby’s stock implementation to try to approach the best-case performance.

rubyhttp

I wrote a tool, rubyhttp, to help me perform these tests. The code is freely available at my SVN repository at http://svn.apocryph.org/svn/projects/rubyhttp/trunk. To grab the code, do a svn co http://svn.apocryph.org/svn/projects/rubyhttp/trunk. The tests below were run with revision 127 of the code.

Test environment

I ran the tests on two machines: wyoh, a Windows XP x64-edition Core 2 Duo laptop with a FiOS internet connection, and lio, one of my FutureHosting VPS boxes running CentOS 5.

On wyoh I used the version of Ruby that comes with the latest one-click installer:

>ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]

On lio I tested two versions of Ruby. The first was installed by the ruby yum package:

$ ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-linux]

As you can see, this is the same version and patchlevel as my Windows box. Once I discovered the 16k buffer enhancement in Ruby 1.8.7, I downloaded and built the latest 1.8.7 source tree. This is:

$ ~/ruby18/bin/ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]

Test data

My test code fetches the 10MB test files published by FutureHosting for measurement of the network performance at each of their data centers. Thus, my code retrieves data from Seattle, Dallas, Chicago, Washington DC, and London. lio is located in the very same Dallas data center, hence the crazy-high download speeds there, while wyoh is located in the suburbs of Washington DC in close geographical and network proximity to the DC datacenter.

HTTP variations

Each test run does an HTTP get from five different locations, using Net::HTTP and (on Linux only) rfuzz and curb as well. Neither rfuzz nor curb could be made to work on Windows, so the Windows runs use only Net::HTTP.
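For the Net::HTTP case, a streaming GET that records the size of each read might look roughly like the sketch below. This is not the rubyhttp code itself; `fetch_chunk_stats` and the keys of the hash it returns are names made up for illustration.

```ruby
require 'net/http'

# Hypothetical helper, not the rubyhttp code: stream a GET and record the
# size of every chunk Net::HTTP hands back.
def fetch_chunk_stats(host, port, path)
  sizes = []
  Net::HTTP.start(host, port) do |http|
    http.request_get(path) do |response|
      # read_body with a block yields the body piece by piece instead of
      # buffering the whole file in memory.
      response.read_body { |chunk| sizes << chunk.size }
    end
  end
  total = sizes.inject(0) { |sum, n| sum + n }
  {
    :bytes  => total,
    :chunks => sizes.size,
    :min    => sizes.min,
    :max    => sizes.max,
    :mean   => sizes.empty? ? 0.0 : total.to_f / sizes.size
  }
end
```

The chunk sizes seen here are whatever Net::HTTP chooses to yield, so they reflect its internal buffer size as much as the network.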

Most of the tests exercise some variation in the Net::HTTP implementation. The following variations are used:

  • stock - As it implies, the Net::HTTP implementation is unmodified from whatever ships with the version of Ruby being used
  • custom-16kbuffer - Modifies the buffer size from 1K to 16K. Note that Ruby 1.8.7 already includes this modification, so you’ll only see this run with Ruby 1.8.6 on Windows.
  • custom-16kbuffer-notimeout - Buffer size of 16K, and the timeout call is removed. This obviously isn’t a practical change, but it demonstrates the overhead of Ruby’s appalling timeout implementation
  • custom-16kbuffer-select - Buffer size of 16K, and the timeout call is replaced with non-blocking I/O using select, as proposed by Tanaka Akira on the ruby-talk list
  • custom-16kbuffer-selectwithsysread - Same as custom-16kbuffer-select, except that the read_nonblock call made once select indicates there is data to read is replaced by sysread
  • custom-64kbuffer-notimeout - Buffer size of 64K, and the timeout call is removed. This obviously isn’t a practical change, but it demonstrates the overhead of Ruby’s appalling timeout implementation
  • custom-64kbuffer-select - Buffer size of 64K, and the timeout call is replaced with non-blocking I/O using select, as proposed by Tanaka Akira on the ruby-talk list

All of these custom variations modify lib/ruby-1.8/net/protocol.rb, which contains the socket I/O functionality used by Net::HTTP. The rbuf_fill method contains the actual socket read logic.
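The difference between the timeout-based and select-based variations can be sketched as follows. This is a simplified illustration, not the actual protocol.rb code or the patch posted to ruby-talk; `read_with_timeout` and `read_with_select` are hypothetical names, and a pipe stands in for the socket.

```ruby
require 'timeout'

BUFSIZE = 16 * 1024

# Timeout-style read: wrap a blocking read in Timeout.timeout.
def read_with_timeout(io, seconds)
  Timeout.timeout(seconds) { io.sysread(BUFSIZE) }
end

# select-style read: wait until the descriptor is readable (or the deadline
# passes), then read whatever is available without blocking.
def read_with_select(io, seconds)
  ready = IO.select([io], nil, nil, seconds)
  raise Timeout::Error, "no data within #{seconds}s" if ready.nil?
  io.read_nonblock(BUFSIZE)
end

# A pipe stands in for the socket in this demo.
reader, writer = IO.pipe
writer.write("hello")
puts read_with_select(reader, 1)      # data is already waiting

begin
  read_with_select(reader, 0.1)       # nothing left to read
rescue Timeout::Error => e
  puts "timed out: #{e.message}"
end
```

The selectwithsysread variation corresponds to swapping the `read_nonblock` call for `sysread` once select has reported the descriptor readable.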

Data

Each run outputs the following information for each combination of HTTP URL and HTTP client implementation:

  • Site - name of the site (eg ‘seattle’, ‘washdc’, etc)
  • Impl - name of the HTTP implementation
  • KBytes Transferred
  • KBytes/second
  • Chunk count - The number of reads required to fetch the entire file
  • Mean chunk size - The average read size
  • Min chunk size
  • Max chunk size
  • User Time - The % of user time taken by ruby during the run
  • System Time - The % of system time taken by ruby during the run
  • Total CPU Time - The total % of CPU time taken by ruby during the run
  • Clock time - The amount of time spent downloading the file
  • % CPU usage - Defined as Total CPU Time / Clock Time * 100. The percentage of available CPU time taken by ruby
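The CPU columns can be gathered with nothing more than Process.times and a wall-clock timer. The sketch below is one way to do it, assuming the times are in seconds; `measure_cpu` is a hypothetical helper, not necessarily how rubyhttp collects its numbers.

```ruby
# Hypothetical helper: run a block and report user, system and total CPU
# seconds, wall-clock seconds, and CPU usage as a percentage of wall clock.
def measure_cpu
  cpu_before  = Process.times
  wall_before = Time.now
  yield
  clock     = Time.now - wall_before
  cpu_after = Process.times

  user   = cpu_after.utime - cpu_before.utime
  system = cpu_after.stime - cpu_before.stime
  total  = user + system
  {
    :user        => user,
    :system      => system,
    :total       => total,
    :clock       => clock,
    :cpu_percent => clock > 0 ? total * 100.0 / clock : 0.0
  }
end

# Sleeping stands in for waiting on the network: lots of clock time, little CPU.
stats = measure_cpu { sleep 0.2 }
printf("CPU %.3fs over %.3fs of clock time = %.1f%%\n",
       stats[:total], stats[:clock], stats[:cpu_percent])
```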

The raw data are available under SVN at results/linux/2008-10-4 and results/windows/2008-10-4. I used my combine_csv.rb tool to generate results/linux/2008-10-4/aggregate.csv from the individual test results. I ended up not using the Windows results as they complicated the graph and didn’t materially impact the conclusion.

I sucked aggregate.csv into Excel to do some munging.

Pretty Pictures

I uploaded the data to Swivel, thinking it would make it easy to analyze the data. It didn’t. I wanted to do a clustered bar graph, where each cluster corresponds to a site, and bars within that cluster reflect CPU usage for each implementation when downloading from that site. Swivel is way too limited for that.

The best I can do is this graph, which clusters by implementation and graphs CPU usage for each site; the opposite of what I wanted. You can play with the data yourself if you like.

Using good old-fashioned Excel, I generated this graph:

[Graph: Ruby HTTP implementations performance]

As you can see, the worst performers are the stock Net::HTTP implementations in both 1.8.6 and 1.8.7, though 1.8.6 is noticeably worse due to the 1K buffer size vs 16K for 1.8.7. The best performer is curb (libcurl bindings for Ruby), under both 1.8.6 and 1.8.7. The fastest Net::HTTP-based implementation uses a 16K buffer size and bypasses the timeout method, which is apparently quite inefficient. Using the non-blocking select to implement a timeout is slower than no timeout at all, but still considerably better than the stock impl. Finally, the 64K buffer size variants actually performed worse than the 16K variants.

It’s also quite obvious that Dallas transfers took up the most CPU, while London took the least. What you can’t see from this graph, but would see in the raw data, is that Dallas transfers were crazy-fast (since these tests were run on the same network as the Dallas test file), so there was less wall-clock time spent on the test, thus the transfer was less I/O bound than others. For the same reason, London, by far the slowest transfer, uses the least amount of CPU. This does not mean that transfers from fast download sites are inherently less efficient. If instead of %CPU time I used the total CPU time column, this disparity would vanish.
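The arithmetic behind that last point is easy to see with made-up numbers. The figures below are purely hypothetical, chosen to illustrate the effect, and are not taken from the measured results; `percent_cpu` is a hypothetical helper.

```ruby
# Hypothetical helper implementing the % CPU usage definition:
# Total CPU Time / Clock Time * 100.
def percent_cpu(total_cpu_seconds, clock_seconds)
  total_cpu_seconds * 100.0 / clock_seconds
end

# Invented figures: the same 2 seconds of CPU work in both cases, but one
# transfer finishes in 4 seconds and the other takes 40.
fast_site = percent_cpu(2.0, 4.0)
slow_site = percent_cpu(2.0, 40.0)

puts "fast site: #{fast_site}% CPU"
puts "slow site: #{slow_site}% CPU"
```

The CPU work is identical in both cases; only the clock time it is divided by changes.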

Conclusion

Ruby’s Net::HTTP implementation blows. It’s a bit better in 1.8.7 with the new 16K buffer size, but the timeout implementation has got to go. Even with timeout eliminated, Net::HTTP is trounced by the pure-Ruby rfuzz and the native/Ruby blend curb, suggesting that timeout notwithstanding, there are other inefficiencies in Net::HTTP. Looking at the protocol.rb code, I’m struck by how painfully inefficient the implementation is with buffers. rfuzz and curb minimize buffer copies and my rfuzz streaming HTTP extension reuses the same buffer for multiple calls, while Net::HTTP is happily appending and slicing away at arrays.
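The two buffering styles can be contrasted with a small sketch. It is loosely modeled on the patterns described above, not copied from protocol.rb or rfuzz; both method names are hypothetical, and a StringIO stands in for the socket.

```ruby
require 'stringio'

CHUNK = 16 * 1024

# Append-and-slice style: each read allocates a fresh string, appends it to
# an accumulating buffer, then slices the consumed bytes back off.
def drain_with_append_and_slice(io)
  rbuf  = String.new
  total = 0
  while (data = io.read(CHUNK))
    rbuf << data                              # copy into the buffer
    total += rbuf.slice!(0, rbuf.size).size   # copy back out again
  end
  total
end

# Buffer-reuse style: hand the same String to every read so that it is
# refilled in place rather than allocated anew each time.
def drain_with_reused_buffer(io)
  buf   = String.new
  total = 0
  total += buf.size while io.read(CHUNK, buf)
  total
end

payload = 'x' * (1024 * 1024)   # 1MB of stand-in data
puts drain_with_append_and_slice(StringIO.new(payload))
puts drain_with_reused_buffer(StringIO.new(payload))
```

Both methods see the same bytes; the difference is in how many intermediate strings get allocated and copied along the way.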

I think architecturally Net::HTTP can be saved, but it needs rewritten buffered I/O and an alternative to timeout, preferably in the form of select.

I’m going to try to work on the necessary changes, and will post whatever I come up with.