Absolutely bullshit Ruby HTTP client situation

For one of my self-edification projects I’ve been trying to implement some very simple HTTP client code in Ruby, which I want to use to transfer hundreds or thousands of megabytes of data from an HTTP server. Since I need to transfer large blocks of data, I need a streaming HTTP API that allows me to read the data in small chunks so as not to exhaust available memory buffering reads. This is a very common idiom, and I was not surprised to find that Ruby’s built-in HTTP implementation, Net::HTTP, had just such a function.

Imagine my surprise, then, when I ran the following code:

#!/usr/bin/ruby -w

require 'net/http'
require 'logger'

BLOCK_SIZE = 1024*128 #128 K
REMOTE_URL = "http://wdc01.futurehosting.biz/test100.zip"
logger = Logger.new(STDOUT)

logger.debug("Parsing URL #{REMOTE_URL}")
url = URI.parse(REMOTE_URL)
logger.debug("Starting HTTP session with host #{url.host}, port #{url.port}")
Net::HTTP.start(url.host, url.port) do |http|
  logger.debug("Sending HTTP request for path #{url.path}")

  http.request_get(url.path) do |response|
    logger.debug("Processing response")
    logger.debug("Response headers: #{response.to_hash.to_s}")

    logger.debug("Rendering response")
    File.open('foo', 'wb') do |file|
      response.read_body do |body|
        logger.debug("Sending back #{body.length} bytes of response data")
        if (file.respond_to?(:syswrite))
          file.syswrite(body)
        else
          file.write(body)
        end
      end
    end
  end
end

What’s supposed to happen here is that the 100MB test file is read in manageable chunks, so memory and CPU are barely touched as the transfer proceeds. What actually happens is the CPU redlines, and the data are transferred in 1024-byte chunks. That’s right, there’s a hard-coded 1K chunk size in Ruby’s HTTP implementation, which means there’s tons of CPU overhead executing Ruby methods and socket system calls.
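To put a rough number on that overhead, here is a back-of-the-envelope sketch. It assumes the test file is exactly 100 MiB (the real file may differ), and `reads_needed` is a made-up helper for illustration, not anything in Net::HTTP.

```ruby
# Assumption: the test file is exactly 100 MiB; the real size may differ.
FILE_SIZE = 100 * 1024 * 1024

# Hypothetical helper: how many reads it takes to pull total_bytes
# through a buffer of chunk_size bytes (ceiling division).
def reads_needed(total_bytes, chunk_size)
  (total_bytes + chunk_size - 1) / chunk_size
end

puts reads_needed(FILE_SIZE, 1024)        # 1K chunks
puts reads_needed(FILE_SIZE, 128 * 1024)  # 128K chunks
```

Under that assumption the 1K buffer needs 102,400 reads where a 128K buffer needs 800, and every one of those reads drags a stack of Ruby method calls along with it.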

Don’t believe me? Think I’m some Ruby n00b, or maybe a Ruby hater? Read this thread. The combined might of the ruby-talk list was not sufficient to solve this guy’s problem. He came up with a hack that monkeyed with the built-in Net::HTTP classes to force a larger buffer size, but that hack broke in Ruby 1.8.5, and there’s no fucking way I’m going to go through that sort of contortion to make a craptastic library usable.

I continue to be amazed at the various areas in which Ruby is immature and vastly inferior to alternatives like Python. Still, every time I swear it off, I end up coming back after being reminded how much I hate using Python, and how refreshing Ruby’s dynamic duck-typing idioms are after a hard day of C# and C++. So, with fuck it no longer an option, I had to find an alternative.

There are other HTTP clients, but they’re built on top of Net::HTTP, which is not helpful. This led me to rfuzz, which is a fuzz testing tool for web apps. One nice feature of this tool is a very light HTTP client library written atop the low-level TCPSocket object, which therefore suffers none of Net::HTTP’s dreadful performance. This client was made for a very different purpose, and had no streaming response abilities that I could find, but by subclassing the HttpClient class I was able to adapt it to my needs with minimal effort.

Now, I can stream a large file with a single-digit CPU hit, and I can control the buffer size to my liking.
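The general idea, stripped of rfuzz entirely, looks something like the sketch below. This is not the rfuzz subclass; `stream_http_body` and `download` are hypothetical names, and the code assumes a plain HTTP/1.0-style response (no chunked transfer encoding, no redirects, no error handling beyond the bare minimum). It only shows how reading straight off a socket lets the caller pick the buffer size.

```ruby
require 'socket'
require 'stringio'

# Hypothetical helper: reads an HTTP response from any IO-like object,
# skips the status line and headers, then yields the body in chunks of
# up to block_size bytes. Returns the number of body bytes read.
# Assumes no chunked transfer encoding.
def stream_http_body(io, block_size)
  status_line = io.gets("\r\n")
  raise "no response from server" if status_line.nil?

  # Collect headers until the blank line that ends them.
  headers = {}
  while (line = io.gets("\r\n"))
    line = line.chomp("\r\n")
    break if line.empty?
    name, value = line.split(":", 2)
    headers[name.strip.downcase] = value.to_s.strip
  end

  # With a Content-Length, stop after that many bytes; otherwise read to EOF.
  remaining = headers["content-length"] ? headers["content-length"].to_i : nil
  total = 0
  loop do
    want = remaining ? [block_size, remaining - total].min : block_size
    break if want.zero?
    chunk = io.read(want)
    break if chunk.nil? || chunk.empty?
    total += chunk.bytesize
    yield chunk
  end
  total
end

# Hypothetical wrapper: fetch path from host:port and write it to dest.
# Sends an HTTP/1.0 request so the server should not reply with chunked encoding.
def download(host, port, path, dest, block_size = 128 * 1024)
  TCPSocket.open(host, port) do |sock|
    sock.write("GET #{path} HTTP/1.0\r\nHost: #{host}\r\n\r\n")
    File.open(dest, 'wb') do |file|
      stream_http_body(sock, block_size) { |chunk| file.write(chunk) }
    end
  end
end
```

Calling something like `download(url.host, url.port, url.path, 'foo')` would then pull the file down in 128K blocks, and the block size is just an argument rather than a constant buried in the library.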

The longer I’m a software developer, the more I find myself irritated at the kind of low-level wheel-reinventing I’m constantly stuck with, for no better reason than that software isn’t as far along as it should be by now. Maybe I was born too early after all.

UPDATE: In a shockingly uncharacteristic attempt to light a candle, I’ve done a detailed analysis of Ruby 1.8 HTTP client performance, complete with pretty pictures.