apocryph.org Notes to my future self

27 Sep 2008

Absolutely bullshit Ruby HTTP client situation

For one of my self-edification projects I’ve been trying to implement some very simple HTTP client code in Ruby, which I want to use to transfer hundreds or thousands of megabytes of data from an HTTP server. Since I need to transfer large blocks of data, I need a streaming HTTP API that allows me to read the data in small chunks so as not to exhaust available memory buffering reads. This is a very common idiom, and I was not surprised to find that Ruby’s built-in HTTP implementation, Net::HTTP, had just such a function.

Imagine my surprise, then, when I ran the following code:

#!/usr/bin/ruby -w

require 'net/http'
require 'logger'

BLOCK_SIZE = 1024*128 #128 K
REMOTE_URL = "http://wdc01.futurehosting.biz/test100.zip"
logger = Logger.new(STDOUT)

logger.debug("Parsing URL #{REMOTE_URL}")
url = URI.parse(REMOTE_URL)
logger.debug("Starting HTTP session with host #{url.host}, port #{url.port}")
Net::HTTP.start(url.host, url.port) do |http|
  logger.debug("Sending HTTP request for path #{url.path}")

  http.request_get(url.path) do |response|
    logger.debug("Processing response")
    logger.debug("Response headers: #{response.to_hash.to_s}")

    logger.debug("Rendering response")
    File.open('foo', 'wb') do |file|
      response.read_body do |body|
        logger.debug("Sending back #{body.length} bytes of response data")
        if (file.respond_to?(:syswrite))
          file.syswrite(body)
        else
          file.write(body)
        end
      end
    end
  end
end

What’s supposed to happen here is that the 100MB test file is read in manageable chunks, so memory and CPU are barely touched as the transfer proceeds. What actually happens is the CPU redlines, and the data are transferred in 1024-byte chunks. That’s right, there’s a hard-coded 1K chunk size in Ruby’s HTTP implementation, which means there’s tons of CPU overhead executing Ruby methods and socket system calls.
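
One way to see what chunk sizes a particular Ruby build actually hands to the read_body block is to tally them. The sketch below is only an illustration: chunk_size_histogram is a made-up helper name, it assumes a plain http:// URL with no redirects, and the sizes it reports will depend on the Ruby version and on how the data comes off the network.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Net::HTTP): fetch url and count how many
# chunks of each size response.read_body yields.
# Assumes plain HTTP and no redirects.
def chunk_size_histogram(url)
  uri = URI.parse(url)
  sizes = Hash.new(0) # chunk size in bytes => number of chunks of that size
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request_get(uri.request_uri) do |response|
      # The block is called once per chunk as the body is read off the socket
      response.read_body { |chunk| sizes[chunk.bytesize] += 1 }
    end
  end
  sizes
end
```

Calling it with the test URL from the script above and printing the result sorted by size should make a fixed chunk size easy to spot.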

Don’t believe me? Think I’m some Ruby n00b, or maybe a Ruby hater? Read this thread. The combined might of the ruby-talk list was not sufficient to solve this guy’s problem. He came up with a hack that monkeyed with the built-in Net::HTTP classes to force a larger buffer size, but that hack broke in Ruby 1.8.5, and there’s no fucking way I’m going to go through that sort of contortion to make a craptastic library usable.

I continue to be amazed at the various areas in which Ruby is immature and vastly inferior to alternatives like Python. Still, every time I swear it off, I end up coming back after being reminded how much I hate using Python, and how refreshing Ruby’s dynamic duck-typing idioms are after a hard day of C# and C++. So, with fuck it no longer an option, I had to find an alternative.

There are other HTTP clients, but they are built on top of Net::HTTP, which is not helpful. This led me to rfuzz, which is a fuzz testing tool for web apps. One nice feature of this tool is a very light HTTP client library written atop the low-level TCPSocket object, and thus suffering none of Net::HTTP’s dreadful performance. This client was made for a very different purpose, and had no streaming response abilities that I could find, but by subclassing the HttpClient class I was able to adapt it to my needs with minimal effort.

Now, I can stream a large file with a single-digit CPU hit, and I can control the buffer size to my liking.
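
The rfuzz subclass itself isn’t reproduced here. As a rough illustration of the same idea (reading the response straight off a TCPSocket with a caller-chosen buffer size), here is a minimal sketch. stream_get is a hypothetical helper, not rfuzz’s API. It assumes a plain HTTP server that answers an HTTP/1.0 request with a 200 and then closes the connection; redirects, TLS, and chunked encoding are not handled.

```ruby
require 'socket'

# Hypothetical helper (not rfuzz's HttpClient): issue a bare-bones GET and
# yield the body in pieces of at most buffer_size bytes.
# Returns the total number of body bytes read.
def stream_get(host, port, path, buffer_size = 128 * 1024)
  total = 0
  TCPSocket.open(host, port) do |sock|
    # HTTP/1.0 plus Connection: close, so the body simply runs until EOF
    sock.write("GET #{path} HTTP/1.0\r\nHost: #{host}\r\nConnection: close\r\n\r\n")

    status = sock.gets("\r\n")
    unless status.to_s =~ %r{\AHTTP/\d\.\d 200\b}
      raise "unexpected response: #{status.inspect}"
    end

    # Skip the headers; a blank line separates them from the body
    while (line = sock.gets("\r\n")) && line != "\r\n"
    end

    # read(n) blocks until n bytes arrive or the server closes the
    # connection, and returns nil once there is nothing left
    while (chunk = sock.read(buffer_size))
      total += chunk.bytesize
      yield chunk
    end
  end
  total
end
```

To save a download, open the output file in 'wb' mode and write each yielded chunk to it; the fourth argument sets the read size.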

The longer I’m a software developer, the more I find myself irritated at the kind of low-level wheel-reinventing I’m constantly stuck with, for no better reason than that software isn’t as far along as it should be by now. Maybe I was born too early after all.

UPDATE: In a shockingly uncharacteristic attempt to light a candle, I’ve done a detailed analysis of Ruby 1.8 HTTP client performance, complete with pretty pictures.

Comments (19)
  1. What’s wrong with python ?

  2. There doesn’t have to be anything wrong with python for ruby to be better.

  3. Of course part of the reason Net::HTTP is crap is that people don’t fix it. Why not take a stab at it, or try to make a public push for improvement?

  4. Well, that’s just the thing. It’s more mature, has better libraries, better performance, and a better toolchain. I _should_ like Python. But I don’t. Idiomatic Ruby makes sense to me, while idiomatic Python does not. I guess it comes down to the flavor of syntactic sugar-water; I like Ruby’s taste better.

  5. Exactly. I tried to like Python. I really did. But if I’m coding on my own time, I damn sure want to enjoy the ride. Apart from the maddening limitations of Ruby’s standard libraries and performance and scalability, I have fun coding in Ruby.

  6. Yes, I’m certainly guilty of cursing the darkness when I could be lighting a candle. But I haven’t the time nor the inclination to undertake the development, testing, defense, and maintenance of a core library component. That, and cursing the darkness takes a lot less energy.

  7. control your temper please. i’m grateful the majority of the ruby community takes more care in their discussion of issues like this instead of whining like a sixteen year old girl on their crappy blog. please pull the tampon out of your ass.

  8. *rotfl* There’s a difference between a curmudgeon and a sixteen year old girl. Curmudgeons in general will take care to use proper punctuation and sentence structure, and tend not to litter crappy blogs with petty insults better suited to a random YouTube comment thread. I am a curmudgeon. I can only speculate as to you.

  9. I’ll have to take a look at rfuzz. I usually use the Ruby bindings for Curl.

    I absolutely agree with you about Net::HTTP – it’s absolutely dreadful. In my opinion it should just be thrown out of the standard library completely. Its presence there leads some people to think it’s actually worth using.

  10. There is a certain amount of difficulty in getting changes into ruby, particularly net/http, because the main developers are in Japan and most of us Western users don’t speak or write Japanese. (It’s all very well expecting them to know English, but that’s a big hurdle as well, so won’t get jumped any time soon.) As a result it’s difficult to get patches accepted. Not impossible, but it takes some pressure to get them through. I don’t know who is looking after the ruby quiz these days, but if this is a small change that could be fixed relatively easily, it might make a good quiz question, then people would effectively compete to provide a fix. I don’t have time to prod at this myself at the moment. Sorry.

  11. Hmm. I hadn’t even considered the Curl bindings; I’ll have a look at those too.

  12. That’s true. Unfortunately I don’t think the fix is as easy as a few lines. The buffer size could stand to be increased, but I read a thread somewhere where someone tried this and it still didn’t help. It needs to be rewritten, I’m afraid.

  13. I don’t see much about the server. There is some support for compression in Ruby now. It should support Deflate and gzip now, as the patches were accepted last December. But if the server is not doing anything about the headers Ruby supplies, and many servers don’t, then that benefit will be lacking. Your example is a text file and should compress reasonably well.

    For those who have said this could be better, but don’t have time to code this, are you in a position to suggest specifically what could be changed while keeping the published interface the same? Even if you don’t have time to cut and test code, a constructive suggestion would save someone (who has time to write code) from having to figure out what the right thing to do actually is. Since I don’t know the answer to that question, I’m not criticising, because sometimes squeaking is the only way to get the grease, but if you happen to know what grease to use then that’s more helpful.

  14. Actually, my test data was a zip file used to measure network performance at one of the data centers used by my hosting provider, so compression would’ve just added overhead here. In general, the root problem isn’t that transfers are slow, such that compression will ameliorate the problem. The issue is that transfers consume way more CPU than is appropriate for what should be an I/O bound operation.

    I read two posts on the list which covered parts of the solution: one was to grow the buffer from 1K to at least 16K (I would’ve gone higher myself), and the other was to eliminate the timeout helper method and use select() to implement timeouts. I’ve not tested this myself, but both areas seem to be sources of inefficiency, so I would suggest an intrepid developer start there.

    I also got the sense that some of this was different in 1.9, though it wasn’t clear what.

  15. Managed to misread the zip || mix it up with an example from one of the links, yes that would give overhead, and I take the point about CPU rather than speed.

    I don’t know when I’ll get chance to look at this, but if I get time I’ll have a look, but it will be a while. I might be out of my depth though, but hopefully there’s enough info here, and as it shows up on reddit

    http://www.reddit.com/r/ruby/comments/73zhf/absolutely_bullshit_ruby_http_client_situation/

    there’ll hopefully be more eyes on this problem. I have vague memories of comments about select not being very portable, but there may be other ways to fix this.

  16. Cool. I’ve actually taken your advice and started poking around the ruby sources. Ruby 1.9 grows the buffer to 16K, which will likely ameliorate the problem, but the timeout implementation is quite horrifying.

    If I get time this week I’ll try making various modifications to the library source code to see about improving performance.

  17. Do curmudgeons also talk at length about how a car sucks because the seats don’t accommodate the driver who wears a holster? You’re packing heat in your car. lol. I’m sorry, but you’re batshit insane. You used the word “ameliorate” twice in this thread. Your use of the English language reeks of Thesaurusitis. Focus less on sounding smart. People will respect you more.

  18. Since we’re discussing obscure vocab words which one only uses to sound smart, here are a couple more:

    * pedant – One who dwells incessantly upon insignificant minutiae to the exclusion of relevant details. For example, counting occurrences of the word “ameliorate” is pedantic.
    * ad hominem – A Latin phrase which roughly translates to ‘being a douchebag’. An ad hominem attack (such as, for example, “you’re batshit insane”) has no rhetorical value and as such is often a last resort of the ignorant.

    Don’t get me wrong. It’s awesome you actually took the time to read some of my blog postings before taking pot shots. You’re probably one of only a handful of people to read my civic SI holster compatibility complaint. You could just stand to be a bit less of an ignoramus.

  19. 100% yes to this.

    I’m a relative Ruby noob who just lost the better part of a day to dealing with some weird interaction between Net::HTTP and Facebook’s graph server. My browser’s fine, and curl is fine, but Net::HTTP coughs up a lung. People on the net have various suggestions for related problems, like swallowing EOFError and monkey-patching core libraries, which I felt embarrassed for even trying. Alas, they didn’t help.

    Anyhow, I just switched over to Typhoeus, which is built on top of the curl libraries, and it worked instantly. So if anybody ends up here looking for an alternative to Ruby’s Net::HTTP, use Typhoeus.
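
Comment 14 points at two possible fixes: a larger read buffer, and select()-based timeouts in place of the timeout helper method. A rough sketch of how the two might fit together follows. It is an illustration only, not Net::HTTP’s code and not a tested patch. read_with_timeout is a made-up name, the 16K default follows the figure from the comment, and the 60-second default is an arbitrary assumption.

```ruby
require 'timeout' # used only for the Timeout::Error class

# Hypothetical helper (not Net::HTTP's actual code): read up to max_bytes
# from io, waiting at most `seconds` for data to show up. Raises
# Timeout::Error if nothing arrives in time, and EOFError once the other
# end has closed the connection.
def read_with_timeout(io, max_bytes = 16 * 1024, seconds = 60)
  io.read_nonblock(max_bytes)
rescue IO::WaitReadable
  # Nothing available yet: let select() do the waiting rather than a
  # separate timer thread, then try the read again
  unless IO.select([io], nil, nil, seconds)
    raise Timeout::Error, "no data within #{seconds}s"
  end
  retry
end
```

Each call returns whatever is available up to max_bytes, so a caller would loop until EOFError and append each piece to its output.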

