apocryph.org Notes to my future self

19Aug/094

Getting FXRuby going with Ruby 1.9.1 and the new RubyInstaller

I’ve long resented the fact that the Ruby one-click installer for Windows uses Ruby binaries built with Visual C++ 6.0, which being ancient AND broken is very much teh suck. Thankfully, Luis Lavena is working on a new Windows installer for Ruby 1.9.x, the so-called RubyInstaller, which uses binaries built with MinGW. I wish he’d use the free version of Visual C++ 2008, since that IS the definitive compiler for Windows, but there are some licensing gotchas there to worry about, and anything is better than VC6.

Anyway, I downloaded the latest preview release of the installer, and the separate DevKit tarball, which I extracted into my Ruby install directory (which is _not_ c:\ruby; I hate dumping stuff in the root of the file system). I then wanted to get FXRuby, the Ruby bindings for the FOX GUI toolkit, working in this environment. Not easy. My experience follows.

Directories

I use a non-standard directory layout.  I’ll make reference to it in the steps below.  You obviously don’t have to use my layout.

c:\work – Where it all happens

c:\work\sourcecode – Where I extract source tarballs.  For example, zlib-1.2.3 is extracted to c:\work\sourcecode\zlib-1.2.3

c:\work\tools – Where I install the built libraries.  For example, zlib-1.2.3 is installed to c:\work\tools\zlib-1.2.3 (with make install creating subdirectories like lib and include).

c:\work\tools\ruby19 – Where I installed Ruby 1.9

Installing Ruby

I grabbed the latest preview1 installer here. At the time of this writing, it was ruby 1.9.1p129 (2009-05-12 revision 23412) [i386-mingw32].

I also grabbed the DevKit tarball from the same location. At the time, it was Development Kit 3.4.5r3 (MinGW 3.4.5 + MSYS 1.0.11)

I ran the installer and installed to c:\work\tools\ruby19. Note this non-standard install path; it bites me in the ass later.

Downloading FOX source

I grabbed the latest stable FOX source tarball, 1.6.36, and extracted it to c:\work\tools\fox-1.6.36. I have to get FOX built before I can even think about the FXRuby bindings. The challenge is that I need to build it using the MinGW environment that’s part of the Ruby DevKit I downloaded. I’ve never used MinGW, so this should be interesting.

Running Build Environment

Unless otherwise specified, all the subsequent commands are run from the Ruby build environment, which can be accessed by the ‘Start Command Prompt With Ruby’ shortcut.

Building FOX Dependencies

I put all the dependencies for FOX in C:\work\sourcecode.  I’ll build them one by one.

zlib

sh configure --prefix=c:/work/tools/zlib-1.2.3
make
make install

jpeg

mkdir c:\work\tools\jpeg-6b\bin
mkdir c:\work\tools\jpeg-6b\man\man1
mkdir c:\work\tools\jpeg-6b\include
mkdir c:\work\tools\jpeg-6b\lib
sh configure --prefix=c:/work/tools/jpeg-6b --enable-shared --enable-static
make
make install (this fails with an error trying to install the library)
copy libjpeg.a c:\work\tools\jpeg-6b\lib\libjpeg.a

libpng

set CPPFLAGS=-Ic:/work/tools/zlib-1.2.3/include
set LDFLAGS=-Lc:/work/tools/zlib-1.2.3/lib
sh configure --prefix=c:/work/tools/libpng-1.2.16
make
make install

tiff

sh configure --prefix=c:/work/tools/tiff-3.8.2 --with-zlib-include-dir=c:/work/tools/zlib-1.2.3/include --with-zlib-lib-dir=c:/work/tools/zlib-1.2.3/lib --with-jpeg-include-dir=c:/work/tools/jpeg-6b/include --with-jpeg-lib-dir=c:/work/tools/jpeg-6b/lib
make
make install

Building FOX 1.6 itself

Brace yourself.  The FOX build is REALLY SLOW.  As in, start it at night and check on it in the morning slow.


set "CPPFLAGS=-Ic:/work/tools/jpeg-6b/include -Ic:/work/tools/libpng-1.2.16/include -Ic:/work/tools/tiff-3.8.2/include -Ic:/work/tools/zlib-1.2.3/include"
set "LDFLAGS=-Lc:/work/tools/jpeg-6b/lib -Lc:/work/tools/libpng-1.2.16/lib -Lc:/work/tools/tiff-3.8.2/lib -Lc:/work/tools/zlib-1.2.3/lib"
sh configure --enable-jpeg --enable-png --enable-tiff --enable-zlib
make
make install prefix=c:/work/tools/fox-1.6.36

Preparing FXRuby gem

Of course, there is no binary gem for FXRuby compiled for MinGW, so you must build it from source.  Unfortunately, the source gem as of this writing is not compatible with MinGW.  If you try to install it, you’ll get this:


In file included from c:/work/Tools/ruby19/include/ruby-1.9.1/ruby/missing.h:22,
from c:/work/Tools/ruby19/include/ruby-1.9.1/ruby/ruby.h:1125,
from c:/work/Tools/ruby19/include/ruby-1.9.1/ruby.h:32,
from librb.c:355:
c:/work/Tools/ruby19/devkit/gcc/3.4.5/bin/../lib/gcc/mingw32/3.4.5/../../../../include/sys/time.h:27: error: redefinition of `struct timezone'
c:/work/Tools/ruby19/devkit/gcc/3.4.5/bin/../lib/gcc/mingw32/3.4.5/../../../../include/sys/time.h:40: error: conflicting types for 'gettimeofday'
c:/work/Tools/ruby19/include/ruby-1.9.1/ruby/win32.h:248: error: previous declaration of 'gettimeofday' was here
c:/work/Tools/ruby19/devkit/gcc/3.4.5/bin/../lib/gcc/mingw32/3.4.5/../../../../include/sys/time.h:40: error: conflicting types for 'gettimeofday'
c:/work/Tools/ruby19/include/ruby-1.9.1/ruby/win32.h:248: error: previous declaration of 'gettimeofday' was here
make: *** [librb.o] Error 1

Fortunately, it’s very easy to patch.

First, download the FXRuby 1.6.19 source tarball (NOT the source gem; as noted, it won’t work).  I extracted it to c:\work\sourcecode\fxruby-1.6.19.

In that directory, edit the ext/fox16/extconf.rb file.  Find the do_cygwin_setup method (around line 80).  Find this line:

have_header("sys/time.h")

And comment it out, so it looks like this:

#have_header("sys/time.h")

Then find the line:

dir_config('fxscintilla', '/usr/local/include/fxscintilla', '/usr/local/lib')

and under it add the following:

# Need to add this so it can find dependent libs under Windows
dir_config('zlib')
dir_config('tiff')
dir_config('jpeg')
dir_config('png')

Next, edit the FXRuby.cpp file in the same directory. Find the line:

extern "C" void Init_fox16(void) {

and change it to:

extern "C" void __declspec(dllexport) Init_fox16(void) {

Then build the source gem from the fxruby-1.6.19 directory:

rake build_src_gem

This will create fxruby-1.6.19.gem. Install it now:


gem install fxruby-1.6.19.gem -- --with-fox-include=c:/work/tools/fox-1.6.36/include/fox-1.6 --with-fox-lib=c:/work/tools/fox-1.6.36/lib --with-zlib-include=c:/work/tools/zlib-1.2.3/include --with-zlib-lib=c:/work/tools/zlib-1.2.3/lib --with-tiff-include=c:/work/tools/tiff-3.8.2/include --with-tiff-lib=c:/work/tools/tiff-3.8.2/lib --with-png-include=c:/work/tools/libpng-1.2.16/include --with-png-lib=c:/work/tools/libpng-1.2.16/lib --with-jpeg-include=c:/work/tools/jpeg-6b/include --with-jpeg-lib=c:/work/tools/jpeg-6b/lib
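That command line is long enough to be error-prone, so here is a minimal sketch that generates it. It relies on the fact that mkmf's dir_config('name') looks for --with-name-include and --with-name-lib options. The helper name dir_config_flags is hypothetical (it is not part of FXRuby, mkmf, or RubyGems), and the prefixes are simply the install directories used above.

```ruby
# Hypothetical helper, not part of FXRuby or RubyGems: builds the
# --with-NAME-include / --with-NAME-lib options that dir_config('NAME')
# in an extconf.rb looks for.
def dir_config_flags(prefixes, include_subdirs = {})
  prefixes.flat_map do |name, prefix|
    # Some libraries (FOX here) keep headers in a subdirectory of include/.
    include_dir = [prefix, "include", include_subdirs[name]].compact.join("/")
    ["--with-#{name}-include=#{include_dir}", "--with-#{name}-lib=#{prefix}/lib"]
  end
end

# Install prefixes taken from the directory layout described in this post.
prefixes = {
  "fox"  => "c:/work/tools/fox-1.6.36",
  "zlib" => "c:/work/tools/zlib-1.2.3",
  "tiff" => "c:/work/tools/tiff-3.8.2",
  "png"  => "c:/work/tools/libpng-1.2.16",
  "jpeg" => "c:/work/tools/jpeg-6b"
}

flags = dir_config_flags(prefixes, "fox" => "fox-1.6")
puts "gem install fxruby-1.6.19.gem -- #{flags.join(' ')}"
```

Keeping the prefixes in one hash means a changed install directory only has to be edited in one place.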

The gem doesn’t copy the dependent files into the right place, so you have to do that yourself:

copy c:\work\tools\libpng-1.2.16\bin\libpng-3.dll c:\work\Tools\ruby19\bin
copy c:\work\tools\tiff-3.8.2\bin\libtiff-3.dll c:\work\Tools\ruby19\bin

Once the gem is installed, run one of the samples to make sure it worked:

ruby examples/hello.rb

If you’re lucky and the planets are in alignment, you’ll get a little window with “Hello World” in it. Yay.

1Dec/080

Building rcapdissector on Ubuntu 8.04

I’m trying to build my rcapdissector project under Linux for the first time. Starting with a fresh Ubuntu 8.04 install, I did:

  • sudo aptitude install bison to install yacc/bison
  • sudo aptitude install flex to install flex
  • sudo aptitude install libgtk2.0-dev to install the GTK+ 2.x headers and libraries
  • sudo aptitude install libpcap-dev to install the libpcap headers and libraries
  • sudo aptitude install libgnutls-dev to install the GNU TLS library (this also installs the GNU crypto library as a dependency)
  • sudo aptitude install ruby ruby1.8 ruby1.8-dev to install Ruby and the relevant headers
  • sudo aptitude install ruby-gnome2 for the Glib-aware mkmf
  • sudo aptitude install g++ to install the latest GNU C++ compiler
  • ./configure --prefix=/home/anelson/wireshark in the Wireshark 1.0.4 source tree to configure the Wireshark build
  • make in Wireshark tree to build Wireshark
  • Edit /etc/ld.so.conf so libwireshark.so and libwiretap.so are in the system search path. I did something like:
     > cat /etc/ld.so.conf.d/wireshark.conf
     /home/anelson/wireshark-1.0.4/epan/.libs
     /home/anelson/wireshark-1.0.4/wiretap/.libs
  • Run ldconfig to pick up the changes
  • ruby -w extconf.rb in the rcapdissector/ext directory to generate the Makefile for the extension. Note you’ll almost certainly have to provide library paths to libwireshark and libwiretap as well as the epan/epan.h header files. More on this later
  • make to build and make install to install
  • Run all_tests.rb in the test folder

As of now I’ve got this working with the latest SVN, except I’m getting a segfault when I run the tests, which I chalk up to something changed in Wireshark between 0.99.5 and 1.0.4. Also note that I slightly broke Windows compatibility to get this going, so the latest SVN won’t build under Windows without a bit of reverse contortion, but you can see from the SVN diffs what I did.

9Nov/086

A more in-depth analysis of Ruby HTTP client performance

As a follow-up to my previous article on Ruby HTTP client performance, I’ve revamped my test rig and revised my tests to cover more variables, more implementations, and more Ruby versions.

This time I compare Ruby 1.8.6, 1.8.7, and 1.9.0, exclusively on CentOS 5 Linux. I compare the stock Net::HTTP, rfuzz, libcurl, eventmachine, right_http, and a number of Net::HTTP variations with slight performance tweaks. I evaluate clock time, CPU time, and CPU time over clock time, for five sites with varying network characteristics. As before, my test rig and results are available on my SVN repository for further experimentation.

For the conclusion and pretty pictures skip to the bottom. For the gory details read on.

Test Methodology

I have a simple HTTP client task, downloading a 10MB zip file from each of five data centers around the world (Seattle, Dallas, Chicago, Washington DC, and London). I’ve implemented that task using each of the HTTP client implementations I’m testing. I then use each implementation to download the file from each of the data centers using my CentOS 5 VPS box located in Future Hosting’s Dallas data center. I keep track of how much wall clock and CPU time is consumed by each implementation with the help of the Ruby benchmark library, and log that information to a CSV file.
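To make the measurement step concrete, here is a minimal sketch of timing one download with the benchmark library and turning the result into a CSV row. It is not the actual test rig: the helper name timed_row and the column order are assumptions, and a busy loop stands in for the real download.

```ruby
require 'benchmark'

# Hypothetical helper: runs the given block once and returns one CSV row
# holding wall clock seconds and CPU (user + system) seconds.
def timed_row(site, impl)
  tms = Benchmark.measure { yield }
  cpu_seconds = tms.utime + tms.stime
  [site, impl, format("%.3f", tms.real), format("%.3f", cpu_seconds)].join(",")
end

# Stand-in workload; a real run would perform the HTTP download here.
row = timed_row("dallas", "stock_net_http") { 200_000.times { |i| i * i } }

puts "site,impl,clock_seconds,cpu_seconds"
puts row
```

Benchmark.measure reports user, system, and real time separately, which is what lets a single run feed both the clock-time and the CPU-time comparisons.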

I repeat this for ruby 1.8.6, 1.8.7, and 1.9.0. I then aggregate the results and generate pretty graphs.

Test implementations

I have tested the following HTTP client implementations:

  • stock_net_http – The Net::HTTP library that ships with Ruby. In 1.8.7 and beyond this library is a bit improved by a larger 16K read buffer size, but is otherwise unchanged between revisions
  • net_http_notimeout – A subclass of Net::HTTP that overrides the timeout logic to eliminate the timeout feature, which is the cause of some shitty performance. This implementation also forces a 16K buffer size even under Ruby 1.8.6
  • net_http_select – A subclass of Net::HTTP that uses select() to implement timeouts instead of the rather inefficient stock timeout implementation. This, like all my custom HTTP impls, forces a 16K buffer size.
  • net_http_zerocopy – Another Net::HTTP subclass that has a modified read loop which uses the same pre-allocated String buffer for each read. This implementation also uses select() for timeout, and a hard-coded 16K buffer size.
  • net_http_zerocopy_sysread – A variation of net_http_zerocopy that uses readpartial with no timeout for socket reads, along with the existing preallocated buffer optimizations of its parent.
  • rfuzz – Uses a slightly modified version of the lightweight HTTP client in the rfuzz library. The rfuzz base implementation as well as this tweaked one do not implement timeouts. The rfuzz library is required or this implementation will be skipped.
  • right_http_connection – Uses the right_http_connection HTTP client implementation from the rightaws library. Annoyingly, right_http_connection works by monkey-patching Net::HTTP, which is why I had to modify my test rig to run each implementation test in a new instance of the Ruby interpreter. Bad form.
  • eventmachine – Uses the EventMachine HTTP client unmodified
  • libcurl – Uses the Ruby bindings for the curl native HTTP library
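The select and zerocopy variants in the list above combine two ideas: wait for readability with select() instead of wrapping each read in a timeout, and reuse one pre-allocated String for every read. Here is a minimal sketch of both, written against a plain IO rather than as a Net::HTTP subclass. The names each_chunk and ReadTimeout are hypothetical, and a pipe stands in for the socket.

```ruby
# Hypothetical error class for this sketch.
class ReadTimeout < StandardError; end

# Yields successive chunks read from io, reusing a single buffer.
# IO.select provides the timeout, so no timer thread is needed.
def each_chunk(io, timeout_seconds, chunk_size = 16 * 1024)
  buffer = String.new(capacity: chunk_size)
  loop do
    unless IO.select([io], nil, nil, timeout_seconds)
      raise ReadTimeout, "no data within #{timeout_seconds}s"
    end
    begin
      # readpartial fills the buffer we pass in instead of allocating.
      io.readpartial(chunk_size, buffer)
    rescue EOFError
      break
    end
    yield buffer
  end
end

# Demo with a pipe standing in for a socket.
reader, writer = IO.pipe
writer.write("x" * 8_000)
writer.close

sizes = []
each_chunk(reader, 1.0, 1024) { |chunk| sizes << chunk.bytesize }
puts "#{sizes.length} reads, #{sizes.sum} bytes"
```

Because the same String is handed to the block every time, a caller that wants to keep a chunk has to copy it before the next read overwrites it.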

Unfortunately, I could not get rev or revactor working on any of my Ruby versions, so I was unable to evaluate those implementations.
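Running each implementation in its own interpreter means the rig has to know which Ruby binary to launch. A minimal sketch of one way to find it from inside a running script, assuming a current Ruby where rbconfig provides RbConfig.ruby; whether the 2008-era interpreters tested here offer it is not something this sketch checks. The helper name run_in_fresh_ruby is hypothetical.

```ruby
require 'rbconfig'

# Hypothetical helper: runs a one-line script in a brand new interpreter
# (the same binary as the current process) and returns what it printed.
def run_in_fresh_ruby(script)
  IO.popen([RbConfig.ruby, "-e", script], &:read)
end

puts "Current interpreter: #{RbConfig.ruby}"
puts "Child reports Ruby #{run_in_fresh_ruby('print RUBY_VERSION')}"
```

Passing the command to IO.popen as an array avoids shell quoting problems when the interpreter path contains spaces.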

Try this at home

To reproduce my results, do the following:

  • Check out my SVN repository at http://svn.apocryph.org/svn/projects/rubyhttp/trunk revision 145.
  • Run the tests with Ruby 1.8.6. On my machine that’s the version in the path:
      ruby -v
      ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-linux]
    

    which means the following command runs all available impls:

      ruby -w -rubygems test_all_impls.rb
    

    That will run all the available impls (some, like rfuzz, aren’t available if you haven’t installed the necessary gem), and log the results to ./results/(date), where (date) is the YYYY-MM-DD date.

  • Run the tests with Ruby 1.8.7. On my machine I had to build that from source:
       ~/ruby18/bin/ruby -v
       ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
    

    To run the tests you have to pass the Ruby command on the command line, since I couldn’t figure out how to programmatically determine the path to the Ruby interpreter. On my system that’s:

      ~/ruby18/bin/ruby -w -rubygems test_all_impls.rb "~/ruby18/bin/ruby -w -rubygems"
    

    Again, gems are required for some impls.

  • Run the tests with Ruby 1.9.0. Same deal as 1.8.7.
      ~/ruby19/bin/ruby -v
      ruby 1.9.0 (2008-10-06 revision 19702) [i686-linux]
    

    The command is similar:

      ~/ruby19/bin/ruby -w -rubygems test_all_impls.rb "~/ruby19/bin/ruby -w -rubygems"
    
  • After running all three, you’ll have a bunch of CSV files in the results subdirectory for today’s date. Here’s what I have:
    results/2008-11-09/ruby-1.8.6-i686-linux-eventmachine.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-libcurl.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-net_http_notimeout.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-net_http_select.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-net_http_zerocopy.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-net_http_zerocopy_sysread.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-rfuzz.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-right_http_connection.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-stock_net_http.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-eventmachine.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-libcurl.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-net_http_notimeout.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-net_http_select.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-net_http_zerocopy.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-net_http_zerocopy_sysread.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-rfuzz.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-right_http_connection.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-stock_net_http.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-eventmachine.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-net_http_notimeout.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-net_http_select.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-net_http_zerocopy.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-net_http_zerocopy_sysread.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-right_http_connection.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-stock_net_http.csv
    
  • To generate aggregate data files suitable for analysis, cd into the results directory and run:

    ruby -w -rubygems ../../combine_csv.rb

    This will aggregate all the .csv files in the directory and format them into three aggregate files: clock_time.txt, cpu_percentage.txt, and total_cpu_time.txt. These files are formatted with columns for the server locations, rows for the HTTP implementation names, and values corresponding to the wall clock time, CPU time over clock time, and CPU time, respectively, for each implementation and site. These are ready-made for generating the bar charts below in Excel. Note that you’ll need the FasterCSV library in order for this to work.
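The layout of those aggregate files (one row per implementation, one column per site) amounts to a pivot over the raw measurements. Here is a minimal sketch of that reshaping. It is not the actual combine_csv.rb: the pivot helper, the hash keys, and the tab-separated output are assumptions, and the numbers are made-up placeholders rather than measured results.

```ruby
# Hypothetical stand-in for the aggregation step: turns a flat list of
# measurements into a table with implementations as rows and sites as columns.
def pivot(rows, value_key)
  sites = rows.map { |r| r[:site] }.uniq
  table = Hash.new { |h, impl| h[impl] = {} }
  rows.each { |r| table[r[:impl]][r[:site]] = r[value_key] }

  lines = [(["impl"] + sites).join("\t")]
  table.each do |impl, by_site|
    # Leave the cell empty when an implementation has no run for a site.
    lines << ([impl] + sites.map { |s| by_site.fetch(s, "") }).join("\t")
  end
  lines.join("\n")
end

# Placeholder values only; these are NOT results from the tests.
rows = [
  { site: "dallas", impl: "stock_net_http", clock: 1.0, cpu: 0.5 },
  { site: "london", impl: "stock_net_http", clock: 9.0, cpu: 0.5 },
  { site: "dallas", impl: "libcurl",        clock: 1.0, cpu: 0.1 },
  { site: "london", impl: "libcurl",        clock: 9.0, cpu: 0.1 }
]

puts pivot(rows, :clock)
puts
puts pivot(rows, :cpu)
```

Calling the helper once per metric yields one table per output file.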

Results

Running all the tests is a pain in the ass. If you fetch my SVN repository, you’ll find the raw data files that I got from my tests under results/2008-11-09. Or, you can just read my analysis below.

Clock time

[Figure: Wall clock time]

As you can see above, each implementation takes more or less the same amount of wall clock time to download from a given site, with significant variations between sites. This is expected, as downloading a file over the Internet is a mostly network-bound operation. We don’t care so much how long it takes, as how much the CPU has to work while it’s happening. Which brings us to…

CPU Time

[Figure: CPU time]

Wow, stock 1.8.6 Net::HTTP really is teh suck! At least twice as much CPU usage as the nearest competitor. Going from a 1K read buffer to 16K in 1.8.7 made a big difference.

Going further down the list, you can see the Ruby 1.9.0 Net::HTTP implementations with zero copy reads and readpartial, and the notimeout variant, are the best performers, with rfuzz, libcurl, and eventmachine close behind. It’s encouraging that a pure-ruby impl like rfuzz can compete with a mostly native impl like libcurl.

It’s also important to note that each of the downloads, be it the super-fast Dallas or the slow London, hit the CPU the same way. This really jumps out when you look at CPU time over wall clock time:

Percent of wall clock time spent using the CPU

[Figure: CPU time over wall clock time]

Here you see the various transfer times for each site, but you can also see the widely ranging performance of the various HTTP implementations. No real surprises here; rfuzz, libcurl, and eventmachine are doing very well, while the 1.8.6 stock Net::HTTP continues to blow.

Conclusion

If you need an HTTP client in Ruby, DO NOT use the 1.8.6 Net::HTTP. The 1.8.7 version is considerably better, but libcurl, rfuzz, or eventmachine are all better still.

Within the Net::HTTP family, dropping the inefficient timeout implementation and optimizing the read code to reuse the same buffer are both pretty low-hanging optimizations which should be considered for a future Ruby release. For now, I’d recommend libcurl if you’re on Linux, or 1.8.7 Net::HTTP on Windows (since rfuzz doesn’t have timeouts, and eventmachine is hard to get going under Ruby on Windows).

UPDATE: It turns out there is a binary gem release of eventmachine 0.12.0 for Windows, so if you’re doing Windows development and need a performant HTTP client implementation, you should definitely look into eventmachine. Thanks to Abdul-Rahman Advany for the tip.

4Oct/0821

An analysis of Ruby 1.8.x HTTP client performance

Not too long ago I bitched about the performance of Ruby’s HTTP client. Some of the comments to that post prompted me to investigate this further, in the hopes of finding a more performant solution.

The results of my analysis are in, and they’re…interesting, to say the least.

Summary

Ruby 1.8.6 (which still seems the dominant version among both Linux binary packages and the Windows One-Click Installer) uses a hard-coded 1K buffer size for HTTP reads, which leads to a ton of CPU usage during large HTTP downloads, even though the operation should be I/O bound and barely touch the CPU.

Ruby 1.8.7 includes a change described by the following entry in the changelog:

Mon Mar 19 11:39:29 2007  Minero Aoki  <aamine@loveruby.net>

    * lib/net/protocol.rb (rbuf_read): extend buffer size for speed.

After this change, Ruby’s HTTP implementation now uses a hard-coded 16K buffer, in the hopes of improving performance. Whether or not this actually improves things will become clear in my analysis later on.
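To put the buffer sizes in perspective, here is a small sketch that counts the fewest reads needed to pull down a file with a given buffer size. It assumes the 10MB test file is exactly 10 * 1024 * 1024 bytes and that every read fills the buffer completely; real socket reads often return less, so actual counts can only be higher. The helper name minimum_reads is hypothetical.

```ruby
# Assumption: "10MB" means 10 * 1024 * 1024 bytes.
FILE_BYTES = 10 * 1024 * 1024

# Fewest reads needed if every read fills the buffer (ceiling division).
def minimum_reads(file_bytes, buffer_bytes)
  (file_bytes + buffer_bytes - 1) / buffer_bytes
end

[1, 16, 64].each do |kb|
  puts "#{kb}K buffer: at least #{minimum_reads(FILE_BYTES, kb * 1024)} reads"
end
# 1K  buffer: at least 10240 reads
# 16K buffer: at least 640 reads
# 64K buffer: at least 160 reads
```

Each read costs a system call plus some Ruby method dispatch, so cutting the count sixteen-fold is where the hoped-for CPU savings would come from.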

In addition to Ruby’s built-in Net::HTTP client, I evaluated two alternatives: a version of the rfuzz HTTP client modified to support streaming GETs, and curb, the Ruby bindings for the native libcurl HTTP client library. My goal was to determine the best-case Ruby HTTP client performance as indicated by the performance of these two implementations, then munge Ruby’s stock implementation to try to approach the best-case performance.

rubyhttp

I wrote a tool, rubyhttp, to help me perform these tests. The code is freely available at my SVN repository at http://svn.apocryph.org/svn/projects/rubyhttp/trunk. To grab the code, do a svn co http://svn.apocryph.org/svn/projects/rubyhttp/trunk. The tests below were run with revision 127 of the code.

Test environment

I ran the tests on two machines: wyoh, a Windows XP x64-edition Core 2 Duo laptop with a FiOS internet connection, and lio, one of my FutureHosting VPS boxes running CentOS 5.

On wyoh I used the version of Ruby that comes with the latest one-click installer:

>ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]

On lio I tested two versions of Ruby. The first was installed by the ruby yum package:

$ ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-linux]

As you can see, this is the same version and patchlevel as my Windows box. Once I discovered the 16k buffer enhancement in Ruby 1.8.7, I downloaded and built the latest 1.8.7 source tree. This is:

$ ~/ruby18/bin/ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]

Test data

My test code fetches the 10MB test files published by FutureHosting for measurement of the network performance at each of their data centers. Thus, my code retrieves data from Seattle, Dallas, Chicago, Washington DC, and London. lio is located in the very same Dallas data center, hence the crazy-high download speeds there, while wyoh is located in the suburbs of Washington DC in close geographical and network proximity to the DC datacenter.

HTTP variations

Each test run does an HTTP get from five different locations, using Net::HTTP and (on Linux only) rfuzz and curb as well. Neither rfuzz nor curb could be made to work on Windows, so the Windows runs use only Net::HTTP.

Most of the tests exercise some variation in the Net::HTTP implementation. The following variations are used:

  • stock – As it implies, the Net::HTTP implementation is unmodified from whatever ships with the version of Ruby being used
  • custom-16kbuffer – Modifies the buffer size from 1K to 16K. Note that Ruby 1.8.7 already includes this modification, so you’ll only see this run with Ruby 1.8.6 on Windows.
  • custom-16kbuffer-notimeout – Buffer size of 16K, and the timeout call is removed. This obviously isn’t a practical change, but it demonstrates the overhead of Ruby’s appalling timeout implementation
  • custom-16kbuffer-select – Buffer size of 16K, and the timeout call is replaced with non-blocking I/O using select, as proposed by Tanaka Akira on the ruby-talk list
  • custom-16kbuffer-selectwithsysread – Buffer size of 16K, and the timeout call is replaced with non-blocking I/O using select, as proposed by Tanaka Akira on the ruby-talk list. In addition, the read_nonblock call made once select reports data ready to read is replaced by sysread
  • custom-64kbuffer-notimeout – Buffer size of 64K, and the timeout call is removed. This obviously isn’t a practical change, but it demonstrates the overhead of Ruby’s appalling timeout implementation
  • custom-64kbuffer-select – Buffer size of 64K, and the timeout call is replaced with non-blocking I/O using select, as proposed by Tanaka Akira on the ruby-talk list

All of these custom variations modify lib/ruby-1.8/net/protocol.rb, which contains the socket I/O functionality used by Net::HTTP. The rbuf_fill method contains the actual socket read logic.

Data

Each run outputs the following information for each combination of HTTP URL and HTTP client implementation:

  • Site – name of the site (eg ‘seattle’, ‘washdc’, etc)
  • Impl – name of the HTTP implementation
  • KBytes Transferred
  • KBytes/second
  • Chunk count – The number of reads required to fetch the entire file
  • Mean chunk size – The average read size
  • Min chunk size
  • Max chunk size
  • User Time – The % of user time taken by ruby during the run
  • System Time – The % of system time taken by ruby during the run
  • Total CPU Time – The total % of CPU time taken by ruby during the run
  • Clock time – The amount of time spent downloading the file
  • % CPU usage – Defined as Total CPU Time / Clock Time * 100. The percentage of available CPU time taken by ruby

The raw data are available under SVN at results/linux/2008-10-4 and results/windows/2008-10-4. I used my combine_csv.rb tool to generate results/linux/2008-10-4/aggregate.csv from the individual test results. I ended up not using the Windows results as they complicated the graph and didn’t materially impact the conclusion.

I sucked aggregate.csv into Excel to do some munging.

Pretty Pictures

I uploaded the data to Swivel, thinking it would make it easy to analyze the data. It didn’t. I wanted to do a clustered bar graph, where each cluster corresponds to a site, and bars within that cluster reflect CPU usage for each implementation when downloading from that site. Swivel is way too limited for that.

The best I can do is this graph, which clusters by implementation and graphs CPU usage for each site; the opposite of what I wanted. You can play with the data yourself if you like.

Using good old-fashioned Excel, I generated this graph:

[Figure: Ruby HTTP implementations performance]

As you can see, the worst performers are the stock Net::HTTP implementations in both 1.8.6 and 1.8.7, though 1.8.6 is noticeably worse due to the 1K buffer size vs 16K for 1.8.7. The best performer is curb (libcurl bindings for Ruby), under both 1.8.6 and 1.8.7. The fastest Net::HTTP-based implementation uses a 16K buffer size and bypasses the timeout method, which is apparently quite inefficient. Using the non-blocking select to implement a timeout is slower than no timeout at all, but still considerably better than the stock impl. Finally, the 64K buffer size variants actually performed worse than the 16K variants.

It’s also quite obvious that Dallas transfers took up the most CPU, while London took the least. What you can’t see from this graph, but would see in the raw data, is that Dallas transfers were crazy-fast (since these tests were run on the same network as the Dallas test file), so there was less wall-clock time spent on the test, thus the transfer was less I/O bound than others. For the same reason, London, by far the slowest transfer, uses the least amount of CPU. This does not mean that transfers from fast download sites are inherently less efficient. If instead of %CPU time I used the total CPU time column, this disparity would vanish.
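A worked example of why the percentage misleads, using the % CPU usage definition from the Data section (Total CPU Time / Clock Time * 100). The figures below are made-up placeholders chosen for round arithmetic, not measurements, and the helper name cpu_percent is hypothetical.

```ruby
# % CPU usage as defined in the Data section.
def cpu_percent(total_cpu_seconds, clock_seconds)
  total_cpu_seconds / clock_seconds.to_f * 100
end

# Placeholder numbers: the same half second of CPU work in both cases,
# only the wall clock time differs.
fast_site = cpu_percent(0.5, 2.0)   # fast transfer, short clock time
slow_site = cpu_percent(0.5, 32.0)  # slow transfer, long clock time

puts format("fast site: %.1f%% CPU", fast_site)
puts format("slow site: %.1f%% CPU", slow_site)
```

Identical CPU cost shows up as a much larger percentage on the fast site simply because the denominator is smaller, which is why total CPU time is the fairer column for comparing sites.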

Conclusion

Ruby’s Net::HTTP implementation blows. It’s a bit better in 1.8.7 with the new 16K buffer size, but the timeout implementation has got to go. Even with timeout eliminated, Net::HTTP is trounced by the pure-Ruby rfuzz and the native/Ruby blend curb, suggesting that timeout notwithstanding, there are other inefficiencies in Net::HTTP. Looking at the protocol.rb code, I’m struck by how painfully inefficient the implementation is with buffers. rfuzz and curb minimize buffer copies and my rfuzz streaming HTTP extension reuses the same buffer for multiple calls, while Net::HTTP is happily appending and slicing away at arrays.

I think architecturally Net::HTTP can be saved, but it needs rewritten buffered I/O and an alternative to timeout, preferably in the form of select.

I’m going to try to work on the necessary changes, and will post whatever I come up with.

30Sep/082

"Gem::SourceIndex#search support for Regexp patterns is deprecated" from Rails after upgrading to RubyGems 1.3.0

In my attempts to get Ruby on Rails running on my FutureHosting VPS I ran into the following warning/error attempting to run the RoR script/generate controller command:

Gem::SourceIndex#search support for Regexp patterns is deprecated
/usr/lib/ruby/gems/1.8/gems/rails-2.1.1/lib/rails_generator/lookup.rb:211:in `each' is outdated

Apart from this Rails seems to work fine. This didn’t start until I upgraded to RubyGems 1.3.0, which (unbeknownst to me at the time) was just released a few days ago. From the release notes:

Other Changes Include:

  • lib/rubygems/source_index.rb
    • Deprecate options to ‘search’ other than Gem::Dependency instances and issue warning until November 2008.

I assume the above deprecation is causing the Gem::SourceIndex#search support for Regexp patterns is deprecated warning. So, is Rails using a deprecated Gem function? If so, why hasn’t Rails been updated accordingly? I’ve seen nothing about this on Google, and I’m running the latest Rails as of now, so I must assume I have to suffer in (relative) silence until an RoR update brings Rails into compliance. Since the message doesn’t seem to impact Rails’ functionality it’s a mere nuisance, but nuisance nonetheless.

If anyone has more info or a fix (apart from upgrading to the new RoR when it becomes available), please leave a comment.

27Sep/0818

Absolutely bullshit Ruby HTTP client situation

For one of my self-edification projects I’ve been trying to implement some very simple HTTP client code in Ruby, which I want to use to transfer hundreds or thousands of megabytes of data from an HTTP server. Since I need to transfer large blocks of data, I need a streaming HTTP API that allows me to read the data in small chunks so as not to exhaust available memory buffering reads. This is a very common idiom, and I was not surprised to find that Ruby’s built-in HTTP implementation, Net::HTTP, had just such a function.

Imagine my surprise, then, when I ran the following code:

#!/usr/bin/ruby -w

require 'net/http'
require 'logger'

BLOCK_SIZE = 1024*128 #128 K
REMOTE_URL = "http://wdc01.futurehosting.biz/test100.zip"
logger = Logger.new(STDOUT)

logger.debug("Parsing URL #{REMOTE_URL}")
url = URI.parse(REMOTE_URL)
logger.debug("Starting HTTP session with host #{url.host}, port #{url.port}")
Net::HTTP.start(url.host, url.port) do |http|
  logger.debug("Sending HTTP request for path #{url.path}")

  http.request_get(url.path) do |response|
    logger.debug("Processing response")
    logger.debug("Response headers: #{response.to_hash.to_s}")

    logger.debug("Rendering response")
    File.open('foo', 'wb') do |file|
      response.read_body do |body|
        logger.debug("Sending back #{body.length} bytes of response data")
        if (file.respond_to?(:syswrite))
          file.syswrite(body)
        else
          file.write(body)
        end
      end
    end
  end
end

What’s supposed to happen here is that the 100MB test file is read in manageable chunks, so memory and CPU are barely used as the transfer proceeds. What actually happens is the CPU redlines, and the data are transferred in 1024-byte chunks. That’s right, there’s a hard-coded 1K chunk size in Ruby’s HTTP implementation, which means there’s tons of CPU overhead executing Ruby methods and socket system calls.

Don’t believe me? Think I’m some Ruby n00b, or maybe a Ruby hater? Read this thread. The combined might of the ruby-talk list was not sufficient to solve this guy’s problem. He came up with a hack that monkeyed with the built-in Net::HTTP classes to force a larger buffer size, but that hack broke in Ruby 1.8.5, and there’s no fucking way I’m going to go through that sort of contortion to make a craptastic library usable.

I continue to be amazed at the various areas in which Ruby is immature and vastly inferior to alternatives like Python. Still, every time I swear it off, I end up coming back after being reminded how much I hate using Python, and how refreshing Ruby’s dynamic duck-typing idioms are after a hard day of C# and C++. So, with fuck it no longer an option, I had to find an alternative.

There are other HTTP clients, but they’re built on top of Net::HTTP, which is not helpful. This led me to rfuzz, which is a fuzz testing tool for web apps. One nice feature of this tool is a very light HTTP client library written atop the low-level TCPSocket object, and thus suffering none of Net::HTTP’s dreadful performance. This client was made for a very different purpose, and had no streaming response abilities that I could find, but by subclassing the HttpClient class I was able to adapt it to my needs with minimal effort.

Now, I can stream a large file with a single-digit CPU hit, and I can control the buffer size to my liking.
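For illustration, here is a minimal sketch of the general idea: talk to a raw TCPSocket and read the body with a buffer size of your choosing. This is not rfuzz’s HttpClient, nor the subclass described above; copy_http_body and download are hypothetical names, and the sketch assumes a plain HTTP/1.0 GET where the server closes the connection at the end of the body (no chunked encoding, redirects, or error handling).

```ruby
require 'socket'

# Hypothetical helper: skip the response headers on an already-connected
# IO, then copy the body to +out+ using reads of chunk_size bytes.
def copy_http_body(io, out, chunk_size = 64 * 1024)
  # Discard the status line and headers, up to the blank separator line.
  while (line = io.gets)
    break if line.chomp.empty?
  end

  total = 0
  # read(n) returns nil at end of stream, which ends the loop.
  while (chunk = io.read(chunk_size))
    out.write(chunk)
    total += chunk.bytesize
  end
  total
end

# Hypothetical wrapper: issue an HTTP/1.0 GET so the server closes the
# connection after the body, then stream the body into a local file.
def download(host, port, path, filename, chunk_size = 64 * 1024)
  socket = TCPSocket.new(host, port)
  socket.write("GET #{path} HTTP/1.0\r\nHost: #{host}\r\n\r\n")
  File.open(filename, 'wb') { |file| copy_http_body(socket, file, chunk_size) }
ensure
  socket.close if socket
end
```

Something like download(url.host, url.port, url.path, 'foo', 256 * 1024) would then pull the file down in 256K reads rather than 1K ones.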

The longer I’m a software developer, the more I find myself irritated at the kind of low-level wheel-reinventing I’m constantly stuck with, for no better reason than that software isn’t as far along as it should be by now. Maybe I was born too early after all.

UPDATE: In a shockingly uncharacteristic attempt to light a candle, I’ve done a detailed analysis of Ruby 1.8 HTTP client performance, complete with pretty pictures.

10Mar/080

(Trying to have) Fun with Markov

This past weekend I dusted off my prototypical Ruby implementation of Markov chains for the purposes of generating sentences that bear striking similarities to a corpus of sample text, but are in fact random nonsense text. My first exposure to this idea was the implementation in Kernighan and Pike’s The Practice of Programming, but I’ve run across it a number of times since.

Most Markov text generation schemes I’ve run across are just for fun, like mixing the text characteristics of the Bible and Dr. Seuss or whatever. My idea was to use Markov text generation to generate memorable, secure passphrases which resemble a familiar text. I figured Markov chains would generate structurally sound sentences which would enable users to remember the sentence in terms of its structure rather than a random sequence of words, which is a common cognitive trick to remember long strings. I’m not the first to have this idea; at least passkool implements a variation of the same idea.

Markov chains aren’t hard to implement, and after a few hours I had a working implementation and some unit tests. However, I wanted something a little different: I wanted to compute the information-theoretic entropy of each generated string, so users could ensure the strength of their passphrase was commensurate with the key or data being protected.

Shannon’s theory of information established a formalized definition of information entropy, which allows us to determine exactly how many bits of information are encoded in a particular variable given the probabilities of each of the variable’s possible values. This adapts very nicely to Markov chains, which are themselves really just states linked by state transitions of various probabilities. Using this formalization, I can use Markov chains to generate some text, and determine how many bits of information are encoded in the text.

The reason this is cool is that it relates directly to cryptography. Since humans tend to be unable to reliably memorize long binary cryptographic keys, we’ve taken to using passwords (or, hopefully, passphrases) which can be cryptographically converted into long binary cryptographic keys and are usually easier for humans to remember. The problem with this approach is that, if you’re not careful, you’ll kneecap your encryption algorithm by using a weak passphrase.

For example, let’s say you need a 128-bit AES key, and rather than remember the key (or even worse, write it down!) you derive it from a passphrase which you can remember. Once derived, you use the key confident that even the US government probably can’t break your 128-bit encryption. However, it’s quite possible you’re actually using what is effectively 32-bit encryption, or possibly less, depending upon your passphrase. Is your passphrase something obvious like your name, the word “secret”, or a dictionary word? Then it’s not as secure as the 128-bit key it’s being used to derive. A random 128-bit key has 2^128 (a huge number, believe me!) possible values, but your shitty passphrase could be guessed within maybe a few million tries, which is easy to brute-force with modern computers.

Security professionals who understand this problem give us rules of thumb to relate the length and composition of a passphrase with a corresponding key strength (for example, the 1.2 bits per character rule), but this is only a rough approximation of security. If you pick a quote from Roget’s, the quote might be 100 characters long, and thus 120 bits strong, but that’s only true for an adversary who will try to guess your passphrase at random based on English letter order probabilities. If the adversary knows or has reason to guess you took the quote from a quote book, the security of the key is considerably lower. If the book has 10,000 quotes in it and you picked one at random, that’s -(1/10000 * LOG(1/10000, 2)) * 10000 bits of entropy, or about 13.3 bits. Any attacker worth his salt can try all quotes from the quote book in seconds.
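The quote book arithmetic can be checked with a few lines of Ruby. This is just a sketch of the standard Shannon formula applied to 10,000 equally likely quotes; shannon_entropy_bits is a name made up for this example.

```ruby
# Shannon entropy in bits: H = -sum(p * log2(p)) over all outcomes.
# Outcomes with zero probability contribute nothing to the sum.
def shannon_entropy_bits(probabilities)
  -probabilities.sum { |p| p.zero? ? 0.0 : p * Math.log2(p) }
end

quote_count = 10_000 # the hypothetical quote book from the text
uniform = Array.new(quote_count, 1.0 / quote_count)

puts format('%.2f bits', shannon_entropy_bits(uniform))
```

For a uniform pick this collapses to log2(10000), roughly 13.3 bits, so an attacker needs 14 whole bits' worth of guesses at most to cover the entire book.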

This is where Markov text generation comes in. Since the text is generated from a series of state transition probabilities, it’s easy to compute the exact entropy level (that is, key strength) of each generated string. That’s what the measure_entropy_for_tokens method of my Markov class does. With this measurement, you can be confident that an attacker who can precisely duplicate your training corpus and Markov chain parameters nonetheless must brute-force the phrase with a level of difficulty comparable to brute-forcing a cryptographic key with similar entropy.
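To make that concrete, here is a toy sketch of how such a measurement might work. It is not the Markov class from the repository: TinyMarkov and bits_for are hypothetical names, the model is first-order over whole words, and the score assumed here is the sum of -log2(p) over each transition taken. The choice of the first word is ignored.

```ruby
# Hypothetical first-order, word-level Markov model for illustration only.
class TinyMarkov
  def initialize
    # counts[current][following] = times `following` came right after `current`
    @counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  end

  # Record every adjacent pair of tokens in the training text.
  def train(tokens)
    tokens.each_cons(2) { |current, following| @counts[current][following] += 1 }
  end

  # Estimated probability that `following` comes next after `current`.
  def transition_probability(current, following)
    followers = @counts.fetch(current, {})
    total = followers.values.sum
    return 0.0 if total.zero?

    followers.fetch(following, 0) / total.to_f
  end

  # Bits needed to pin down this exact sequence of transitions.
  # A transition the model has never seen scores Infinity.
  def bits_for(tokens)
    tokens.each_cons(2).sum do |current, following|
      -Math.log2(transition_probability(current, following))
    end
  end
end

model = TinyMarkov.new
model.train(%w[the cat sat the cat ran the dog sat])
puts model.bits_for(%w[the cat sat]) # two real choices were made along the way
puts model.bits_for(%w[dog sat the]) # every step was forced by the corpus
```

In this tiny corpus "the cat sat" comes to about 1.58 bits, while "dog sat the" scores zero, because each word had only one possible successor.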

Once I had this implemented, I started to generate sentences from all sorts of sample text, from the collected works of Rudyard Kipling to a wide assortment of sci-fi. Whether I generated vaguely-pronounceable nonsense words or whole sentences, the length of text required to reach 128 bits of entropy was a lot more than I expected. This was made even worse by the occasional word strings which had zero entropy (meaning there was no chance any other word would be selected) due to unique combinations of words in the source corpus.

Here are some examples I generated using a collection of children’s books from Project Gutenberg. I find kids’ books have simpler sentence structure and a smaller vocabulary, so they make for easier-to-remember passphrases:

Bill got hurt in their banishment (16.01220550168 bits)

But if you wish, distribute this etext electronically, or by disk, book or any little girls their lessons, and then ventured to move some time before morning the good Saint come to you,’ said Fergus, ‘with greetings from Concobar the King likewise to Fergus, and he wasn’t black (84.2709242256745 bits)

I should use nought save a half-dozen jealously guarded little precincts of good cheer (25.6622696871773 bits)

I did seek it (17.3989261460847 bits)

You see Lightfoot has no hair on him (24.9873805812463 bits)

Shadow is the child, most fair (20.460032160771 bits)

Christmas is going to Johnny, rubbed her head on one of your making such a hard white crust on the shore (51.4191467788532 bits)

The white snow fell softly, softly, and then he sometimes does great damage (36.4897175557749 bits)

WHO IS THERE? she said she’d marry me (11.7074031814505 bits)

Here’s Martha, mother! cried the two big caterpillars, a lizard, a small gold ring began to fly at intervals, like a drill, or as if you have already enjoyed them–without knowing or wondering why (61.6564060837357 bits)

Yes, Mammy, said Epaminondas (12.5248260288151 bits)

*These Etexts Prepared By Hundreds of Volunteers and financial support to provide volunteers with the Mouse family (20.1629955228745 bits)

Martha didn’t like to feel just as useful as you can, and very pretty song (36.138788709601 bits)

Note the entropy measurements next to each phrase. These entropy figures are specific to the exact corpus I used to train the model.

Imagine you need 128 bits of entropy; you’d need to combine at least two and possibly six or more of these phrases depending on the strength of each one. I’m not sure I could reliably remember such a passphrase, and I certainly couldn’t accommodate a great number of them for various accounts.

I think the lesson here is that Markov text generation is a good approach to passphrase generation, and that 128 bits of entropy is a lot of information for the human brain to contain. I suspect there are some optimizations to be had to pack more entropy into a more memorable package, but the real limitation here is human memory capacity.

The code for the Markov implementation, tests, and generator tool is on my SVN repository here. I didn’t upload my corpus to SVN since it includes some copyrighted works; I suggest Project Gutenberg as a great source of public domain text files.

9Mar/080

Annoying problem with trailing commas in values with YAML.rb

I’ve run into a very annoying and seemingly serious gotcha in the YAML.rb implementation that ships with Ruby 1.8.6 patchlevel 0 i386-mswin32. The problem surfaces when you attempt to serialize a Ruby string value with a trailing comma. The YAML output looks OK, but if you read the YAML back in, the resulting value isn’t what you expect.

You can repro with this Ruby code:

require 'yaml'
require 'test/unit'

# Repro the bug in the YAML serializer
class BrokenYamlTest  < Test::Unit::TestCase
    def test_broken_yaml
        data =  {
            'foo' => {
                'bar' => ['baz,']
            }
        }

        File.open('testdata.yaml', 'w') do |file|
            YAML::dump(data, file)
        end

        data2 = File.open('testdata.yaml', 'r') do |file|
            YAML::load(file)
        end

        assert_equal(data['foo']['bar'][0],
            data2['foo']['bar'][0])
    end
end

When I run this code on my Windows machine with the aforementioned version of Ruby, I get this output:

    Loaded suite test_yaml_bug
    Started
    F
    Finished in 0.016 seconds.

      1) Failure:
    test_broken_yaml(BrokenYamlTest) [test_yaml_bug.rb:21]:
    <"baz,"> expected but was
    <"baz,\r">.

    1 tests, 1 assertions, 1 failures, 0 errors

As you can see, the baz, value is getting a carriage return (\r) character appended after it’s read back in from the YAML. Interestingly, if you serialize to/from a Ruby string instead of a file, this problem doesn’t surface.
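For comparison, the string-based round trip mentioned above looks something like this. It is a minimal sketch; round_trip_via_string is a helper name made up for the example.

```ruby
require 'yaml'

# Serialize to an in-memory YAML string and parse it straight back,
# bypassing the file I/O used in the failing test.
def round_trip_via_string(data)
  YAML.load(YAML.dump(data))
end

data = { 'foo' => { 'bar' => ['baz,'] } }
copy = round_trip_via_string(data)
puts copy['foo']['bar'][0].inspect
```

Going by the observation above, the value should come back here without the stray \r, which points the finger at the file-based path rather than at the dumped YAML text itself.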

I realize that the comma is a separator for list values in YAML, but I also don’t care. YAML.rb should provide round-trip consistency for something as simple as a trailing comma in a string literal.

1Jan/080

Simple Bugzilla migration script for FogBugz

I just convinced my company to migrate from Bugzilla to FogBugz On Demand, the hosted version of Fog Creek Software’s case tracking tool, FogBugz. FogBugz is a vastly better tool, but that’s a post for another time.

When you buy FogBugz and host it yourself, it comes with a little script to import bugs from Bugzilla, but for some reason the hosted version doesn’t support this, and Fog Creek want $2500 to do a custom migration. “To hell with that”, I thought to myself, and whipped up a Ruby migration script that uses the XML export feature of Bugzilla to get the bugs out, and the FogBugz API 4.0 to get them in.

The script is written for my specific needs, and thus isn’t necessarily appropriate for anyone else, but it got the job done for me. The code is in my private SVN repository here, which you can browse via WebSVN here.

25Nov/070

Major Improvements to Ruby Wireshark Wrapper

It’s been a while since I last reported on the status of my Wireshark wrapper for Ruby. This past Thanksgiving weekend I put a lot of time into it, and I’m pretty pleased with the progress.

I’ve made a few major changes to accommodate my long-term use for this wrapper, which is to index and analyze hundreds of gigabytes of captured network traffic.

First, I added the ability to dump a whole packet into YAML for storage as a blob. This was a compromise, in that I wanted to preserve the dissected structure of each packet, but obviously didn’t want to create a database schema to accommodate the dozens of fields one finds in a typical packet. I figured I’d save off each packet’s YAML representation in a BLOB, then retrieve and display the whole packet’s hierarchy in a GUI if needed. Any fields that would be involved in querying or reporting would obviously need to be hoisted into database fields, but that would be a small subset of each packet’s fields.

My initial YAML implementation used the Syck engine as exposed in the Ruby standard library’s YAML class. Unfortunately, this required I query each field’s name, value, display name, and display value, which causes the creation of five Ruby wrapper objects per field. The whole reason I modified the field wrapper to defer creation of Ruby objects is to avoid the huge performance hit this incurs.

So, using the slow-but-working Syck-based implementation as a baseline, I wrote a pure C++ YAML serializer specifically tuned for serializing field hierarchies and using the C++ stringstream to efficiently build the YAML string in memory. Based on my performance numbers, this results in a mean serialization time between 0.016 seconds and effectively 0.000 seconds (in other words, faster than the measurement resolution of the Benchmark class), compared to 0.5 seconds on average with YAML. To be sure, this is not a reflection on YAML’s serialization performance, but rather the significance of the performance gain I get from avoiding the creation of dozens or hundreds of Ruby objects per packet.

Once my C++ YAML serializer was producing YAML that parsed to a structure identical to the reference implementation based on YAML, I started to worry about large binary field values. As an example, I captured the traffic caused by downloading a 50K JPEG over HTTP. This capture contained a bunch of TCP packets, which Wireshark reassembled so the final TCP packet in the session included not just the data from the packet’s frame, but also the reassembled data consisting of the entire TCP payload for the HTTP response.

Obviously, serializing this out to YAML is somewhat inefficient. Instead, I reverse-engineered the Wireshark tvbuff_t stuff a bit more and figured out that each packet has a GSList of data_source objects, where each data_source has a name and a tvbuff_t. Normal packets have only one data_source, Frame, but the last TCP packet in a TCP segment also contains a Reassembled TCP data_source which contains the data from the entire segment. By exposing these separately, and modifying each Field object to return which data source contains its value as well as the offset into the data source where the value is located and the length in memory of the value, I can feasibly store the BLOB or BLOBs that make up each packet into the database as a binary object, and still reliably reassemble the packet or extract raw field values at will.
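The storage scheme can be pictured with a small sketch: keep each data source as a raw binary blob, and have every field carry only a data source name, an offset, and a length. FieldRef and raw_field_bytes are hypothetical names for this illustration, not the wrapper’s actual API, and the sample bytes are invented.

```ruby
# Hypothetical record of where a field's raw value lives.
FieldRef = Struct.new(:name, :data_source, :offset, :length)

# Look up the blob for the field's data source and slice out its bytes.
def raw_field_bytes(data_sources, field)
  blob = data_sources.fetch(field.data_source)
  blob.byteslice(field.offset, field.length)
end

# Invented sample data: one blob per data source, keyed by name.
data_sources = {
  'Frame' => "\x45\x00\x00\x28 remaining frame bytes".b,
  'Reassembled TCP' => "HTTP/1.1 200 OK\r\n".b
}

# A field that points at the status code inside the reassembled stream.
status = FieldRef.new('http.response.code', 'Reassembled TCP', 9, 3)
puts raw_field_bytes(data_sources, status)
```

With this layout the blobs go into the database once, and any field’s raw value can be recovered later from three small pieces of metadata.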

I think the next step is to build a basic data model for storing packets, start loading it up, and then implement some basic analysis like correlating IP addresses with hostnames, detecting interesting traffic, etc.

As usual, Commissar Richard Stallman requires I make my code available under the Marxist GPLv2; the SVN repository has the details. Note that the GPL doesn’t say anything about helping others get shit building; it took me days to figure out the build process for Wireshark and Ruby, so I bid you good luck and godspeed.
