apocryph.org Notes to my future self

10 Dec 2008

Finally moved from DreamHost & Drupal to FutureHosting & WordPress

If you’re reading this, I’ve moved apocryph.org over to FutureHosting, and I have finally migrated away from Drupal and into WordPress.  Right now I’m running the stock skin and I haven’t done any configuration apart from migrating in my Drupal content and setting up the Permalink Redirect plugin so the old Drupal-style links continue to work.  As time permits I’ll be porting my custom Drupal theme into WordPress and generally adapting the look and feel to my tastes.  Here’s hoping I don’t regret this…

Note that subdomains like svn.apocryph.org and wiki.apocryph.org haven’t been migrated over yet, so you’ll have to bear with me for a little while.

5 Dec 2008

I've said it before, and I'll say it again: Side-by-side assemblies are WAY worse than DLL Hell

I’ve bitched at length about Microsoft’s side-by-side scheme for ending the “DLL hell” that used to rule Windows, whereby multiple applications would each write their preferred DLL to Windows\System32, and break other apps that expected a different version. Microsoft set out to fix this problem with what I can only assume was a room full of PhDs and no input from working programmers or sysadmins, since the resulting abomination takes a hard, non-obvious problem and replaces it with a very hard, virtually inscrutable problem.

Twice in the last 24 hours this has bit me in the ass. First, I was trying to run some code, which I built on my development machine, on a test VM. Since I wanted to, you know, debug this code, I created a ‘debug’ build, which links the executable with the debug version of the Microsoft C runtimes. Every developer who’s ever tried to do this knows it won’t Just Work, since the debug runtimes won’t be present on the box. What I used to do was manually copy all the DebugCRT and DebugMFC files from my dev machine’s c:\windows\winsxs directory onto the destination machine, which is exactly what Microsoft admonishes us not to do, but they don’t leave me much choice.

However, the VM I’m using this time is Windows Server 2008, whereon even the mighty Administrator doesn’t have write privs to winsxs; that’s limited to the Trusted Installer user. Sure, I could munge the privs, but this is supposed to be a test box, so I don’t want to munge it too far afield of what our customers’ configs will look like. So I did what the WinSxs (say “win sucks”, because it does) guys suggest: find the Debug_NoRedist folder in the Visual Studio 2008 install directory, copy the folders there under your architecture directly into the directory where your application is located, and voila!, through the magic of WinSxs, it will Just Work.

And it would have. Except the Visual Studio team apparently doesn’t really get WinSxs either, because after I installed Visual Studio 2008 SP1, it updated the Debug_NoRedist folder with the new SP1 runtimes, which include manifests that reflect the SP1 version of the debug CRT (9.0.30729.1), and not the old RTM version (9.0.21022.8). That’s the right call, except the binaries produced by SP1 have in their embedded assembly manifest a reference to the CRT version 9.0.21022.8. This works fine on a dev workstation where both the RTM and SP1 versions of the runtimes are installed, since there’s a policy file in the WinSxs directory that redirects requests for 9.0.21022.8 to the 9.0.30729.1 version. However, when you’re doing an isolated application that loads the CRTs from the current directory, there is no such redirect, so the app looks for 9.0.21022.8 and can only find 9.0.30729.1, so it eats shit and dies.

The fix? You’ll love it. Edit the Microsoft.VC90.DebugCRT.manifest file to change the version number from 9.0.30729.1 to 9.0.21022.8, and QED. Yes, that makes me feel dirty, but at least it works, which is more than I can say for the so-called “correct” way.
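If you end up doing this on more than one box, the manifest tweak is easy to script. Here is a minimal Ruby sketch of the same edit. The XML fragment is an illustrative stand-in, not the full Microsoft.VC90.DebugCRT.manifest, and retarget_manifest is a made-up helper name; only the two version numbers come from the text above.

```ruby
# Sketch: rewrite the version attribute in a private-assembly manifest so it
# matches the CRT version that the executable's embedded manifest asks for.
SP1_VERSION = '9.0.30729.1'  # version found in Debug_NoRedist after SP1
RTM_VERSION = '9.0.21022.8'  # version the built binaries still reference

# Returns the manifest text with the assemblyIdentity version swapped.
# (Hypothetical helper; plain string replacement, no XML parsing.)
def retarget_manifest(xml, from, to)
  xml.gsub(%(version="#{from}"), %(version="#{to}"))
end

# Illustrative fragment only; a real manifest has more elements than this.
sample = <<~XML
  <assemblyIdentity type="win32" name="Microsoft.VC90.DebugCRT"
                    version="9.0.30729.1" processorArchitecture="x86" />
XML

puts retarget_manifest(sample, SP1_VERSION, RTM_VERSION)
```

To apply it to a real file, read the manifest with File.read, pass the text through retarget_manifest, and write it back with File.write. Keep a copy of the original, since this is exactly the kind of hand-editing the “correct” way is supposed to make unnecessary.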

To whichever team came up with WinSxs: FUCK YOU ASSHOLES!

The second and more subtle way in which this bit me in the ass had to do with using CreateProcess to spawn an executable with a path like this: c:\foo\bar\..\boo\baz.exe; that is, a .. somewhere in the path. That works fine normally, but if we’re using the debug runtimes, and there’s a Microsoft.VC90.DebugCRT directory in c:\foo\bar and also one in c:\foo\boo, then I get crashes the first time I allocate memory from a DLL and free it from the process. I can verify only one instance of the CRT DLLs is loaded into the process, and I can verify the DLL is compiled correctly, because when I switch from c:\foo\bar\..\boo\baz.exe to c:\foo\boo\baz.exe with no other changes, it works fine.
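Since the version of the path without the .. works, one way to sidestep the problem is to collapse the path before handing it to whatever launches the process. The following Ruby sketch does that as a pure string operation. It assumes an absolute, drive-letter path with backslash separators, and collapse_dots is a hypothetical helper name, not part of any Windows or Ruby API.

```ruby
# Sketch: collapse "." and ".." segments in a Windows-style path without
# touching the filesystem, so the launcher never sees a ".." segment.
# Assumes an absolute path that starts with a drive letter, e.g. c:\...
def collapse_dots(path)
  drive, *segments = path.split('\\')
  clean = []
  segments.each do |seg|
    case seg
    when '', '.' then next        # ignore empty and current-directory segments
    when '..'    then clean.pop   # step back one directory (stops at the root)
    else              clean << seg
    end
  end
  ([drive] + clean).join('\\')
end

puts collapse_dots('c:\foo\bar\..\boo\baz.exe')
```

This only hides the symptom, of course. It says nothing about whether the crash is WinSxs’ fault or the CRT’s.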

Is this WinSxs’ fault or the CRT’s fault? Does it really matter? No: FUCK YOU ASSHOLES!

1 Dec 2008

Building rcapdissector on Ubuntu 8.04

I’m trying to build my rcapdissector project under Linux for the first time. Starting with a fresh Ubuntu 8.04 install, I did:

  • sudo aptitude install bison to install yacc/bison
  • sudo aptitude install flex to install flex
  • sudo aptitude install libgtk2.0-dev to install the GTK+ 2.x headers and libraries
  • sudo aptitude install libpcap-dev to install the libpcap headers and libraries
  • sudo aptitude install libgnutls-dev to install the GNU TLS library (this also installs the GNU crypto library as a dependency)
  • sudo aptitude install ruby ruby1.8 ruby1.8-dev to install Ruby and the relevant headers
  • sudo aptitude install ruby-gnome2 for the Glib-aware mkmf
  • sudo aptitude install g++ to install the latest GNU C++ compiler
  • ./configure --prefix=/home/anelson/wireshark in the Wireshark 1.0.4 source tree to configure the Wireshark build
  • make in Wireshark tree to build Wireshark
  • Edit /etc/ld.so.conf so libwireshark.so and libwiretap.so are in the system search path. I did something like:
     > cat /etc/ld.so.conf.d/wireshark.conf
     /home/anelson/wireshark-1.0.4/epan/.libs
     /home/anelson/wireshark-1.0.4/wiretap/.libs
  • Run ldconfig to pick up the changes
  • ruby -w extconf.rb in the rcapdissector/ext directory to generate the Makefile for the extension. Note you’ll almost certainly have to provide library paths to libwireshark and libwiretap as well as the epan/epan.h header files. More on this later
  • make to build and make install to install
  • Run all_tests.rb in the test folder
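Before running extconf.rb it can save some head-scratching to confirm that the header and libraries it needs are where you think they are. Below is a small Ruby preflight sketch. The relative paths are taken from the ld.so.conf.d listing and the epan/epan.h mention above; the helper name missing_wireshark_bits is made up, and the script is not part of rcapdissector.

```ruby
# Sketch: check a built Wireshark source tree for the pieces the extension
# build needs. Returns a list of human-readable problems; an empty list
# means everything was found.
def missing_wireshark_bits(source_root)
  wanted = {
    'epan/epan.h'                 => 'epan header',
    'epan/.libs/libwireshark.so'  => 'libwireshark shared library',
    'wiretap/.libs/libwiretap.so' => 'libwiretap shared library'
  }
  wanted.reject { |rel, _| File.exist?(File.join(source_root, rel)) }
        .map { |rel, what| "#{what} not found at #{File.join(source_root, rel)}" }
end

# Source tree location used in this post; change it to match your own build.
problems = missing_wireshark_bits('/home/anelson/wireshark-1.0.4')
puts problems.empty? ? 'all Wireshark bits found' : problems
```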

As of now I’ve got this working with the latest SVN, except I’m getting a segfault when I run the tests, which I chalk up to something having changed in Wireshark between 0.99.5 and 1.0.4. Also note that I slightly broke Windows compatibility to get this going, so the latest SVN won’t build under Windows without a bit of reverse contortion, but you can see from the SVN diffs what I did.

30 Nov 2008

Setting up Deluge on a headless Ubuntu seedbox with a Windows client

Recently I had to rebuild aenea, my dedicated seedbox, due to a botched upgrade to Ubuntu Hardy. The install went fine, being as it is Ubuntu, but when I installed the latest version of Azureus, my preferred BitTorrent client for the last five years or so, I ran into trouble. It would download fine, 1.5MB/s, for anywhere from one to thirty minutes, then all download and upload traffic would cease. No meaningful errors on the console, and the NAT test showed green consistently, but no traffic.

I poked around on the Azureus support forums and found this guy having the same problem. No one could offer any meaningful suggestions beyond the usual “is it plugged in?” and “is it turned on?” type stuff, so as a diagnostic measure I decided to run a different BitTorrent client. I rummaged around and quickly found Deluge, which is a cross-platform client written in Python.

I did a quick aptitude install deluge, which installed what I later learned is an ancient version, 0.5.something. I fired it up, configured it for my ports and download location, and pointed it to the same torrents Azureus was choking on. Not surprisingly, Deluge downloaded them fine, with no eventual slowdown. Clearly, not a NAT problem.

Since I already had a working Deluge install, I started to consider dumping Azureus entirely. I’ve hated Azureus since it became Vuze and started bolting all sorts of crap on top of what should be a pretty simple piece of software, but I never had any motivation to switch since Azureus just worked. Now that it didn’t, and I had another client that did, I figured it would be a great opportunity to see if my needs could be met elsewhere.

Once I realized the package I was running was an ancient version, I went to the Deluge site to download the latest (1.0.5 at the time of this writing) in the form of a .deb package. Install was painless, and the new version was much nicer looking. Then, I set about to duplicate my Azureus config.

Download locations

I keep in-progress downloads in /usr/local/p2p/downloading, and completed downloads go in /usr/local/p2p/downloaded. This makes it easier to keep them straight, and avoid accidentally copying over a partially-downloaded torrent.

Blocklist

I used to use the bluetack level1 blocklist, but it’s MIA again so I downloaded a copy of the PeerGuardian P2P blocklist instead. I had to enable Deluge’s Blocklist plugin to get blocklist functionality, then pointed it at the text file blocklist and it loaded it into memory.

Bandwidth

I let torrents use unlimited download bandwidth, and 400 KiB/s upload. My OpenBSD packet shaping rules take care of preventing bandwidth starvation.

Headless

I don’t initiate downloads from aenea, I do it from my laptop, so it won’t do to have to VNC into aenea and launch the UI. Fortunately, Deluge supports this. Unfortunately, the docs consist of a hard-to-follow thread on the forums. To get the Deluge daemon and the WebUI to run on system startup, I created an init.d script based on a forum thread.

The final script came from this post. The /etc/default/deluge-daemon file looks like:

# Configuration for /etc/init.d/deluge-daemon

# The init.d script will only run if this variable is non-empty.
DELUGED_USER="anelson"

# Should we run at startup?
RUN_AT_STARTUP="YES"

and the /etc/init.d/deluge-daemon looks like:

    #!/bin/sh
    ### BEGIN INIT INFO
    # Provides:          deluge-daemon
    # Required-Start:    $local_fs $remote_fs
    # Required-Stop:     $local_fs $remote_fs
    # Should-Start:      $network
    # Should-Stop:       $network
    # Default-Start:     2 3 4 5
    # Default-Stop:      0 1 6
    # Short-Description: Daemonized version of deluge and webui.
    # Description:       Starts the deluge daemon with the user specified in
    #                    /etc/default/deluge-daemon.
    ### END INIT INFO

    # Author: Adolfo R. Brandes 

    PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
    DESC="Deluge Daemon"
    NAME1="deluged"
    NAME2="deluge"
    DAEMON1=/usr/bin/deluged
    DAEMON1_ARGS="-d"
    DAEMON2=/usr/bin/deluge
    DAEMON2_ARGS="-u web"
    PIDFILE1=/var/run/$NAME1.pid
    PIDFILE2=/var/run/$NAME2.pid
    PKGNAME=deluge-daemon
    SCRIPTNAME=/etc/init.d/$PKGNAME

    # Exit if the package is not installed
    [ -x "$DAEMON1" -a -x "$DAEMON2" ] || exit 0

    # Read configuration variable file if it is present
    [ -r /etc/default/$PKGNAME ] && . /etc/default/$PKGNAME

    # Load the VERBOSE setting and other rcS variables
    [ -f /etc/default/rcS ] && . /etc/default/rcS

    # Define LSB log_* functions.
    # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
    . /lib/lsb/init-functions

    if [ -z "$RUN_AT_STARTUP" -o "$RUN_AT_STARTUP" != "YES" ]
    then
       log_warning_msg "Not starting $PKGNAME, edit /etc/default/$PKGNAME to start it."
       exit 0
    fi

    if [ -z "$DELUGED_USER" ]
    then
        log_warning_msg "Not starting $PKGNAME, DELUGED_USER not set in /etc/default/$PKGNAME."
        exit 0
    fi

    #
    # Function that starts the daemon/service
    #
    do_start()
    {
       # Return
       #   0 if daemon has been started
       #   1 if daemon was already running
       #   2 if daemon could not be started
       start-stop-daemon --start --background --quiet --pidfile $PIDFILE1 --exec $DAEMON1 \
          --chuid $DELUGED_USER --user $DELUGED_USER --test > /dev/null
       RETVAL1="$?"
       start-stop-daemon --start --background --quiet --pidfile $PIDFILE2 --exec $DAEMON2 \
          --chuid $DELUGED_USER --user $DELUGED_USER --test > /dev/null
       RETVAL2="$?"
       [ "$RETVAL1" = "0" -a "$RETVAL2" = "0" ] || return 1

       start-stop-daemon --start --background --quiet --pidfile $PIDFILE1 --make-pidfile --exec $DAEMON1 \
          --chuid $DELUGED_USER --user $DELUGED_USER -- $DAEMON1_ARGS
       RETVAL1="$?"
       sleep 2
       start-stop-daemon --start --background --quiet --pidfile $PIDFILE2 --make-pidfile --exec $DAEMON2 \
          --chuid $DELUGED_USER --user $DELUGED_USER -- $DAEMON2_ARGS
       RETVAL2="$?"
       [ "$RETVAL1" = "0" -a "$RETVAL2" = "0" ] || return 2
    }

    #
    # Function that stops the daemon/service
    #
    do_stop()
    {
       # Return
       #   0 if daemon has been stopped
       #   1 if daemon was already stopped
       #   2 if daemon could not be stopped
       #   other if a failure occurred

       start-stop-daemon --stop --quiet --retry=TERM/30/KILL/5 --user $DELUGED_USER --pidfile $PIDFILE2
       RETVAL2="$?"
       start-stop-daemon --stop --quiet --retry=TERM/30/KILL/5 --user $DELUGED_USER --pidfile $PIDFILE1
       RETVAL1="$?"
       [ "$RETVAL1" = "2" -o "$RETVAL2" = "2" ] && return 2

       rm -f $PIDFILE1 $PIDFILE2

       [ "$RETVAL1" = "0" -a "$RETVAL2" = "0" ] && return 0 || return 1
    }

    case "$1" in
      start)
       [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME1"
       do_start
       case "$?" in
          0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
          2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
       esac
       ;;
      stop)
       [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME1"
       do_stop
       case "$?" in
          0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
          2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
       esac
       ;;
      restart|force-reload)
       log_daemon_msg "Restarting $DESC" "$NAME1"
       do_stop
       case "$?" in
         0|1)
          do_start
          case "$?" in
             0) log_end_msg 0 ;;
             1) log_end_msg 1 ;; # Old process is still running
             *) log_end_msg 1 ;; # Failed to start
          esac
          ;;
         *)
            # Failed to stop
          log_end_msg 1
          ;;
       esac
       ;;
      *)
       echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
       exit 3
       ;;
    esac

    :

Obviously you must make /etc/init.d/deluge-daemon executable by root, then make it load on startup:

sudo update-rc.d deluge-daemon defaults

You can either reboot, or do

sudo /etc/init.d/deluge-daemon start

to start things up. Now the daemon should be listening on port 58846, and the Web UI on port 8112 (by default protected by password deluge).
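A quick way to confirm both pieces actually came up is to try connecting to those two ports. Here is a minimal Ruby sketch. The port numbers are the ones quoted above; port_open? and deluge_status are hypothetical helper names, and a successful TCP connect only shows that something is listening, not that it is Deluge.

```ruby
require 'socket'

# Ports quoted in this post; adjust them if you changed the defaults.
DELUGE_PORTS = { 'deluged' => 58846, 'web UI' => 8112 }

# True if a TCP connection to host:port succeeds within `timeout` seconds.
def port_open?(host, port, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Maps each service name to true (listening) or false (not reachable).
def deluge_status(host)
  DELUGE_PORTS.map { |name, port| [name, port_open?(host, port)] }.to_h
end
```

From the laptop, calling deluge_status('aenea') returns a hash with one true/false entry per service.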

Client

In a pinch I can use the WebUI at http://aenea:8112/, but as rich as the UI is it’s really a pain in the ass to use with private trackers, since I have to download the torrent file from my logged-in browser session, then upload that torrent file to the web UI. Fortunately, Deluge offers another option.

The Deluge GUI is actually just a client that talks to a daemon. By default it talks to the local daemon, but it can be made to talk to one elsewhere on the network as well. Unfortunately, the docs suck to the point of non-existence, so you have to rummage around the forums or read this post to find out how to do it.

First, I downloaded the 64-bit windows installer, and ran it on my Vista laptop. I then started up Deluge, and (here’s the non-obvious bit) went into Preferences, Interface, unchecked ‘Enable’ under ‘Classic Mode’, and restarted Deluge. When it came back up, it came up with the connection manager dialog. I removed the connection to the localhost daemon, added aenea, and configured it to always connect to this host. That way, whenever I launch Deluge on my laptop, it comes up connected to aenea.

You might think that’s it, but there’s a bit more yet to do.

File associations

My normal torrent workflow is to use Firefox on my Vista laptop to find torrents I want, then download them and let Firefox launch my torrent client automatically. With Deluge this is a bit tricky, because it is launched with a batch file, deluge.cmd, in the Program Files\Deluge folder. This batch file sets up some Python environment variables before invoking the scripts\deluge.exe file. However, if you tell Firefox to launch deluge.cmd, you’ll find that you get a rather obtuse error message popup:

Failed to run the program, Error:267, The directory name is invalid.

This is due to Firefox running as a 32-bit process, and thus thinking the Program Files directory is Program Files (x86). The solution is to create another batch file, which I call deluge32.cmd, with the following contents:

@echo off
rem Wrapper around deluge.cmd which will ensure the 64-bit command processor is used even if this batch file is invoked with the 32-bit
rem command processor
rem
rem Use with 32-bit firefox on 64-bit windows with 64-bit deluge.  Any other use is completely untested
set DELUGEDIR=C:\Program Files\Deluge

%windir%\sysnative\cmd.exe /c "%DELUGEDIR%\deluge.cmd" %*

Point Firefox to this batch file, and it works fine. I posted a bit more about this in the Deluge forums here.

VNC

This isn’t really related to torrents at all, but I always configure my seedbox with resumable VNC sessions per this forum post. There’s a built-in desktop sharing thing in Gnome but you have to be logged in already and it shares the console session over VNC; this way VNC sees a separate session from the console, much like RDP on Windows.

Note that this tutorial is a bit old. I can’t get it working with Ubuntu 8.04, and apparently neither can anyone else. There’s a bug with the GNOME Settings Daemon crashing, and performance is shoot-me-in-the-head-I-cant-take-it-anymore slow. Dammit, sometimes I really hate computers.

10 Nov 2008

Finally getting 21st century entertainment gear

For the last several years I’ve not owned a TV, and the last gaming console I had was a SuperNES. I’ve been able to watch the shows I’m interested in on my computer, and I play games on my PC as well. However, lately I’ve had the urge for some casual gaming, but my Alienware laptop is too old (that is to say, 18 months old) to play modern games well, and I definitely don’t want to drop another $2k to upgrade. I’m also getting a little tired of not having a place where I can have ppl over to play or watch TV.

So, I finally decided to outfit my office with an HDTV and Xbox 360. After extensive research, I pulled the trigger on the Samsung LN40A550 40-Inch 1080p LCD HDTV, and an XBOX 360 Pro. I plan to use TVersity on my Windows Media Center machine in my bedroom to stream (and transcode, if necessary) my video content to the TV. I suppose I can also watch OTA HD channels, but I hate network TV so I don’t see that happening too often. I still refuse to pay for cable and/or satellite TV.

After spending nearly $1500 on what for years I’ve regarded as pointless excess, I’m feeling like a bit of a sellout, but I have only to watch a couple of Gears of War or Lego Star Wars gameplay videos, and I’ll bounce right back.

One interesting note about the TV. I was originally planning to get the 37-inch 1080p Samsung LN37A550, which was priced within $50 of the 37-inch 720p model LN37A450, but suddenly this afternoon the 37-inch 1080p model jumped in price. As of this writing, the Amazon (not Amazon Marketplace) price for the 37-inch LN37A530 is $989.98 with free white-glove shipping. However, as you can see in the below screenshot I took on my laptop after I noticed the price jump on my desktop, earlier today it was available for $859.99!

The LN37A530 for an amazing $859.99

I know Amazon fucks with prices all the time, but I felt pretty angry at this particular bait-and-switch. Despite this, I still ended up buying the 40 inch LN40A550 from Amazon, since their white-glove service and trustworthy customer support are worth the premium, plus the LN40A550 was only a few dollars more than the 37-inch LN37A550.

9 Nov 2008

A more in-depth analysis of Ruby HTTP client performance

As a follow-up to my previous article on Ruby HTTP client performance, I’ve revamped my test rig and revised my tests to cover more variables, more implementations, and more Ruby versions.

This time I compare Ruby 1.8.6, 1.8.7, and 1.9.0, exclusively on CentOS 5 Linux. I compare the stock Net::HTTP, rfuzz, libcurl, eventmachine, right_http, and a number of Net::HTTP variations with slight performance tweaks. I evaluate clock time, CPU time, and CPU time over clock time, for five sites with varying network characteristics. As before, my test rig and results are available on my SVN repository for further experimentation.

For the conclusion and pretty pictures skip to the bottom. For the gory details read on.

Test Methodology

I have a simple HTTP client task, downloading a 10MB zip file from each of five data centers around the world (Seattle, Dallas, Chicago, Washington DC, and London). I’ve implemented that task using each of the HTTP client implementations I’m testing. I then use each implementation to download the file from each of the data centers using my CentOS 5 VPS box located in Future Hosting’s Dallas data center. I keep track of how much wall clock and CPU time is consumed by each implementation with the help of the Ruby benchmark library, and log that information to a CSV file.

I repeat this for ruby 1.8.6, 1.8.7, and 1.9.0. I then aggregate the results and generate pretty graphs.
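The measurement itself boils down to wrapping each download in a Benchmark.measure call. Here is a minimal sketch of that idea, not the actual rig from my SVN repository: time_download is a hypothetical helper, and the sleep is a stand-in for the real HTTP download so the example runs without a network.

```ruby
require 'benchmark'

# Times one run of the given block and returns wall-clock seconds, CPU
# seconds (user + system), and CPU time as a percentage of wall-clock time.
def time_download(label)
  tms = Benchmark.measure { yield }
  cpu = tms.utime + tms.stime
  pct = tms.real.zero? ? 0.0 : 100.0 * cpu / tms.real
  { label: label, clock: tms.real, cpu: cpu, cpu_pct: pct }
end

# Stand-in workload; the real test would download the 10MB file here.
row = time_download('example') { sleep 0.05 }

# One CSV-style line per run: label, wall clock, CPU, CPU percentage.
puts [row[:label], row[:clock], row[:cpu], row[:cpu_pct]].join(',')
```

A network-bound download should look a lot like the sleep: plenty of wall-clock time, very little CPU time. The interesting part is how much CPU each implementation burns while waiting.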

Test implementations

I have tested the following HTTP client implementations:

  • stock_net_http – The Net::HTTP library that ships with Ruby. In 1.8.7 and beyond this library is a bit improved by a larger 16K read buffer size, but is otherwise unchanged between revisions
  • net_http_notimeout – A subclass of Net::HTTP that overrides the timeout logic to eliminate the timeout feature, which is the cause of some shitty performance. This implementation also forces a 16K buffer size even under Ruby 1.8.6
  • net_http_select – A subclass of Net::HTTP that uses select() to implement timeouts instead of the rather inefficient stock timeout implementation. This, like all my custom HTTP impls, forces a 16K buffer size.
  • net_http_zerocopy – Another Net::HTTP subclass that has a modified read loop which uses the same pre-allocated String buffer for each read. This implementation also uses select() for timeout, and a hard-coded 16K buffer size.
  • net_http_zerocopy_sysread – A variation of net_http_zerocopy that uses readpartial with no timeout for socket reads, along with the existing preallocated buffer optimizations of its parent.
  • rfuzz – Uses a slightly modified version of the lightweight HTTP client in the rfuzz library. The rfuzz base implementation as well as this tweaked one do not implement timeouts. The rfuzz library is required or this implementation will be skipped.
  • right_http_connection – Uses the right_http_connection HTTP client implementation from the rightaws library. Annoyingly, right_http_connection works by monkey-patching Net::HTTP, which is why I had to modify my test rig to run each implementation test in a new instance of the Ruby interpreter. Bad form.
  • eventmachine – Uses the EventMachine HTTP client unmodified
  • libcurl – Uses the Ruby bindings for the curl native HTTP library

Unfortunately, I could not get rev or revactor working on any of my Ruby versions, so I was unable to evaluate those implementations.
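To give a flavor of what the select() and zero-copy variants do, here is a stripped-down read loop in the same spirit. It is a sketch written for modern Ruby, not the code from the repository: read_all is a hypothetical name, and it reads from a pipe instead of an HTTP socket so it runs anywhere.

```ruby
BUFFER_SIZE = 16 * 1024  # the 16K read size mentioned above

# Reads from `io` until EOF, reusing one String buffer for every read and
# using IO.select to enforce a per-read timeout. Yields the buffer after
# each read (the same object every time, so copy it if you need to keep it)
# and returns the total number of bytes read.
def read_all(io, timeout = 30)
  buffer = String.new(capacity: BUFFER_SIZE)
  total = 0
  loop do
    ready = IO.select([io], nil, nil, timeout)
    raise "read timed out after #{timeout}s" unless ready
    begin
      io.readpartial(BUFFER_SIZE, buffer)  # fills `buffer` in place
    rescue EOFError
      break
    end
    total += buffer.bytesize
    yield buffer if block_given?
  end
  total
end

# Demonstration on a pipe; the same loop works on a TCPSocket.
reader, writer = IO.pipe
writer.write('x' * 40_000)
writer.close
puts read_all(reader)
```

The two ideas are independent. Waiting in IO.select avoids the cost of the stock timeout machinery, and passing the same buffer to readpartial avoids allocating a fresh String for every chunk.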

Try this at home

To reproduce my results, do the following:

  • Check out my SVN repository at http://svn.apocryph.org/svn/projects/rubyhttp/trunk revision 145.
  • Run the tests with Ruby 1.8.6. On my machine that’s the version in the path:
      ruby -v
      ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-linux]
    

    which means the following command runs all available impls:

      ruby -w -rubygems test_all_impls.rb
    

    That will run all the available impls (some, like rfuzz, aren’t available if you haven’t installed the necessary gem), and log the results to ./results/(date), where (date) is the YYYY-MM-DD date.

  • Run the tests with Ruby 1.8.7. On my machine I had to build that from source:
       ~/ruby18/bin/ruby -v
       ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]
    

    To run the tests you have to pass the Ruby command on the command line, since I couldn’t figure out how to programmatically determine the path to the Ruby interpreter. On my system that’s:

      ~/ruby18/bin/ruby -w -rubygems test_all_impls.rb "~/ruby18/bin/ruby -w -rubygems"
    

    Again, gems are required for some impls.

  • Run the tests with Ruby 1.9.0. Same deal as 1.8.7.
      ~/ruby19/bin/ruby -v
      ruby 1.9.0 (2008-10-06 revision 19702) [i686-linux]
    

    The command is similar:

      ~/ruby19/bin/ruby -w -rubygems test_all_impls.rb "~/ruby19/bin/ruby -w -rubygems"
    
  • After running all three, you’ll have a bunch of CSV files in the results subdirectory for today’s date. Here’s what I have:
    results/2008-11-09/ruby-1.8.6-i686-linux-eventmachine.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-libcurl.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-net_http_notimeout.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-net_http_select.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-net_http_zerocopy.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-net_http_zerocopy_sysread.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-rfuzz.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-right_http_connection.csv
    results/2008-11-09/ruby-1.8.6-i686-linux-stock_net_http.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-eventmachine.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-libcurl.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-net_http_notimeout.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-net_http_select.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-net_http_zerocopy.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-net_http_zerocopy_sysread.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-rfuzz.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-right_http_connection.csv
    results/2008-11-09/ruby-1.8.7-i686-linux-stock_net_http.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-eventmachine.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-net_http_notimeout.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-net_http_select.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-net_http_zerocopy.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-net_http_zerocopy_sysread.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-right_http_connection.csv
    results/2008-11-09/ruby-1.9.0-i686-linux-stock_net_http.csv
    
  • To generate aggregate data files suitable for analysis, cd into the results directory and run:

    ruby -w -rubygems ../../combine_csv.rb

    This will aggregate all the .csv files in the directory and format them into three aggregate files: clock_time.txt, cpu_percentage.txt, and total_cpu_time.txt. These files are formatted with columns for the server locations, rows for the HTTP implementation names, and values corresponding to the wall clock time, CPU time over clock time, and CPU time for each implementation and site. These are ready-made for generating the bar charts below in Excel. Note that you’ll need the FasterCSV library in order for this to work.
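On the point in the steps above about not being able to determine the interpreter path programmatically: the rbconfig library that ships with Ruby looks like one way to do it. This is a sketch I have not wired into the test rig, and current_ruby is a made-up helper name. The three RbConfig::CONFIG keys it reads are standard ones.

```ruby
require 'rbconfig'

# Full path of the interpreter that is executing this script, built from
# the install directory, the installed binary name, and the platform's
# executable extension (empty on Linux, ".exe" on Windows).
def current_ruby
  File.join(RbConfig::CONFIG['bindir'],
            RbConfig::CONFIG['ruby_install_name'] + RbConfig::CONFIG['EXEEXT'])
end

puts current_ruby
```

If that holds up across all three Ruby versions, the rig could spawn each implementation test with current_ruby instead of taking the interpreter command as an argument.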

Results

Running all the tests is a pain in the ass. If you fetch my SVN repository, you’ll find the raw data files that I got from my tests under results/2008-11-09. Or, you can just read my analysis below.

Clock time

Wall clock time

As you can see above, each implementation takes more or less the same amount of wall clock time to download from a given site, with significant variations between sites. This is expected, as downloading a file over the Internet is a mostly network-bound operation. We don’t care so much how long it takes, as how much the CPU has to work while it’s happening. Which brings us to…

CPU Time

CPU time

Wow, stock 1.8.6 Net::HTTP really is teh suck! At least twice as much CPU usage as the nearest competitor. Going from a 1K read buffer to 16K in 1.8.7 made a big difference.

Going further down the list, you can see the Ruby 1.9.0 Net::HTTP implementations with zero copy reads and readpartial, and the notimeout variant, are the best performers, with rfuzz, libcurl, and eventmachine close behind. It’s encouraging that a pure-ruby impl like rfuzz can compete with a mostly native impl like libcurl.

It’s also important to note that each of the downloads, be it the super-fast Dallas or the slow London, hit the CPU the same way. This really jumps out when you look at CPU time over wall clock time:

Percent of wall clock time spent using the CPU

CPU time over wall clock time

Here you see the various transfer times for each site, but you can also see the widely ranging performance of the various HTTP implementations. No real surprises here; rfuzz, libcurl, and eventmachine are doing very well, while the 1.8.6 stock Net::HTTP continues to blow.
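To make the ratio concrete, here is a small Ruby sketch of the calculation behind that chart. The rows are made-up numbers chosen for easy arithmetic, not measurements from my tests, and the column layout (implementation, site, wall-clock seconds, CPU seconds) is an assumption for the example, not the rig’s actual CSV format.

```ruby
# Hypothetical rows: implementation, site, wall-clock seconds, CPU seconds.
# These numbers are invented for illustration; they are not test results.
SAMPLE = <<~CSV
  impl_a,dallas,2.0,0.5
  impl_a,london,20.0,0.5
  impl_b,dallas,2.0,0.1
CSV

# Returns { impl => { site => CPU time as a percentage of wall-clock time } }.
def cpu_percentages(csv_text)
  table = Hash.new { |h, k| h[k] = {} }
  csv_text.each_line do |line|
    impl, site, clock, cpu = line.strip.split(',')
    next if impl.nil? || impl.empty?  # skip blank lines
    table[impl][site] = (100.0 * cpu.to_f / clock.to_f).round(1)
  end
  table
end

cpu_percentages(SAMPLE).each do |impl, sites|
  sites.each { |site, pct| puts "#{impl} #{site}: #{pct}%" }
end
```

The same half second of CPU is 25% of a two-second download but only 2.5% of a twenty-second one, which is why the percentages differ so much between a fast site and a slow one even when the CPU cost is identical.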

Conclusion

If you need an HTTP client in Ruby, DO NOT use the 1.8.6 Net::HTTP. The 1.8.7 version is considerably better, but libcurl, rfuzz, or eventmachine are all better still.

Within the Net::HTTP family, dropping the inefficient timeout implementation and optimizing the read code to reuse the same buffer are both pretty low-hanging optimizations which should be considered for a future Ruby release. For now, I’d recommend libcurl if you’re on Linux, or 1.8.7 Net::HTTP on Windows (since rfuzz doesn’t have timeouts, and eventmachine is hard to get going under Ruby on Windows).

UPDATE: It turns out there is a binary gem release of eventmachine 0.12.0 for Windows, so if you’re doing Windows development and need a performant HTTP client implementation, you should definitely look into eventmachine. Thanks to Abdul-Rahman Advany for the tip.

23 Oct 2008

Creating DevExpress.NET toolbox icons for non-admin users

At work we use the DevExpress .NET widget toolkit to build flashy GUIs. I recently repaved my dev box, and now run as a non-admin user most of the time. This often causes problems with badly-behaved software, which apparently includes the DevExpress .NET toolkit.

If I run the installer as an admin, it creates convenient toolbox items in Visual Studio 2008 for dropping various DevExpress controls into the forms I build. However, my non-admin user doesn’t get these toolbox items. There’s a separate tool, the Toolbox Creator, which ships with DevExpress for the purpose of putting these back, but it doesn’t work without admin privs (see here).

DevExpress’s handy advice is:

The ToolBoxCreator creates toolbox icons only for the user for which it is launched. And, this user must have Administrator rights. This behavior is by design.
Besides, running the VS 2008 under a user without administrator rights leads to many issues

Well, fuck you. Maybe you can’t write code that behaves properly without admin privs, but that doesn’t mean I can’t.

I figured out a hack to get their shit working. Use Aaron Margosis’ MakeMeAdmin batch file to create a command window which is running as you, but with admin privs. cd into the DevExpress Tools directory, then run the target of the ToolboxCreator shortcut. Voila!

For the most part I’m happy with the DevExpress toolkit, but this incident really pissed me off. There’s no earthly reason why the Toolbox Creator needs admin rights, since the toolbox settings are per-user. It’s just lazy programming.

18 Oct 2008

Migrating gallery from Gallery2 to Flickr

I’m finally moving apocryph.org over to FutureHosting from DreamHost. I’ve put it off for so long because my 35+GB photo gallery will be a real pain to move over, and will use most of the 40GB of storage I have allotted on one of my two VPS accounts.

I really wanted to move my photo hosting to Picasa Web Albums, on account of the awesome new face detection/recognition feature they have in beta, but in the end I was swayed by value.

Here’s the price schedule for Picasa Web Albums:

  • 10 GB ($20.00 USD per year)
  • 40 GB ($75.00 USD per year)
  • 150 GB ($250.00 USD per year)
  • 400 GB ($500.00 USD per year)

Here’s the schedule for Flickr:

  • Unlimited ($24.95/yr)

Since I’m almost at 40GB, inside of a year I would be spending $250/yr for Picasa storage. Sorry, but face recognition coolness isn’t worth that sort of a premium.

I’m currently in the process of migrating my entire gallery over to Flickr using the Gallery2Flickr plug-in for Gallery 2. It’s slow going; I’ll have a separate post about the jigger-pokery required to make that work. Once I’m done, I’ll use my Gallery 2 install solely for hosting photos for my family, which is a small enough dataset that I can fit it on my VPS without difficulty.

13Oct/080

The end of an era: Vista on a dev box, and wintermute's retirement

This weekend marked the end of a long era, and the beginning of another, presumably shorter one.

First, I finally decommissioned wintermute, the old Sony VAIO PCV-90 that’s been running OpenBSD and serving as my home network’s firewall for the last five years or so. wintermute has been in more-or-less continuous operation since I bought it, the first PC I owned, from CompUSA in Rockville, MD back in 1996. wintermute served me well for many years, including occasional car trips to visit my first geek crush for Linux hacking sessions and twinkies. After I moved on to greener pastures (boromir, I think), wintermute was handed down to my siblings, who used it until I took it back for use as my firewall.

It’s quite remarkable that it has the original mobo, RAM, processor, power supply, and network card. The hard drive was long ago replaced, and the CD-ROM stopped working somewhere around 2000, but the machine itself has been solid. Now it’ll go out to pasture in my server closet.

I didn’t want to be rid of wintermute, but my FiOS connection is just too fast for it to keep up with. During heavy torrenting CPU usage was around 60% interrupt, and its 64MB of RAM weren’t enough to handle thousands of NAT state table entries and run DHCP and DNS for my network. boromir has now filled the role, with a screaming Pentium II 300MHz processor and 160MB of RAM, which is likely to suffice for quite some time.

On another, more pathetic note, I repaved wyoh, my primary laptop, to run Vista Ultimate x64. I held out as long as I could, but Windows XP x64 edition’s crap hardware support and non-existent game support made it harder and harder to live with. As much as I’d like to sell all my earthly possessions and switch to Ubuntu, the people that pay me like me to develop Microsoft software, which is something of a PITA on Ubuntu, and before you suggest I do my development inside a VM, fuck yourself and go try it for a day before you get all high and mighty.

I already hate Vista’s huge performance penalty, and the Aero eye candy doesn’t make up for it. I look forward to inexplicable lags and sputters as all the various anti-piracy tilt bits wobble about, lest I use my computer how I see fit to use it, without regard for the wishes of my betters. Hopefully, in the future, Microsoft will dissipate into irrelevance and I can get paid to run Linux like all the cool kids, but until then, you run what you brung.

4Oct/0821

An analysis of Ruby 1.8.x HTTP client performance

Not too long ago I bitched about the performance of Ruby’s HTTP client. Some of the comments to that post prompted me to investigate this further, in the hopes of finding a more performant implementation.

The results of my analysis are in, and they’re…interesting, to say the least.

Summary

Ruby 1.8.6 (which still seems the dominant version among both Linux binary packages and the Windows One-Click Installer) uses a hard-coded 1K buffer size for HTTP reads, which leads to a ton of CPU usage during large HTTP downloads, even though the operation should be I/O bound and barely touch the CPU.

Ruby 1.8.7 includes a change described by the following entry in the changelog:

Mon Mar 19 11:39:29 2007  Minero Aoki  <aamine@loveruby.net>

    * lib/net/protocol.rb (rbuf_read): extend buffer size for speed.

After this change, Ruby’s HTTP implementation now uses a hard-coded 16K buffer, in the hopes of improving performance. Whether or not this actually improves things will become clear in my analysis later on.
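
To get a feel for why the buffer size matters, here is a minimal sketch (not the actual Net::HTTP code) that counts how many reads it takes to drain a 10MB payload at each buffer size. The count_reads helper and the in-memory StringIO stand-in for a socket are assumptions made purely for illustration; a real socket can return less than the requested size on any given read, so real chunk counts will be higher.

```ruby
require 'stringio'

# Hypothetical helper: drain an IO in fixed-size reads and count the reads.
def count_reads(io, buffer_size)
  reads = 0
  # IO#read returns nil at EOF, which ends the loop.
  reads += 1 while io.read(buffer_size)
  reads
end

payload = 'x' * (10 * 1024 * 1024) # stand-in for a 10MB test file

reads_1k  = count_reads(StringIO.new(payload), 1024)
reads_16k = count_reads(StringIO.new(payload), 16 * 1024)

puts "1K buffer:  #{reads_1k} reads"
puts "16K buffer: #{reads_16k} reads"
```

Every one of those reads passes through the per-read overhead in Ruby, so a 16x reduction in read count is where the hoped-for savings would come from.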

In addition to Ruby’s built-in Net::HTTP client, I evaluated two alternatives: a version of the rfuzz HTTP client modified to support streaming GETs, and curb, the Ruby bindings for the native libcurl HTTP client library. My goal was to determine the best-case Ruby HTTP client performance as indicated by the performance of these two implementations, then munge Ruby’s stock implementation to try to approach the best-case performance.

rubyhttp

I wrote a tool, rubyhttp, to help me perform these tests. The code is freely available at my SVN repository at http://svn.apocryph.org/svn/projects/rubyhttp/trunk. To grab the code, do a svn co http://svn.apocryph.org/svn/projects/rubyhttp/trunk. The tests below were run with revision 127 of the code.

Test environment

I ran the tests on two machines: wyoh, a Windows XP x64-edition Core 2 Duo laptop with a FiOS internet connection, and lio, one of my FutureHosting VPS boxes running CentOS 5.

On wyoh I used the version of Ruby that comes with the latest one-click installer:

>ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]

On lio I tested two versions of Ruby. The first was installed by the ruby yum package:

$ ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-linux]

As you can see, this is the same version and patchlevel as my Windows box. Once I discovered the 16k buffer enhancement in Ruby 1.8.7, I downloaded and built the latest 1.8.7 source tree. This is:

$ ~/ruby18/bin/ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-linux]

Test data

My test code fetches the 10MB test files published by FutureHosting for measurement of the network performance at each of their data centers. Thus, my code retrieves data from Seattle, Dallas, Chicago, Washington DC, and London. lio is located in the very same Dallas data center, hence the crazy-high download speeds there, while wyoh is located in the suburbs of Washington DC in close geographical and network proximity to the DC datacenter.

HTTP variations

Each test run does an HTTP GET from five different locations, using Net::HTTP and (on Linux only) rfuzz and curb as well. Neither rfuzz nor curb could be made to work on Windows, so the Windows runs use only Net::HTTP.

Most of the tests exercise some variation in the Net::HTTP implementation. The following variations are used:

  • stock – As it implies, the Net::HTTP implementation is unmodified from whatever ships with the version of Ruby being used
  • custom-16kbuffer – Modifies the buffer size from 1K to 16K. Note that Ruby 1.8.7 already includes this modification, so you’ll only see this run with Ruby 1.8.6 on Windows.
  • custom-16kbuffer-notimeout – Buffer size of 16K, and the timeout call is removed. This obviously isn’t a practical change, but it demonstrates the overhead of Ruby’s appalling timeout implementation
  • custom-16kbuffer-select – Buffer size of 16K, and the timeout call is replaced with non-blocking I/O using select, as proposed by Tanaka Akira on the ruby-talk list
  • custom-16kbuffer-selectwithsysread – Same as custom-16kbuffer-select, except that the read_nonblock call made after select indicates there is data to read is replaced by sysread
  • custom-64kbuffer-notimeout – Buffer size of 64K, and the timeout call is removed. This obviously isn’t a practical change, but it demonstrates the overhead of Ruby’s appalling timeout implementation
  • custom-64kbuffer-select – Buffer size of 64K, and the timeout call is replaced with non-blocking I/O using select, as proposed by Tanaka Akira on the ruby-talk list

All of these custom variations modify lib/ruby-1.8/net/protocol.rb, which contains the socket I/O functionality used by Net::HTTP. The rbuf_fill method contains the actual socket read logic.
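
The select-based variations boil down to waiting for the socket to become readable, with a deadline, before reading from it. The following is a rough sketch of that general pattern, not the actual patch from ruby-talk and not the modified protocol.rb. The read_with_select helper is a hypothetical name, and a pipe stands in for the HTTP socket so the example runs without a network.

```ruby
require 'timeout' # only for the Timeout::Error class; Timeout.timeout is never called

# Hypothetical helper: wait up to timeout_secs for data, then read what is there.
def read_with_select(io, max_bytes, timeout_secs)
  # IO.select returns nil if nothing became readable before the deadline.
  ready = IO.select([io], nil, nil, timeout_secs)
  raise Timeout::Error, 'read timed out' if ready.nil?
  # The "select" variations would call io.read_nonblock(max_bytes) here instead.
  io.sysread(max_bytes)
end

reader, writer = IO.pipe # stand-in for a socket
writer.write('hello')

data = read_with_select(reader, 16 * 1024, 1.0)
puts "read #{data.bytesize} bytes"

# Nothing left in the pipe, so this call should hit the deadline.
timed_out = begin
  read_with_select(reader, 16 * 1024, 0.1)
  false
rescue Timeout::Error
  true
end
puts "second read timed out: #{timed_out}"
```

The point of the pattern is that the deadline is enforced by the select call itself, so no Timeout.timeout wrapper is needed around each read.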

Data

Each run outputs the following information for each combination of HTTP URL and HTTP client implementation:

  • Site – name of the site (eg ‘seattle’, ‘washdc’, etc)
  • Impl – name of the HTTP implementation
  • KBytes Transferred
  • KBytes/second
  • Chunk count – The number of reads required to fetch the entire file
  • Mean chunk size – The average read size
  • Min chunk size
  • Max chunk size
  • User Time – The % of user time taken by ruby during the run
  • System Time – The % of system time taken by ruby during the run
  • Total CPU Time – The total % of CPU time taken by ruby during the run
  • Clock time – The amount of time spent downloading the file
  • % CPU usage – Defined as Total CPU Time / Clock Time * 100. The percentage of available CPU time taken by ruby
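
As a rough illustration of how numbers like these can be gathered, here is a sketch using Process.times. It is not the rubyhttp code: chunk_stats, cpu_percent and measure are hypothetical names, and it assumes CPU and clock time are both measured in seconds.

```ruby
# Hypothetical helpers for illustration only; not taken from rubyhttp.

# Summarize the chunk sizes (in bytes) seen during one download.
def chunk_stats(sizes)
  { count: sizes.size, mean: sizes.sum.to_f / sizes.size,
    min: sizes.min, max: sizes.max }
end

# % CPU usage as defined above: Total CPU Time / Clock Time * 100.
def cpu_percent(cpu_seconds, clock_seconds)
  cpu_seconds / clock_seconds.to_f * 100
end

# Measure user, system and wall-clock seconds spent inside the block.
def measure
  cpu_before = Process.times
  clock_before = Time.now
  yield
  cpu_after = Process.times
  user = cpu_after.utime - cpu_before.utime
  system = cpu_after.stime - cpu_before.stime
  { user: user, system: system, total: user + system,
    clock: Time.now - clock_before }
end

stats = chunk_stats([1024, 1024, 512])
puts "chunks: #{stats[:count]}, min: #{stats[:min]}, max: #{stats[:max]}"

timing = measure { 100_000.times { |i| i * i } }
puts "total CPU: #{timing[:total]}s over #{timing[:clock]}s of clock time"

puts "0.5s CPU over 2s clock = #{cpu_percent(0.5, 2.0)}% CPU"
```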

The raw data are available under SVN at results/linux/2008-10-4 and results/windows/2008-10-4. I used my combine_csv.rb tool to generate results/linux/2008-10-4/aggregate.csv from the individual test results. I ended up not using the Windows results as they complicated the graph and didn’t materially impact the conclusion.

I sucked aggregate.csv into Excel to do some munging.

Pretty Pictures

I uploaded the data to Swivel, thinking it would make it easy to analyze the data. It didn’t. I wanted to do a clustered bar graph, where each cluster corresponds to a site, and bars within that cluster reflect CPU usage for each implementation when downloading from that site. Swivel is way too limited for that.

The best I can do is this graph, which clusters by implementation and graphs CPU usage for each site; the opposite of what I wanted. You can play with the data yourself if you like.

Using good old-fashioned Excel, I generated this graph:

Ruby HTTP implementations performance

As you can see, the worst performers are the stock Net::HTTP implementations in both 1.8.6 and 1.8.7, though 1.8.6 is noticeably worse due to the 1K buffer size vs 16K for 1.8.7. The best performer is curb (libcurl bindings for Ruby), under both 1.8.6 and 1.8.7. The fastest Net::HTTP-based implementation uses a 16K buffer size and bypasses the timeout method, which is apparently quite inefficient. Using the non-blocking select to implement a timeout is slower than no timeout at all, but still considerably better than the stock impl. Finally, the 64K buffer size variants were actually worse performance-wise than the 16K variants.

It’s also quite obvious that Dallas transfers took up the most CPU, while London took the least. What you can’t see from this graph, but would see in the raw data, is that Dallas transfers were crazy-fast (since these tests were run on the same network as the Dallas test file), so there was less wall-clock time spent on the test, thus the transfer was less I/O bound than others. For the same reason, London, by far the slowest transfer, uses the least amount of CPU. This does not mean that transfers from fast download sites are inherently less efficient. If instead of %CPU time I used the total CPU time column, this disparity would vanish.
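
To make that concrete, here is the arithmetic with made-up numbers (they are not taken from the test results): the same amount of CPU work spread over a short download and a long one.

```ruby
# Hypothetical numbers, purely illustrative: both transfers burn the same CPU time.
cpu_seconds = 0.5
fast_clock  = 1.0   # e.g. a same-datacenter transfer
slow_clock  = 10.0  # e.g. a transatlantic transfer

fast_pct = cpu_seconds / fast_clock * 100
slow_pct = cpu_seconds / slow_clock * 100

puts "fast: #{fast_pct}% CPU, slow: #{slow_pct}% CPU, both #{cpu_seconds}s of CPU"
```

The fast transfer looks ten times as expensive by %CPU even though the work done is identical, which is why total CPU time is the fairer column for comparing sites.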

Conclusion

Ruby’s Net::HTTP implementation blows. It’s a bit better in 1.8.7 with the new 16K buffer size, but the timeout implementation has got to go. Even with timeout eliminated, Net::HTTP is trounced by the pure-Ruby rfuzz and the native/Ruby blend curb, suggesting that timeout notwithstanding, there are other inefficiencies in Net::HTTP. Looking at the protocol.rb code, I’m struck by how painfully inefficient the implementation is with buffers. rfuzz and curb minimize buffer copies and my rfuzz streaming HTTP extension reuses the same buffer for multiple calls, while Net::HTTP is happily appending and slicing away at arrays.
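
The difference between the two buffering styles can be sketched roughly as follows. This is neither the Net::HTTP code nor the rfuzz extension: drain_appending and drain_reusing are hypothetical names, the first is only loosely modeled on the append-to-a-growing-buffer style, and a StringIO stands in for the socket.

```ruby
require 'stringio'

CHUNK = 16 * 1024

# Appending style: every read allocates a fresh string, which is then
# copied onto the end of an ever-growing buffer.
def drain_appending(io)
  rbuf = String.new
  while (chunk = io.read(CHUNK))
    rbuf << chunk
  end
  rbuf.bytesize
end

# Reuse style: the same buffer object is handed to every read, and the
# caller sees each chunk before the next read overwrites it.
def drain_reusing(io)
  buffer = String.new
  total = 0
  # IO#read(length, outbuf) fills outbuf in place and returns nil at EOF.
  while io.read(CHUNK, buffer)
    total += buffer.bytesize
    yield buffer if block_given?
  end
  total
end

payload = 'x' * (1024 * 1024) # 1MB stand-in for a download

appended = drain_appending(StringIO.new(payload))
reused   = drain_reusing(StringIO.new(payload))

puts "appending: #{appended} bytes, reusing: #{reused} bytes"
```

Both functions see the same bytes; the second simply avoids allocating a new string per read, which is the kind of saving a streaming client can take advantage of.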

I think architecturally Net::HTTP can be saved, but it needs rewritten buffered I/O and an alternative to timeout, preferably in the form of select.

I’m going to try to work on the necessary changes, and will post whatever I come up with.
