apocryph.org: Notes to my future self

19 Sep 2007

Ubuntu Feisty Fawn, HighPoint RocketRaid 2220, and Satan

A while back I contorted myself to get a 64-bit FreeBSD 6.0 driver for my HighPoint RocketRaid 2220 RAID controller. Now that I have a 2TB ReadyNAS box, that old 1TB FreeBSD box is falling into disuse, so I thought I’d repurpose it as a dedicated Azureus download machine.

At first, I had hoped I could install Ubuntu Feisty Fawn directly on the RAID array, but I couldn’t even get the Ubuntu live CD to boot without a litany of read errors on sdc and sdb. I gave up on that, pulled one of the five 250GB drives from the array, hooked it up to the on-board SATA controller, unplugged the RocketRaid, and installed Ubuntu.

Once that was done, I wanted to at least get enough RocketRaid support to create a RAID 0 volume consisting of the four remaining 250GB SATA drives. Long story short, here’s what I had to do:

  • Compile a custom 2.6.22 kernel, explicitly excluding the sata_mv driver, which is extremely incompatible with the RocketRaid. Adding sata_mv to the blacklist and using the brokenmodules kernel startup parameter were not sufficient; I had to literally compile it out of the kernel.
  • Download the latest HighPoint RocketRaid Linux driver source code. It may be possible to get the pre-compiled drivers to work on Feisty, but if so I don’t know how.
  • Build the RocketRaid driver code per the instructions. The make install step failed towards the end, but it got far enough to build the hptmv6 driver, which works and loads at boot time.

Once that was done, it was time to create the RAID array. As I learned when I built a BSD box around this card, the RocketRaid 2220 is what is known as a FakeRAID card, meaning it has no hardware RAID circuitry; it’s just a SATA controller with some proprietary, buggy code that emulates the various RAID levels. So, I decided against using the HighPoint RAID code, and went into the HighPoint BIOS and created one JBOD device for each disk in the array. These devices showed up at /dev/sdb through /dev/sde. I used the software RAID HOWTO to build a /dev/md0 device consisting of these four disk devices, in RAID 0.
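For anyone who hasn't set up md before: RAID 0 splits the volume into fixed-size chunks and deals them round-robin across the member disks, so four 250GB drives add up to one ~1TB volume with no redundancy at all (lose one disk and you lose everything). Here is a small C sketch of that address arithmetic. It is an illustration, not md's actual code; `raid0_map` and `raid0_capacity` are hypothetical helpers, and the 64KiB chunk size and 512-byte sectors are assumptions rather than values read from this array.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch (not read from the real array):
 * 512-byte sectors, 64KiB chunks, 4 equal-sized members (sdb..sde). */
#define SECTOR_SIZE   512u
#define CHUNK_SECTORS (65536u / SECTOR_SIZE)   /* 128 sectors per chunk */
#define NDISKS        4u

/* Hypothetical result type: where a sector of /dev/md0 physically lives. */
struct raid0_loc {
    unsigned disk;     /* member index: 0 = first disk, 3 = last */
    uint64_t sector;   /* sector offset on that member */
};

/* Map a logical sector of the striped volume onto (member, sector). */
static struct raid0_loc raid0_map(uint64_t logical_sector)
{
    uint64_t chunk  = logical_sector / CHUNK_SECTORS;  /* chunk number in the volume */
    uint64_t offset = logical_sector % CHUNK_SECTORS;  /* position inside that chunk */
    struct raid0_loc loc;

    loc.disk   = (unsigned)(chunk % NDISKS);            /* chunks rotate across members */
    loc.sector = (chunk / NDISKS) * CHUNK_SECTORS       /* full stripes before this one */
                 + offset;
    return loc;
}

/* RAID 0 keeps no redundancy, so usable space is just the sum of the members. */
static uint64_t raid0_capacity(uint64_t member_bytes)
{
    assert(member_bytes <= UINT64_MAX / NDISKS);        /* guard against overflow */
    return member_bytes * NDISKS;
}
```

Under these assumptions, and with the members in the order /dev/sdb through /dev/sde, logical sector 128 of /dev/md0 is the first sector of /dev/sdc, and logical sector 517 lands 133 sectors into /dev/sdb.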

Now, I have a 1TB RAID 0 reiserfs partition upon which to stage my ill-gotten gains, before archiving them on my 2TB dedicated NAS box.

Next time, I’ll spend the $300 and get a real, supported RAID controller card.

3 Dec 2005

More trouble in paradise: software RAID controllers can suck

Yesterday while I was at work, there was a brief power fluctuation in my townhouse. Since I’m still setting up aenea, she isn’t yet in my server closet or hooked up to a UPS. So, predictably, she lost power.

This is somewhat bad, since the HighPoint RocketRaid 2220 SATA RAID controller that powers her 1TB RAID 5 disk array does not deal well at all with unclean shutdowns: the RAID logic is implemented in a software driver, not in hardware.
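The underlying problem is usually called the RAID 5 write hole: updating one block in a stripe takes two separate writes (the data block and the stripe's parity), and if power fails between them the parity no longer matches the data. Unless the driver resyncs the whole array after an unclean shutdown, nothing notices until a disk has to be rebuilt from that parity, at which point the rebuilt block is silently wrong. Many hardware controllers close the hole with a battery-backed write cache; a purely software implementation has no such safety net. Below is a toy C sketch of XOR parity showing the effect. The 8-byte blocks and the function names are made up for illustration and have nothing to do with HighPoint's actual driver.

```c
#include <assert.h>
#include <string.h>

#define NDATA 4   /* data blocks per stripe: a 5-drive RAID 5 has 4 data + 1 parity */
#define BLK   8   /* toy block size in bytes; real chunks are far larger */

typedef unsigned char block[BLK];

/* Parity is the byte-wise XOR of all data blocks in the stripe. */
static void compute_parity(block data[NDATA], block parity)
{
    memset(parity, 0, BLK);
    for (int d = 0; d < NDATA; d++)
        for (int i = 0; i < BLK; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild data block `missing` from the surviving blocks plus parity,
 * which is what has to happen when one disk in the array fails. */
static void rebuild(block data[NDATA], int missing, const block parity, block out)
{
    assert(missing >= 0 && missing < NDATA);
    memcpy(out, parity, BLK);
    for (int d = 0; d < NDATA; d++)
        if (d != missing)
            for (int i = 0; i < BLK; i++)
                out[i] ^= data[d][i];
}

/* Simulate the write hole: block `target` is rewritten, but power fails
 * before the matching parity write lands.  Returns 1 if rebuilding block
 * `victim` from the stale parity still yields its real contents, else 0. */
static int survives_torn_write(int target, int victim)
{
    block data[NDATA], parity, saved, rebuilt;

    for (int d = 0; d < NDATA; d++)
        memset(data[d], 'A' + d, BLK);        /* arbitrary initial contents */
    compute_parity(data, parity);             /* stripe starts out consistent */

    memset(data[target], 'Z', BLK);           /* the data write reaches the disk... */
    /* ...and the power fails here, so the parity is never recomputed. */

    memcpy(saved, data[victim], BLK);         /* what the victim disk really holds */
    rebuild(data, victim, parity, rebuilt);   /* victim disk dies; rebuild it */
    return memcmp(saved, rebuilt, BLK) == 0;
}
```

In this toy run, `survives_torn_write(0, 2)` returns 0: rebuilding disk 2 after a torn write to disk 0 produces garbage, even though disk 2 itself was never written.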

Predictably, I suffered some file system damage. I now can’t boot, because /var seems sufficiently damaged to cause a panic in some ffs_whatever module. Thankfully it was /var and not, say, /usr, but nonetheless it sucks badly.

I’ve booted the FixIt shell on the FreeBSD 6.0 install disc, and loaded the hptmv6.ko kernel module from a USB floppy, so now I’m hoping I can fsck the problem away from this shell.

First, I’m discovering that a standard fsck in the FixIt shell doesn’t recognize the /var filesystem. fsck_ufs does recognize it, but when I run fsck_ufs /dev/da0s1d it just outputs the file system errors and calls it a day; it doesn’t fix them.

Hmm, fsck doesn’t work because it’s looking in /sbin and /usr/sbin for the fsck_* executables, but in the FixIt environment they’re in /mnt2/usr/sbin. The FixIt shell is just flaky; sometimes I’ll run a command (ls, fsck, mount, man; it doesn’t matter what) and it hangs. Over on VTTY 2 (Alt-F2) I see about 15 timeout errors from acd0 before the shell finally comes back, only to hang again on my next command.
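As far as I can tell, the fsck front end doesn't search $PATH for its fsck_<type> helpers; it only tries a fixed set of directories, so if that's right, adding /mnt2/usr/sbin to PATH wouldn't help either. The C sketch below illustrates that kind of lookup. `find_in_dirs` is a hypothetical helper written for this post, not code from FreeBSD's fsck, and the directory list is taken from the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not FreeBSD's fsck source): try each directory in a
 * fixed list, in order, and return the first `dir/name` that exists and is
 * executable.  $PATH is never consulted.  The path is written into buf;
 * returns buf on success or NULL if no directory has the helper. */
static const char *find_in_dirs(const char *const dirs[], size_t ndirs,
                                const char *name, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    assert(name != NULL && strchr(name, '/') == NULL);  /* bare file name only */
    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(buf, buflen, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= buflen)
            continue;                    /* would not fit in buf; skip it */
        if (access(buf, X_OK) == 0)      /* present and executable */
            return buf;
    }
    return NULL;
}
```

Given only { "/sbin", "/usr/sbin" }, a lookup like this never looks in /mnt2/usr/sbin, which matches what I saw: plain fsck fails in the FixIt shell, while running fsck_ufs directly works.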

Fortunately, I’ve read on the lists that the first thing to try when a file system is fucked is to boot in single user mode (option 4 on the boot menu, IIRC). That boots fine and gets me to a shell prompt.

I run

fsck -p /dev/da0s1d

Here -p is preen mode, which, from the man page, I gather automatically fixes minor inconsistencies but won’t handle major problems. All the list posts I see use this first.

From this I get:

/dev/da0s1d: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

I gather that’s bad. I found a few things on the list:

First, this frightful message advocating I use vi on the directory to remove invalid file entries. Um, no.

Next, this USENIX paper on FreeBSD soft updates, which explains what they’re for (to allow fsck to run whilst the file system is mounted, for speedier recovery) and when they don’t work (when the soft update snapshot is inconsistent, e.g. on power failure or crash).

So, with little help from the ‘net, I went ahead with:

fsck /dev/da0s1d

And got the UNEXPECTED SOFT UPDATE INCONSISTENCY, this time with a prompt: REMOVE? [yn]. I’m going to go with ‘yes’ and hope for the best. Then another error, this one UNREF FILE, with the prompt RECONNECT? [yn]. I’ll go with ‘yes’ again. Another ‘yes’ to NO lost+found DIRECTORY CREATE?

A ton more UNREF FILE msgs; ‘yes’ each time.

FREE BLK COUNT(S) WRONG IN SUPERBLK. SALVAGE? [yn] Most definitely.

SUMMARY INFORMATION BAD. SALVAGE? [yn] Sure, go ahead.

BLKS MISSING IN BIT MAPS. SALVAGE? [yn] Yeah, if you want…

And then, as if nothing had happened, FILE SYSTEM MARKED CLEAN. Yay.

Now I do:

fsck -p

To do a preening check on all the file systems. A few minor errors on /dev/da0s1f and /dev/da0s1e, but nothing fsck couldn’t handle on its own. It took a long time to scan the huge ~900GB partition…
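A quick back-of-the-envelope check on that ~900GB figure, assuming the array is the five 250GB drives in RAID 5 (one drive's worth of each stripe goes to parity) and that 250GB means the drive makers' decimal gigabytes:

```latex
\begin{aligned}
C_{\text{usable}} &= (n-1)\,s = (5-1)\times 250\ \text{GB} = 1000\ \text{GB} = 10^{12}\ \text{bytes} \\
\frac{10^{12}\ \text{bytes}}{2^{30}\ \text{bytes/GiB}} &\approx 931\ \text{GiB}
\end{aligned}
```

So the whole array shows up as roughly 931 binary gigabytes before any of it goes to the other partitions (/var and the rest), which squares with a data partition of roughly 900GB.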

Done now. I’ll exit this shell and proceed with the boot process, hoping for the best.

Voila! Booted fine.

So the moral(s) of the story are:

  • When using a software RAID driver, you mustn’t let the power go out
  • When using a BSD UFS file system, you mustn’t let the power go out
  • When using UNIX in general, you mustn’t let the power go out

It’s hard for me to get used to this, as the bulk of my computer hours have been spent on Windows, where I’ve forced shutdowns countless times and never had any serious file system damage. Needless to say, aenea is going on a UPS right now.

UPDATE: aenea sucks so much power she overloads the UPS I have on prospertine. I’ll have to move her into the server closet early, just so she’ll have an available UPS.

25 Nov 2005

Preparing The HighPoint RocketRaid 222x Driver Source Code

  1. Go to the HighPoint RocketRaid 2220 download page, and download the FreeBSD open-source driver (at the bottom of the page). Do not get the precompiled FreeBSD driver; you must download the source code. As of this writing, the latest version was 1.01.
  2. Extract the tarball into /usr/src/sys on a machine with the FreeBSD 6.0-RELEASE amd64 sources installed. This will create /usr/src/sys/dev/hptmv6 and /usr/src/sys/modules/hptmv6 and the files therein.
  3. Edit the file /usr/src/sys/dev/hptmv6/osm_bsd.c in vi or some similar text editor. This file references a field (d_maj) in a kernel data structure (cdevsw) that no longer exists in the 6.0 kernel (see What happened to the “d_maj” member of “struct cdevsw” in CURRENT? on the FreeBSD mailing list archive). Until this broken reference is removed, the driver will not compile.

    As of version 1.01 of the driver source, the offending line was 1117 in osm_bsd.c. The line number could vary with subsequent versions, but the region around the problem looks like this:

     #if __FreeBSD_version>501000
             .d_maj =        MAJOR_AUTO,
     #else
             .d_maj = HPT_DEV_MAJOR,
     #endif
    

    This entire #if/#endif block must be removed or commented out. I prefer to comment it out, replacing the above with this:

     /* commented out by anelson; d_maj deprecated in 6.0
     #if __FreeBSD_version>501000
             .d_maj =        MAJOR_AUTO,
     #else
             .d_maj = HPT_DEV_MAJOR,
     #endif
     */
    

    The /* and */ sequences denote a comment block in C; everything between the two sequences will be ignored by the compiler.

  4. The Makefile in /usr/src/sys/modules/hptmv6 needs to be tweaked a bit as well. Replace its contents with this, below:
     HPTMV6= ${.CURDIR}/../../dev/hptmv6
     .PATH: ${HPTMV6}
    
     KMOD = hptmv6
     SRCS = opt_scsi.h opt_cam.h bus_if.h device_if.h pci_if.h os_bsd.h os_bsd.c osm_bsd.c hptmv6_config.c
     OBJS = hptmv6_lib.o
    
     .if $(MACHINE_ARCH) == "amd64"
     HPTMV6_O = amd64-elf.hptmv6_lib.o.uu
     .else
     HPTMV6_O = i386-elf.hptmv6_lib.o.uu
     .endif
    
     hptmv6_lib.o: ${HPTMV6}/$(HPTMV6_O)
                 uudecode -p <  ${HPTMV6}/$(HPTMV6_O) > ${.TARGET}
    
     .include <bsd.kmod.mk>
    

    Most of the changes are cosmetic, but one important change is to the path to the .uu file. As downloaded, the path is ../../dev/hptmv6, which won’t work during kernel builds, since /usr/src/sys/modules/hptmv6 isn’t the current directory. Instead, the makefile above uses ${.CURDIR}/../../dev/hptmv6, which always resolves to the correct dev/hptmv6 path. I borrowed this form from the hptmv driver for the RocketRaid 18xx series ATA RAID cards, which was already in the kernel source tree in /usr/src/sys/modules/hptmv and has a very similar structure to that of the hptmv6 driver.

  5. Create a new kernel configuration based on GENERIC, and make the modifications as instructed in the Readme file that ships with the RocketRaid 222x driver tarball. Specifically, the changes are:
    1. Copy the GENERIC file in /usr/src/sys/i386/conf (for i386 targets) or /usr/src/sys/amd64/conf (for amd64 targets; the target I’m interested in), to a new file in the same directory, with a different name. This new name will be the name of the custom kernel; I use CUSTOM_AMD64_HPTMV6.
    2. Find the following line in the new file you copied:
       device "hptmv"...
      

      Right below that line, add this line:

       device  "hptmv6"   #HighPoint RocketRAID 222x
      
    3. For amd64 targets, edit /usr/src/sys/conf/files.amd64 and append the following:
       hptmv6_lib.o optional    hptmv6  \
           dependency  "$S/dev/hptmv6/amd64-elf.hptmv6_lib.o.uu" \
           compile-with    "uudecode < $S/dev/hptmv6/amd64-elf.hptmv6_lib.o.uu" \
           no-implicit-rule
      
       dev/hptmv6/os_bsd.c      optional        hptmv6
       dev/hptmv6/osm_bsd.c     optional        hptmv6
       dev/hptmv6/hptmv6_config.c      optional        hptmv6
      

      For i386 targets, replace amd64 with i386 in the file name and text.

  6. To build the kernel module separate from the static kernel, I had to edit /usr/src/sys/modules/Makefile to add hptmv6 everywhere I found hptmv. Otherwise it’s just linked into the static kernel, and no .ko module is created.
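Back to the Makefile change in step 4: a relative path like ../../dev/hptmv6 is resolved against whatever the current directory happens to be, while ${.CURDIR} pins it to the directory containing the Makefile. The C sketch below does that resolution lexically (no symlink handling) to show where the same string lands from two different starting points. `resolve` is a hypothetical helper, and /usr/src/sys/amd64/compile/CUSTOM_AMD64_HPTMV6 (the directory config(8) traditionally builds a kernel in) is only an example of a current directory other than the module's own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: resolve relative path `rel` against absolute
 * directory `base`, collapsing "." and ".." purely lexically (symlinks are
 * ignored).  Writes the result to out; returns out, or NULL if it won't fit. */
static char *resolve(const char *base, const char *rel, char *out, size_t outlen)
{
    char tmp[1024];
    const char *parts[128];
    size_t nparts = 0, len = 0;

    assert(base[0] == '/' && rel[0] != '/');   /* absolute base, relative rel */
    int n = snprintf(tmp, sizeof tmp, "%s/%s", base, rel);
    if (n < 0 || (size_t)n >= sizeof tmp || outlen < 2)
        return NULL;

    /* Walk the components, treating the directory list like a stack. */
    for (char *tok = strtok(tmp, "/"); tok != NULL; tok = strtok(NULL, "/")) {
        if (strcmp(tok, ".") == 0)
            continue;                          /* "." stays put */
        if (strcmp(tok, "..") == 0) {
            if (nparts > 0)
                nparts--;                      /* ".." pops a directory; "/.." is "/" */
            continue;
        }
        if (nparts == sizeof parts / sizeof parts[0])
            return NULL;                       /* absurdly deep path */
        parts[nparts++] = tok;
    }

    if (nparts == 0) {                         /* everything popped: the root */
        strcpy(out, "/");
        return out;
    }
    for (size_t i = 0; i < nparts; i++) {
        size_t plen = strlen(parts[i]);
        if (len + 1 + plen + 1 > outlen)       /* room for '/', name and NUL */
            return NULL;
        out[len++] = '/';
        memcpy(out + len, parts[i], plen);
        len += plen;
    }
    out[len] = '\0';
    return out;
}
```

From /usr/src/sys/modules/hptmv6 the path lands on /usr/src/sys/dev/hptmv6 as intended; from the example kernel build directory the same string points at /usr/src/sys/amd64/dev/hptmv6, which isn't where the .uu file lives.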

25 Nov 2005

Building a 64-bit 1TB SATA RAID file server with FreeBSD 6.0-amd64

This is an account of my varied experiences building aenea, a new server for my network whose primary feature is 1TB of SATA RAID5 storage. This experience was non-standard for a few reasons:

  • Case selection was nuanced
  • SuperMicro hot-swap drive cage was a hassle
  • Shogun heat sink was a total farce
  • Shogun heat sink almost obscures the PCI-Express slot
  • HP LiteScribe CD sucks
  • FreeBSD 6.0 doesn’t support the RocketRaid 2220
  • Figuring out how to mod the RR222x driver to 1) build on FreeBSD 6 and, 2) cross-compile for amd64 from i386
  • RR222x BIOS utility is shit; ‘background initialization’ means deferred initialization
  • Without the RR222x RAID management utility, the only way to trigger volume init is writing to the file system; I did this by doing a FreeBSD install. The catch is, the install doesn’t actually happen (the volume isn’t initialized yet), and you must leave the box powered on for hours until the disk lights stop flashing, indicating the init is done. Then you can reboot and do a normal install.
  • Lack of a floppy drive forced use of a USB floppy drive, which is fine except it took the /dev/da0 slot, making the array /dev/da1, though it will be /dev/da0 if the USB floppy isn’t present. Thus, I have to unplug the floppy after the hptmv6.ko kernel module is loaded.
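  • After install, copy hptmv6.ko to /boot/kernel and modify /boot/defaults/loader.conf by adding this line (the FreeBSD docs recommend putting local settings like this in /boot/loader.conf instead and leaving the defaults file alone):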
    hptmv6_load="YES"
    
  • Had to add
    interface "em0" {
        send host-name "aenea";
    }
    

    to /etc/dhclient.conf, as FreeBSD is the only OS I’ve used that doesn’t automatically send its hostname to a DHCP server.

  • RAID management tools come only in a 32-bit version. From the FreeBSD-amd64 list it seems you can get i386 compatibility if you build from source, which obviously I will do.
  • The array suddenly went ‘Critical’ on me, with disk 3 (1-based) showing up as ‘Degraded’. Of course, no docs. Now the array is spinning like crazy; my hope is that it’s repairing the degraded disk. Without management tools, I can’t know. I’m wary of building from source with the array at Critical; not only will it be slow, but it can only prolong the recovery process.
  • Once the array got back to ‘Normal’, I ran a make buildworld, but the disk IO stalled during the build. It’s fairly clear that, whether due to amd64, version 6.0, or FreeBSD itself, these volumes simply won’t work.

    I’ve no choice but to slink back to a mainstream OS. I’m leaning toward Fedora Core 4, even though it’s shit as a server system. The alternatives are Red Hat Enterprise Server (not an option) and SuSE Desktop Linux (as if FC4 wasn’t lame enough).

    The make buildworld has since unstuck itself; maybe if I can get a kernel and userland with i386 support, I can run the CLI management tools and I won’t be so screwed. We’ll see. It’ll take hours to download FC4 binaries anyway, so there’s no harm in trying.

UPDATE: this is rather agonizing. FreeBSD 6.0 release works, more or less. I can’t run the i386 CLI management tool, to no surprise. I created a few symlinks to shared libraries to get it to start, only to get this:

 Nov 25 18:06:20 aenea kernel: pid 699 (hptsvr-5.3), uid 0: exited on signal 11 (core dumped)

Not surprised; not only were these tools compiled for an earlier version of FBSD, but they’re running in the i386 compatibility layer.

The problem is there are little events that gnaw at me, making it impossible for me to delude myself into thinking the situation is tenable. For example, getting hung up during the buildworld. Also, doing a shutdown causes kernel panics during the unmount process.

I’m going to stress-test the disks a bit more with some port building, and see how that goes.
