Ubuntu Feisty Fawn, HighPoint RocketRaid 2220, and Satan
A while back I contorted myself to get a 64-bit FreeBSD 6.0 driver for my HighPoint RocketRaid 2220 RAID controller. Now that I have a 2TB ReadyNAS box, that old 1TB FreeBSD box is falling into disuse, so I thought I’d repurpose it as a dedicated Azureus download machine.
At first, I had hoped I could install Ubuntu Feisty Fawn directly on the RAID array, but I couldn’t even get the Ubuntu live CD to boot without a litany of read errors on sdc and sdb. I gave up on that, pulled one of the five 250GB drives from the array, hooked it up to the on-board SATA controller, unplugged the RocketRaid, and installed Ubuntu.
Once that was done, I wanted to at least get enough RocketRaid support to create a RAID 0 volume consisting of the four remaining 250GB SATA drives. Long story short, here’s what I had to do:
- Compile a custom 2.6.22 kernel, explicitly excluding the sata_mv driver, which is extremely incompatible with the RocketRaid. Adding sata_mv to the blacklist and using the brokenmodules kernel startup parameter were not sufficient; I had to literally compile this out of the kernel.
- Download the latest HighPoint RocketRaid Linux driver source code. It may be possible to get the pre-compiled drivers to work on Feisty, but if so I don’t know how.
- Build the RocketRaid driver code per the instructions. The make install step failed towards the end, but it made it far enough to get the hptmv6 driver built, working, and loading at boot time.
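Before rebooting into a custom kernel, it can help to confirm that sata_mv really was compiled out. The following is a minimal sketch, assuming the driver is controlled by the CONFIG_SATA_MV option and that you point it at your kernel's .config file (or the /boot/config-* file Ubuntu installs). The function name sata_mv_state is made up for illustration.

```shell
#!/bin/sh
# Report how sata_mv is configured in a given kernel config file.
# Prints "built-in", "module", or "disabled".
# (sata_mv_state is a hypothetical helper name, not part of any tool.)
sata_mv_state() {
    config_file=$1
    if grep -q '^CONFIG_SATA_MV=y' "$config_file"; then
        echo "built-in"
    elif grep -q '^CONFIG_SATA_MV=m' "$config_file"; then
        echo "module"
    else
        # Covers both "# CONFIG_SATA_MV is not set" and a missing entry.
        echo "disabled"
    fi
}
```

For example, `sata_mv_state /boot/config-$(uname -r)` should print "disabled" on a kernel built the way the first step describes.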
Once that was done, it was time to create the RAID array. As I learned when I built a BSD box around this card, the RocketRaid 2220 is what is known as a FakeRAID card, meaning it has no hardware RAID circuitry; it’s just a SATA controller with some proprietary, buggy code that emulates the various RAID levels. So, I decided against using the HighPoint RAID code, and went into the HighPoint BIOS and created one JBOD device for each disk in the array. These devices showed up at /dev/sdb through /dev/sde. I used the software RAID HOWTO to build a /dev/md0 device consisting of these four disk devices, in RAID 0.
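For reference, here is a rough sketch of what creating the array looks like, assuming the mdadm tool is used (the HOWTO may describe other tools as well). The helper only prints the command rather than running it, since creating an array destroys whatever is on the member disks. The function name md_raid0_cmd is made up for illustration.

```shell
#!/bin/sh
# Print (do not run) an mdadm command that would create a RAID 0 array
# from the given member devices.
# (md_raid0_cmd is a hypothetical helper; mdadm itself is assumed here.)
md_raid0_cmd() {
    md_device=$1
    shift
    # $# is now the number of member disks; $* is the list of them.
    echo "mdadm --create $md_device --level=0 --raid-devices=$# $*"
}

# The four JBOD devices exposed by the RocketRaid BIOS:
md_raid0_cmd /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde
```

Review the printed command, then run it as root once you are sure the device names are right.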
Now, I have a 1TB RAID 0 reiserfs partition upon which to stage my ill-gotten gains, before archiving them on my 2TB dedicated NAS box.
Next time, I’ll spend the $300 and get a real, supported RAID controller card.
More trouble in paradise: software RAID controllers can suck
Yesterday while I was at work, there was a brief power fluctuation in my townhouse. Since I’m still setting up aenea, she isn’t yet in my server closet, or hooked up to a UPS. So, predictably, she lost power.
This is somewhat bad: the Highpoint RocketRaid 2220 SATA RAID controller that powers her 1TB RAID 5 disk array does not deal at all well with unclean shutdowns, because the RAID logic is implemented in a software driver, not hardware.
Predictably, I suffered some file system damage. I now can’t boot, because /var seems sufficiently damaged to cause a panic in some ffs_whatever module. Thankfully it was /var and not, say, /usr, but nonetheless it sucks badly.
I’ve booted the FixIt shell on the FreeBSD 6.0 install disc, and loaded the hptmv6.ko kernel module from a USB floppy, so now I’m hoping I can fsck the problem away from this shell.
First, I’m discovering that a standard fsck in the FixIt shell doesn’t recognize the /var filesystem. fsck_ufs does the trick, but when I run it with fsck_ufs /dev/da0s1d it just outputs the file system errors and calls it a day; it doesn’t fix them.
Hmm, fsck doesn’t work because it’s looking in /sbin and /usr/sbin for the fsck_* executables, but in the FixIt environment they’re in /mnt2/usr/sbin. The FixIt shell is just flaky; sometimes I’ll run a command (ls, fsck, mount, man; it doesn’t matter what) and it hangs. Over on VTTY 2 (Alt-F2) I see about 15 timeout errors from acd0 before the shell finally comes back, only to hang again on my next command.
Fortunately, I’ve read on the lists that the first thing to try when a file system is fucked is to boot in single user mode (option 4 on the boot menu iirc). That boots fine and gets me to a shell prompt.
I run
fsck -p /dev/da0s1d
Where -p is preen mode, which from the man page I gather checks for minor inconsistencies, but won’t handle major problems. All the list posts I see use this first.
From this I get:
/dev/da0s1d: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY
I gather that’s bad. I found a few things on the list:
First, this frightful message advocating I use vi on the directory to remove invalid file entries. Um, no.
Next, this USENIX paper on FreeBSD soft updates, which explains what they’re for (to allow fsck to run whilst the file system is mounted, for speedier recovery), and when it doesn’t work (when the soft update snapshot is inconsistent, eg on power failure or crash).
So, with little help from the ‘net, I went ahead with:
fsck /dev/da0s1d
And got the UNEXPECTED SOFT UPDATE INCONSISTENCY, this time with a prompt: REMOVE? [yn]. I’m going to go with ‘yes’ and hope for the best…another error, this one UNREF FILE. The prompt is RECONNECT? [yn]. I’ll go with ‘yes’ again. Another ‘yes’ to the NO lost+found DIRECTORY CREATE?
A ton more UNREF FILE msgs; ‘yes’ each time.
FREE BLK COUNT(S) WRONG IN SUPERBLK. SALVAGE? [yn] Most definitely.
SUMMARY INFORMATION BAD. SALVAGE? [yn] Sure, go ahead.
BLKS MISSING IN BIT MAPS. SALVAGE? [yn] Yeah, if you want…
And then, as if nothing had happened, FILE SYSTEM MARKED CLEAN. Yay.
Now I do:
fsck -p
to do a preening check on all the file systems. A few minor errors on /dev/da0s1f and /dev/da0s1e, but nothing fsck couldn’t handle on its own. Took a long time to scan the huge ~900GB partition…
Done now. I’ll exit this shell and proceed with the boot process, hoping for the best.
Voila! Booted fine.
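The order of operations above (preen first, fall back to a full interactive fsck only if preen gives up) can be sketched as a small shell function. This is only an illustration under assumptions: it relies on fsck -p exiting non-zero when it cannot fix things on its own, and the FSCK variable and check_fs name are invented here so the logic can be tried without touching a real disk.

```shell
#!/bin/sh
# FSCK may be overridden to try the logic without a real fsck.
# (FSCK and check_fs are hypothetical names used for this sketch.)
FSCK=${FSCK:-fsck}

# Try a preen pass first; if it fails, run a full interactive fsck.
check_fs() {
    dev=$1
    if "$FSCK" -p "$dev"; then
        echo "preen pass succeeded on $dev"
    else
        echo "preen pass failed on $dev; running a full interactive fsck"
        "$FSCK" "$dev"
    fi
}
```

From single user mode this would be run as `check_fs /dev/da0s1d`, answering the prompts from the full pass by hand as described above.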
So the moral(s) of the story are:
- When using a software RAID driver, you mustn’t let the power go out
- When using a BSD UFS file system, you mustn’t let the power go out
- When using UNIX in general, you mustn’t let the power go out
It’s hard for me to get used to this, as the bulk of my computer hours have been spent on Windows, where I’ve forced shutdowns countless times, and never had any serious file system damage. Needless to say, aenea is going on a UPS right now.
UPDATE: aenea sucks so much power she overloads the UPS I have on prospertine. I’ll have to move her into the server closet early, just so she’ll have an available UPS.
Preparing The Highpoint RocketRaid 222x Driver Source Code
- Go to the HighPoint RocketRaid 2220 download page, and download the FreeBSD open-source driver (at the bottom of the page). Do not get the FreeBSD driver; you must download the source code. As of this writing, the latest version was 1.01.
- Extract the tarball into /usr/src/sys on a machine with the FreeBSD 6.0-RELEASE amd64 sources installed. This will create /usr/src/sys/dev/hptmv6 and /usr/src/sys/modules/hptmv6 and the files therein.
- Edit the file /usr/src/sys/dev/hptmv6/osm_bsd.c in vi or some similar text editor. This file references a field (d_maj) in a kernel data structure (cdevsw) that is no longer used in the 6.0 kernel (see What happened to the “d_maj” member of “struct cdevsw” in CURRENT? on the FreeBSD mailing list archive). Until this broken reference is removed, the driver will not compile.

  As of version 1.01 of the driver source, the offending line was 1117 in osm_bsd.c. The line number could vary with subsequent versions, but the region around the problem looks like this:

      #if __FreeBSD_version>501000
      .d_maj = MAJOR_AUTO,
      #else
      .d_maj = HPT_DEV_MAJOR,
      #endif

  This entire #if/#endif block must be removed or commented out. I prefer to comment it out, replacing the above with this:

      /* commented out by anelson; d_maj deprecated in 6.0
      #if __FreeBSD_version>501000
      .d_maj = MAJOR_AUTO,
      #else
      .d_maj = HPT_DEV_MAJOR,
      #endif
      */

  The /* and */ sequences denote a comment block in C; everything between the two sequences will be ignored by the compiler.
- The Makefile in /usr/src/sys/modules/hptmv6 needs to be tweaked a bit as well. Replace its contents with this, below:

      HPTMV6= ${.CURDIR}/../../dev/hptmv6
      .PATH: ${HPTMV6}
      KMOD = hptmv6
      SRCS = opt_scsi.h opt_cam.h bus_if.h device_if.h pci_if.h os_bsd.h os_bsd.c osm_bsd.c hptmv6_config.c
      OBJS = hptmv6_lib.o

      .if $(MACHINE_ARCH) == "amd64"
      HPTMV6_O = amd64-elf.hptmv6_lib.o.uu
      .else
      HPTMV6_O = i386-elf.hptmv6_lib.o.uu
      .endif

      hptmv6_lib.o: ${HPTMV6}/$(HPTMV6_O)
      	uudecode -p < ${HPTMV6}/$(HPTMV6_O) > ${.TARGET}

      .include <bsd.kmod.mk>

  Most of the changes are cosmetic, but one important change is to the path to the .uu file; as downloaded the path is ../../dev/hptmv6, which won’t work during kernel builds, since /usr/src/sys/modules/hptmv6 isn’t the current directory. Instead, the makefile above uses ${.CURDIR}/../../dev/hptmv6, which will always resolve to the correct dev/hptmv6 path. I borrowed this form from the hptmv driver, for the RocketRaid 18xx series ATA RAID cards, which was already in the kernel source tree in /usr/src/sys/modules/hptmv, and has a very similar structure to that of the hptmv6 driver.
- Create a new kernel configuration based on GENERIC, and make the modifications as instructed in the Readme file that ships with the RocketRaid 222x driver tarball. Specifically, the changes are:
  - Copy the GENERIC file in /usr/src/sys/i386/conf (for i386 targets) or /usr/src/sys/amd64/conf (for amd64 targets; the target I’m interested in), to a new file in the same directory, with a different name. This new name will be the name of the custom kernel; I use CUSTOM_AMD64_HPTMV6.
  - Find the following line in the new file you copied:

        device "hptmv"...

    Right below that line, add this line:

        device "hptmv6" #HighPoint RocketRAID 222x
  - For amd64 targets, edit /usr/src/sys/conf/files.amd64 and append the following:

        hptmv6_lib.o optional hptmv6 \
            dependency "$S/dev/hptmv6/amd64-elf.hptmv6_lib.o.uu" \
            compile-with "uudecode < $S/dev/hptmv6/amd64-elf.hptmv6_lib.o.uu" \
            no-implicit-rule
        dev/hptmv6/os_bsd.c optional hptmv6
        dev/hptmv6/osm_bsd.c optional hptmv6
        dev/hptmv6/hptmv6_config.c optional hptmv6

    For i386 targets, replace amd64 with i386 in the file name and text.
- To build the kernel module separate from the static kernel, I had to edit /usr/src/sys/modules/Makefile to add hptmv6 everywhere I found hptmv. Otherwise it’s just linked into the static kernel, and no .ko module is created.
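Since the line number of the d_maj reference may differ between driver versions, a quick search is more reliable than jumping to line 1117. The sketch below shows that search, plus the usual FreeBSD buildkernel and installkernel targets for a custom configuration. Both function names are invented for this example, and the build commands are only printed, not executed.

```shell
#!/bin/sh
# List every line (with its line number) that still references d_maj.
# (find_d_maj and print_kernel_build_cmds are hypothetical helper names.)
find_d_maj() {
    grep -n 'd_maj' "$1"
}

# Print, without running, the standard kernel build and install steps
# for the named kernel configuration.
print_kernel_build_cmds() {
    kernconf=$1
    echo "cd /usr/src"
    echo "make buildkernel KERNCONF=$kernconf"
    echo "make installkernel KERNCONF=$kernconf"
}

print_kernel_build_cmds CUSTOM_AMD64_HPTMV6
```

Running `find_d_maj /usr/src/sys/dev/hptmv6/osm_bsd.c` before and after the edit is a cheap way to check that no live reference to the field remains outside the comment.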
Building a 64-bit 1TB SATA RAID file server with FreeBSD 6.0-amd64
This is an account of my varied experiences building aenea, a new server for my network whose primary feature is 1TB of SATA RAID5 storage. This experience was non-standard for a few reasons:
- Case selection was nuanced
- SuperMicro hot-swap drive cage was a hassle
- Shogun heat sink was a total farce
- Shogun heat sink almost obscures the PCI-Express slot
- HP LiteScribe CD sucks
- FreeBSD 6.0 doesn’t support the RocketRaid 2220
- Figuring out how to mod the RR222x driver to 1) build on FreeBSD 6 and, 2) cross-compile for amd64 from i386
- RR222x BIOS utility is shit; ‘background initialization’ means deferred initialization
- Without the RR222x RAID management utility, only way to trigger volume init is writing to the file system; I did this by doing a FreeBSD install. Catch is, the install doesn’t actually happen (the volume isn’t initialized yet), and you must leave the box powered on for hours until the disk lights stop flashing, indicating the init is done. Then you can reboot and do a normal install.
- Lack of floppy forced use of USB floppy drive, which is fine except it took the /dev/da0 slot, making the array /dev/da1, though it will be /dev/da0 if the USB floppy isn’t present. Thus, have to unplug floppy after hptmv6.ko kernel module is loaded.
- After install, copy hptmv6.ko to /boot/kernel and modify /boot/defaults/loader.conf by adding this line:

      hptmv6_load="YES"
- Had to add

      interface "em0" { send host-name="aenea"; }

  to /etc/dhclient.conf, as FreeBSD is the only OS I’ve used that doesn’t automatically send its hostname to a DHCP server.
- RAID management tools come only in 32-bit version. From the FreeBSD-amd64 list it seems you can get i386 compatibility if you build from source, which obviously I will do.
- The array suddenly went ‘Critical’ on me, with disk 3 (1-based) showing up as ‘Degraded’. Of course, no docs. Now the array is spinning like crazy; my hope is that it’s repairing the degraded disk. Without management tools, I can’t know. I’m wary of building from source with the array at Critical; not only will it be slow, but it can only prolong the recovery process.
- Once the array got back to ‘Normal’, I ran a make buildworld, but the disk IO stalled during the build. It’s fairly clear that, due to amd64, version 6.0, or using FreeBSD, these volumes simply won’t work.

  I’ve no choice but to slink back to a mainstream OS. I’m leaning toward Fedora Core 4, even though it’s shit as a server system. The alternatives are Red Hat Enterprise Server (not an option) and SuSE Desktop Linux (as if FC4 wasn’t lame enough).

  The make buildworld has since unstuck itself; maybe if I can get a kernel and userland with i386 support, I can run the CLI management tools and I won’t be so screwed. We’ll see. It’ll take hours to download FC4 binaries anyway, so there’s no harm in trying.
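On the loader.conf step in the list above: the notes describe editing the defaults file, but the usual FreeBSD convention is to leave /boot/defaults/loader.conf alone and put local settings in /boot/loader.conf. The following is a small sketch of adding the module line only if it is not already there. The function name enable_module_at_boot is invented for this example, and the target file is passed in explicitly so nothing is assumed about your layout.

```shell
#!/bin/sh
# Append <module>_load="YES" to a loader.conf-style file, unless that
# exact line is already present.
# (enable_module_at_boot is a hypothetical helper name.)
enable_module_at_boot() {
    conf_file=$1
    module=$2
    line="${module}_load=\"YES\""
    # -x matches whole lines only, -F treats the line as a fixed string.
    if grep -qxF "$line" "$conf_file" 2>/dev/null; then
        echo "already enabled: $line"
    else
        echo "$line" >> "$conf_file"
        echo "added: $line"
    fi
}
```

As root, `enable_module_at_boot /boot/loader.conf hptmv6` would add the line once and report "already enabled" on later runs.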
UPDATE: this is rather agonizing. FreeBSD 6.0 release works, more or less. I can’t run the i386 CLI management tool, to no surprise. I created a few symlinks to shared libraries to get it to start, only to get this:
Nov 25 18:06:20 aenea kernel: pid 699 (hptsvr-5.3), uid 0: exited on signal 11 (core dumped)
Not surprised; not only were these tools compiled for an earlier version of FBSD, but they’re running in the i386 compatibility layer.
The problem is there are little events that gnaw at me, making it impossible for me to delude myself into thinking the situation is tenable. For example, getting hung up during the buildworld. Also, doing a shutdown causes kernel panics during the unmount process.
I’m going to stress-test the disks a bit more with some port building, and see how that goes.