On 5th February, one of my hard drives failed and my computer started to choke. My root partition was on the failed drive, so only programs already in RAM could continue to run; most of my storage, though, was on a RAID-5 across 4 320GB disks, so that was still intact. Fortunately, thanks to my 6GB of RAM, swap was not in use either, so the failure did not crash my PC. It did leave me in quite a bad position, though: I could not run many of the built-in system tools, in particular those from “smartmontools”. I did what I could to copy the important details over to Jem’s PC whilst my system was still running, albeit crippled.
It turned out that, upon rebooting the system, everything was fine and the disk worked, but it didn’t half give me the willies! I vowed then to get my root drive onto some form of redundant storage, and to have a hot spare always on hand. To that end I bought an Icy Dock 5-in-3 SATA cage and two 500GB SATA II drives. Unfortunately, due to the amount of time I was spending working on buzzspotr.com with i-together, I was unable to use these immediately. I finally got round to incorporating them into my system last weekend. It was quite a challenge, so I thought I would document it for future reference.
The first thing I did was delete as much data as I could. The main things I focussed on were:
- Old MythTV programs that I had recorded and seen, or that I was not going to watch
- My ripped DVD library (I rip my DVDs to make it easy to play them from MythTV without having to get off the sofa! I could always re-rip them later)
- Old duplicate backups (for example, I had backups every 30 minutes for Blog Friends, which summed to almost 50 GB! I removed all of these except for those from 11:30pm each night)
- Duplicate files
After removing all this data I reduced the “valuable” data on my computer to somewhere around 650GB.
I decided that the best way to lay out my computer data would be to have the following:
- First 0.5GB of each drive - swap space and /boot partition. I chose not to make the swap redundant, as it is rarely used (and I don’t mind the computer crashing if it really HAS to!)
- Next 39.5GB of each drive - RAID1+0 for /, totalling 117GB of fast redundant storage (theoretical peak bandwidth: 1800mbps read, 900mbps write). High-priority data lives here - the root filesystem, my home directory and desktop, the MySQL databases, the Apache webroots, etc. Ultimately, everything where speed and redundancy matter most. This setup tolerates the loss of any one drive (and possibly up to 3), and has a lower probability of total failure than RAID0+1. I did *NOT* use the kernel raid10 driver.
- Then 80GB stripes over 5 disks of RAID5, with one hot spare (on the 6th disk), up to the capacity of the 320GB drives, all combined through LVM into one huge partition for lower-priority data - music/TV/etc. Note that RAID5 write speed is not great.
- The rest of the 500GB drives are currently unpartitioned, but I might use them as overflow for MythTV, or as hot spares for the RAID1+0
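As a sanity check on the layout above, the usable capacities can be worked out with a little shell arithmetic (raw figures only - the arithmetic comes out a touch above the 117GB I actually ended up with once overheads are taken off):

```shell
# Back-of-the-envelope capacities for the layout above.
# Sizes in tenths of a GB so the 39.5GB partitions stay as integers.
PAIR_GB10=395                      # each RAID1 pair contributes one partition's worth
RAID10_GB10=$((3 * PAIR_GB10))     # three mirrored pairs, striped together
echo "RAID1+0 usable: $((RAID10_GB10 / 10)).$((RAID10_GB10 % 10))GB"

# RAID5 over 5 active disks with 80GB stripes: one stripe's worth goes to parity.
STRIPE_GB=80
DISKS=5
echo "RAID5 usable per 80GB stripe set: $(( (DISKS - 1) * STRIPE_GB ))GB"
```

The RAID1+0 figure halves the raw space (every byte is mirrored), whilst RAID5 only gives up one disk’s worth per stripe set - which is why the bulk data lives there.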
This seemed to me to be the best way to lay out my filesystem, but how on earth could I move my current data over to the new system, and be assured that it would still boot?
My previous setup was (4 320GB drives, remember):
- First 0.5GB swap and /boot
- Then 30GB stripes for RAID5, up to the last 20GB; combined with LVM into two logical volumes.
- Last 20GB was used for /home on one drive, a 64-bit / on another, and a 32-bit / on another. On the final drive it was blank.
My first issue was how to boot into the Ubuntu LiveCD, and still have volume management. I found the best way to do this was the following:
- Boot the Gutsy Gibbon LiveCD, remembering to set screen resolution and keymap (for some reason, it crashes for me if I don’t…)
- Open up synaptic
- Edit the software sources (repositories) - tick all the boxes, including all the updates boxes (gparted on the LiveCD is broken if you try to use it on a completely raw, fresh-from-manufacturer drive)
- Install all the updates (you can leave out obvious things like OpenOffice.org if you want)
- Install mdadm and lvm2 packages
- (Optionally?) Run
modprobe raid0; modprobe raid1; modprobe raid5
- Then run
mdadm -A -s --no-degraded
- (Optionally?) Activate the LVM volumes:
vgscan; vgchange -ay
- Now you should have all your RAID and LVM partitions up and running
If you don’t understand the commands, I highly recommend that you read their man pages to ensure that these are the right commands for you. You CAN lose data if you mess this up! I always have to check each command 4 or 5 times before I run it when this much data is involved!
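For my own future reference, here are the RAID/LVM steps pulled together into one script. The vgscan/vgchange step is my assumption about how the LVM volumes get activated, and everything is echoed rather than executed, so you can eyeball the commands before running them for real:

```shell
# The LiveCD RAID/LVM steps in one place. The commands are echoed, not
# executed -- delete the wrapper functions to run them for real.
modprobe() { echo modprobe "$@"; }
mdadm()    { echo mdadm "$@"; }
vgscan()   { echo vgscan "$@"; }
vgchange() { echo vgchange "$@"; }

modprobe raid0
modprobe raid1
modprobe raid5
# Assemble every array described by the on-disk superblocks, but refuse to
# start any that are degraded:
mdadm -A -s --no-degraded
# Activate any LVM volume groups sitting on top of the arrays (my assumption
# for the activation step):
vgscan
vgchange -ay
```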
Once I had done that, I had to take the plunge. First I checked each of the NEW UNFORMATTED disks with a long read-write test (actually, I did this before ever rebooting), by running
badblocks -s -w /dev/sde
This is a destructive command, please be careful using it! It will erase any data already on the drive.
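If you have several fresh drives to test, the same check can be looped - but since badblocks -w is destructive, in this sketch the command is echoed rather than executed (the device names are simply the ones my two new drives happened to get):

```shell
# Destructive read-write surface test of the two NEW, EMPTY drives.
# Echoed, not executed -- remove the wrapper function to really run it.
badblocks() { echo badblocks "$@"; }

for dev in /dev/sde /dev/sdf; do
  badblocks -s -w "$dev"   # -w: write-mode test (wipes the drive); -s: show progress
done
```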
Once I was convinced that the drives would withstand the 2 days during which they would hold the only copy of my data, I partitioned them. Both got the standard 0.5GB and 39.5GB partitions at the front; then sde got three 80GB partitions, with the rest (220GB) becoming another partition, whilst on sdf the rest (460GB) became one large partition. I then copied everything over to these drives (starting with sdf, and then working backwards through the partitions on sde). Then I had to take the jump and make my RAID1+0 (formed by mirroring the pairs sda-sdd, sdb-sde and sdc-sdf, then striping across them).
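Creating the nested RAID1+0 itself looks something like this with mdadm. The md numbers and the use of the fifth partition on each drive are assumptions to make the sketch concrete, and mdadm --create wipes its member partitions, so once again everything is echoed rather than executed:

```shell
# Nested RAID1+0: three RAID1 pairs with a RAID0 stripe over the top
# (rather than the kernel's raid10 driver). Echoed, not executed --
# remove the wrapper function to run these for real.
mdadm() { echo mdadm "$@"; }

# Mirror each pair (sda-sdd, sdb-sde, sdc-sdf):
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda5 /dev/sdd5
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb5 /dev/sde5
mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sdc5 /dev/sdf5
# Stripe the three mirrors together into the fast, redundant / device:
mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/md1 /dev/md2 /dev/md3
```

Building it as mirrors-then-stripe means any single drive (and, with luck, one from each pair) can die without taking the array down.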
It was at this point I thought I would be clever. If I just deleted the partitions of sda, then the RAID5 would still be holding the data, and I could make my severely crippled RAID1+0 (really only a RAID0 at this stage) by combining sda5, sde5 and sdf5. I could then copy the data over and check whether it booted, whilst still having lots of redundancy for my data. Unfortunately, the system would not boot like this (I guessed it was because I had two md0s - one for the RAID1+0, and one from the old RAID5 - though I was later proven wrong), so I had to give up and take the risk. I deleted the partitions from the other drives, formed my RAID1+0, and tried to boot into it.
It still would not boot. I even chrooted into the new environment and ran update-grub, but still nothing.
At this point I was a little frustrated, and spent a long time researching. In the end I discovered that the initramfs was not being updated, and it still contained the old /etc/mdadm/mdadm.conf. Upon deleting and regenerating the initramfs, I could boot into the system. I quickly rebooted into the LiveCD and made all the other necessary changes (setting up the RAID5 across the 5 available disks (leaving the last disk with the data on), copying the data from sdf onto the new RAID5, updating the fstab, etc). I then rebooted into the system, and (not surprisingly) had to make some quite considerable changes due to the amount of data I had moved to new, “better”, locations. And finally, just 2 days later(!), I had my ultra-fast and acceptably redundant system online.
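Boiled down, the fix was roughly the following, run from a chroot into the new root. Regenerating mdadm.conf via --detail --scan and rebuilding with update-initramfs are my assumptions about the tidiest way to do what I ended up doing by hand; the commands are echoed here rather than executed:

```shell
# Refresh /etc/mdadm/mdadm.conf and the initramfs from inside a chroot on
# the new root. Echoed, not executed -- drop the run() wrapper for real use.
run() { echo "$@"; }

# Record the newly created arrays (redirect this into /etc/mdadm/mdadm.conf
# when running for real):
run mdadm --detail --scan
run update-initramfs -u -k all   # rebuild every installed initramfs
run update-grub                  # refresh GRUB's menu for the new layout
```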
I’m very glad that I took the time to do this, though I still have not got round to formatting sdf and setting it up as a hot spare… it still has most of the old data on it as a duplicate copy.