
On 5th February, one of my hard drives failed and my computer started
to choke. My root partition was on the failed drive, so only programs
that were already in RAM could continue to run, though most of my
storage was on a RAID-5
across four 320GB disks, so that was still intact. Fortunately, thanks to my
6GB of RAM, swap was not in use, so this did not crash my PC, but
it did leave me in quite a bad position, unable to run many of
the built-in system tools, in particular the tools from “smartmontools”.
I did what I could to copy the important details over to Jem’s PC whilst
my system was still running, albeit crippled.
It turned out that, upon
rebooting the system, everything was fine and the disk worked, but
it didn’t half give me the willies! I vowed then to get my root drive
onto some form of redundant storage, and to have a
hot spare always on
hand. To that end I bought an
Icy Dock 5-in-3 SATA cage
and two 500GB
SATA II drives.
Unfortunately, due to the amount of time I was spending working on
buzzspotr.com with
i-together, I was unable to actually use these
immediately. I finally got round to incorporating them into my system
last weekend. It was quite a challenge to do, so I thought I would
document it for future reference.
The first thing I did was delete as
much data as I could. The main things I focussed on were:
- Old MythTV programs
that I had recorded and seen, or that I was not going to watch
- My ripped DVD library (I rip my DVDs to make it easy to play them
from MythTV without having to get up off the sofa! I could
always re-rip them later)
- Old duplicate backups (for example, I had backups every 30 minutes
for Blog Friends, which summed to almost 50 GB! I removed all of
these except for those from 11:30pm each night)
- Duplicate files
- Caches
After removing all this data I reduced the “valuable” data on my
computer to somewhere around 650GB.
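If you want to do a similar clean-up, something along these lines will point you at the biggest directories and any duplicate files (just a sketch: fdupes is a separate package, and the paths are examples rather than my actual layout):

```
# Show the 20 largest directories under a given mount point (example path)
du -x --max-depth=2 /storage | sort -rn | head -n 20

# List groups of duplicate files (fdupes is in the Ubuntu repositories)
fdupes -r /storage/backups
```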
I decided that the best way to lay
out my computer data would be to have the following (a rough
partitioning sketch follows the list):
- First 0.5GB of each drive -
swap space and /boot
partition. I chose not to make the swap redundant as it is rarely
used (and I don’t mind if the computer crashes in the rare case that
it HAS to use it!)
- Next 39.5GB of each drive -
RAID1+0
for /, totalling 117GB of fast redundant storage (theoretical
peak bandwidth: 1800mbps read, 900mbps write). High priority data
here - the root, my home directory and desktop, the mysql databases,
the webroots of apache, etc etc. Ultimately everything where speed
and redundancy are highest priority. This setup can survive the loss
of any one drive, and possibly up to three (as long as no mirror pair
loses both members), and has a lower probability of total failure than
RAID0+1.
I did *NOT* use the kernel raid10 driver.
- Then 80GB stripes over 5 disks of RAID5, with one hot spare (on the
6th disk) up to the capacity of the 320GB drives, which would all be
combined through LVM into one huge partition for lower priority data
- music/TV/etc. Note that RAID5 write speed is not great.
- The rest of the 500GB drives are currently unpartitioned, but I
might use them as overflow for MythTV, or as hot spares for the
RAID1+0
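To make that layout concrete, the partitioning of each drive comes out roughly like the sketch below. I’m using parted purely for illustration; the device name and exact boundaries are examples, not a transcript of what I actually ran:

```
# Illustrative layout for one 320GB drive (sizes approximate)
parted /dev/sda mklabel msdos
parted /dev/sda mkpart primary linux-swap 1MB 512MB  # swap (or /boot on one drive)
parted /dev/sda mkpart extended 512MB 320GB
parted /dev/sda mkpart logical 513MB 40GB            # sda5: RAID1+0 member for /
parted /dev/sda mkpart logical 40GB 120GB            # sda6-8: three 80GB RAID5 members
parted /dev/sda mkpart logical 120GB 200GB
parted /dev/sda mkpart logical 200GB 280GB
parted /dev/sda set 5 raid on                        # repeat for partitions 6-8
```

The 500GB drives get the same first two partitions, with the remainder left over for the RAID5 members and overflow.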
This seemed to me to be the best way to lay out my filesystem, but how
on earth could I move my current data over to the new system, and be
assured that it would still boot?
My previous setup was (4 320GB drives,
remember):
- First 0.5GB swap and /boot
- Then 30GB stripes for RAID5, up to the last 20GB, combined with LVM
into two logical volumes.
- Last 20GB was used for /home on one drive, 64-bit / on another, and
a 32-bit / on another. The final drive was blank.
My first issue was how to boot into the Ubuntu LiveCD, and still have
volume management. I found the best way to do this was the following
(I’ve collected the commands into a single block after the list):
- Boot the Gutsy Gibbon LiveCD, remembering to set screen resolution
and keymap (for some reason, it crashes for me if I don’t…)
- Open up synaptic
- Edit the software sources (repositories) - tick all the boxes, and
all the updates boxes (gparted is broken on the LiveCD if you try
and use it on a completely raw (fresh from manufacturer) drive)
- Install all the updates (you can leave out obvious things like
OpenOffice.org if you want)
- Install mdadm and lvm2 packages
- (Optionally?) Run modprobe raid0; modprobe raid1; modprobe raid5
- Then run mdadm -A -s --no-degraded
- (Optionally?) Run modprobe dm-mod
- Run vgchange -ay
- Now you should have all your RAID and LVM partitions up and running
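Put together, the whole sequence from a fresh LiveCD session looks roughly like this (a sketch of the steps above, using apt-get instead of Synaptic; prefix the commands with sudo on the LiveCD):

```
# Install the RAID and LVM userspace tools (not on the LiveCD by default)
apt-get update
apt-get install mdadm lvm2

# Load the RAID personalities and the device-mapper module
modprobe raid0
modprobe raid1
modprobe raid5
modprobe dm-mod

# Assemble every array found by scanning, refusing to start degraded ones
mdadm -A -s --no-degraded

# Activate any LVM volume groups sitting on top of the arrays
vgchange -ay
```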
If you don’t understand the commands, I highly recommend that you read
their man pages to ensure that these are the right commands for you. You
CAN lose data if you mess this up! I always have to check each command 4
or 5 times before I run it when this much data is involved!
Once I had
done that, I had to take the plunge. First I checked each of the NEW
UNFORMATTED disks with a long read-write test (actually, I did this
before ever rebooting), by running badblocks -s -w /dev/sde. This is a
destructive command, so please be careful using it! It will erase any
data already on the drive.
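In full, that meant running the same destructive test on both of the new, still-empty drives:

```
# -w writes test patterns over the whole disk (WIPES it), -s shows progress
badblocks -s -w /dev/sde
badblocks -s -w /dev/sdf
```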
Once I was convinced that the drives would
withstand the two days during which they would hold the only copy of my
data, I partitioned them. Both got the standard 0.5GB and 39.5GB
partitions at their fronts; then on sde the next 240GB became three 80GB
partitions and the rest (220GB) became another partition, whilst on sdf
the rest (460GB) became one large partition. I then copied everything
over to these drives
(starting with sdf, and then working backwards through the partitions in
sde). I then had to take the jump and make my RAID1+0 (which was formed
of striping the pairs sda-sdd, sdb-sde, sdc-sdf).
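Building a RAID1+0 this way (rather than with the kernel raid10 driver) means creating three RAID1 mirrors and then striping them together with RAID0. Roughly like this, as a sketch only (the md numbers, partition names and filesystem type are examples from my layout and preferences; check yours before copying anything):

```
# Three mirrored pairs from the 39.5GB partitions
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda5 /dev/sdd5
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb5 /dev/sde5
mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sdc5 /dev/sdf5

# ...then one RAID0 stripe across the mirrors to hold /
mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/md1 /dev/md2 /dev/md3
mkfs.ext3 /dev/md0
```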
It was at this point I
thought I would be clever. If I just deleted the partitions of sda, then
the RAID5 would still be holding the data, and I could make my severely
crippled RAID1+0 (really only a RAID0 in this idea) by combining sda5,
sde5 and sdf5. I could then copy the data over and check if it booted,
whilst still having lots of redundancy for my data. Unfortunately, the
system would not boot like this (I guessed it was because I had two md0s
- one for RAID1+0, and one from the old RAID5, though I was later proven
wrong), so I had to give up and take the risk. I deleted the partitions
from the other drives, and formed my RAID1+0, and tried to boot into it.
It still would not boot. I even chrooted into the new environment and
ran update-initramfs and update-grub, but still nothing.
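For anyone trying the same, the chroot dance looked something like this (a sketch; the device names and mount points are only examples, and as it turned out this alone was not enough):

```
# From the LiveCD, with the arrays assembled and the new root mounted
mount /dev/md0 /mnt
mount /dev/sda1 /mnt/boot        # if /boot is a separate partition
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt

# Inside the chroot: regenerate the initramfs and the grub menu
update-initramfs -u
update-grub
```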
At this
point I was a little frustrated, and spent a long time researching. In
the end I discovered that the initramfs was not being updated, and it
still contained the old /etc/mdadm/mdadm.conf. Upon deleting and
regenerating the initramfs, I could boot into the system. I quickly
rebooted into the LiveCD and did all the other necessary changes
(setting up the RAID5 across the 5 available disks (leaving the last
disk with the data on), copying the data from sdf onto the new RAID5,
updating the fstab, etc). I then rebooted into the system, and (not
surprisingly) had to make some quite considerable changes due to the
amount of data I had moved to new, “better”, locations. And finally,
just 2 days later(!), I had my ultra fast and acceptably redundant
system online.
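For future reference, the RAID5-plus-LVM step looks roughly like this (again only a sketch: the md numbers, partition names, and the volume group and logical volume names are illustrative, and the hot spare only gets added once sdf has been emptied):

```
# One RAID5 array per 80GB slice, across the five drives no longer holding data
# (repeat with the matching partitions for each of the three slices)
mdadm --create /dev/md4 --level=5 --raid-devices=5 \
    /dev/sda6 /dev/sdb6 /dev/sdc6 /dev/sdd6 /dev/sde6

# Pool the arrays into one big LVM volume group and carve it up
pvcreate /dev/md4
vgcreate storage /dev/md4          # vgextend storage /dev/md5 ... for the others
lvcreate -l 100%FREE -n media storage
mkfs.ext3 /dev/storage/media

# Once sdf is finally empty, its matching partition becomes the hot spare
mdadm /dev/md4 --add /dev/sdf6
```

Don’t forget to add the new filesystems to /etc/fstab before rebooting.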
I’m very glad that I took the time to do this, though I
still have not got round to formatting sdf and setting it up as hot
spares… it still has most of the old data on it as a duplicate copy.