<title>Supporting Raid Devices</title>
This section discusses software raid devices from an initial boot
image perspective: how to get the root device up and running.
There are other aspects to consider, the bootloader for example:
if your root device is on a mirror for reliability, it would be
a disappointment if, after a crash, you still had a long downtime
because the MBR was only available on the crashed disk. Then there's
the issue of managing raid devices in combination with hotplugging:
once the system is operational, how should the raid devices that
the initial image left untouched be brought online?
Raid devices are managed via ioctls (mostly; there is also something
called "autorun" in the kernel). The interface from userland is
simple: mknod a block device file, send an ioctl to it specifying
the device numbers of the underlying block devices and whether you'd
like mirroring or striping, then send a final ioctl to activate the
device. This leaves the managing application free to pick any unused
device (minor) number and makes no assumptions about device file names.
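To make this concrete, here is a hedged sketch (the device names, the
<filename>md-demo</filename> name, and the minor number are invented,
and the commands need root plus real component devices). The
<code>--build</code> mode of <application>mdadm</application> drives
exactly this ioctl path, setting up a legacy array without writing
superblocks:

```shell
# Illustrative only: /dev/sdX1 and /dev/sdY1 are placeholders.
# Major 9 is the md driver; minor 42 is just an unused number we picked.
mknod /dev/md-demo b 9 42

# mdadm --build issues the configuration and activation ioctls
# directly against the node, without touching superblocks.
mdadm --build /dev/md-demo --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1
```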
Devices that take part in a raid set also have a "superblock",
a header at the end of the device that contains a uuid and indicates
how many drives and spares are supposed to take part in the raid set.
The kernel can use this for consistency checking; applications can
use it to scan for all disks belonging to a raid set, even if one of
the component drives is moved to another disk controller.
The fact that the superblock is at the end of a device has an obvious
advantage: if you somehow lose your raid software, the device
underlying a mirror can be mounted directly as a fallback measure.
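To illustrate where that superblock sits: in the version-0.90 format
it occupies the last 64 KiB-aligned 64 KiB block of the device, which
is why the start of the device remains directly mountable. A small
sketch of the offset calculation (the sizes used are examples):

```python
# Sketch for the version-0.90 md superblock location: the last
# 64 KiB-aligned 64 KiB block of the component device.

RESERVED = 64 * 1024  # 64 KiB reserved at the end for the superblock


def sb_offset(device_size: int) -> int:
    """Byte offset of a 0.90 superblock on a device of the given size."""
    # Round the size down to a 64 KiB boundary, then back off one block.
    return (device_size & ~(RESERVED - 1)) - RESERVED


# A 100 MB partition is already 64 KiB aligned, so the superblock
# starts exactly 64 KiB before the end.
print(sb_offset(100 * 1024 * 1024))  # -> 104792064
```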
If raid is compiled into the kernel rather than provided as a module,
the kernel uses superblocks at boot time to find raid sets and make
them available without user interaction. In this case the filename of
the created block device is hardcoded: <filename>/dev/md\d</filename>.
This feature is intended for machines with root on a raid device
that don't use an initial boot image. This autorun feature is
also accessible via an ioctl, but it's not used in management
applications, since it won't work with an initial boot image and
it can be a nuisance if some daemon brings a raid set online just
after the administrator took it offline for replacement.
Finally, by picking a different major device number for the raid device,
the raid device can be made partitionable without the use of LVM.
There are at least three different raid management applications
for Linux: raidtools, the oldest; mdadm, more modern; and EVMS, a
suite of graphical and command line tools that manages not only raid
but also LVM, partitioning and file system formatting. We'll only
consider mdadm for now. The use of mdadm is simple:
There's an option to create a new device from components,
building the superblock.
Another option assembles a raid device from components,
assuming the superblocks are already available.
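As a sketch of these two modes (device names are placeholders; the
commands need root and real disks):

```shell
# Create: write fresh superblocks on the components and start a mirror.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1

# Assemble: read the existing superblocks back and restart the
# array, e.g. after a reboot.
mdadm --assemble /dev/md0 /dev/sdX1 /dev/sdY1
```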
Optionally, a configuration file can be used, specifying which
components make up a device, whether a device file should
be created or is assumed to exist, whether it's a stripe or a
mirror, and the uuid. Also, a wildcard pattern can be given:
disks matching this pattern will be searched for superblocks.
Information given in the configuration file can be omitted
from the command line. If there's a wildcard, you don't even
have to specify the component devices of the raid device.
A typical command is <code>mdadm --assemble /dev/md-root
--auto=md --uuid=...</code>, which translates to "create
<filename>/dev/md-root</filename> with some unused minor number,
and put the components with the matching uuid in it."
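A configuration file matching the description above might look like
this (the device names are illustrative and the uuid is a placeholder):

```
# Scan these disks for raid superblocks.
DEVICE /dev/sd*[0-9]

# Assemble the mirror with this uuid under /dev/md-root,
# creating the device file if it doesn't exist yet.
ARRAY /dev/md-root auto=md level=raid1 num-devices=2 UUID=...
```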
So far, raid devices look fairly simple to use; the complications
arise when you have to play nicely with all the other software
on the box. It turns out there are quite a lot of packages that
interact with raid devices:
When the md module is loaded, it registers 256 block devices
with <application>devfs</application>. These devices
are not actually allocated; they're just names set up to
allocate the underlying device when opened. These names in
<application>devfs</application> have no counterpart in sysfs.
When the LVM <application>vgchange</application> is started,
it opens all md devices to scan for headers, only to find that the
raid devices have no underlying components and return
no data. In the process, all these stillborn md devices get
registered with sysfs.
When <application>udevstart</application> is executed
at boot time, it walks over the sysfs tree and lets
<application>udev</application> create block device files for
every block device it finds in sysfs. The name and permissions
of the created file are configurable, and there is a hook to
initialise SELinux access controls.
When <application>mdadm</application> is invoked with the auto
option, it will create a block device file with an unused
device number and put the requested raid volume under it.
The created device file is owned by whoever executed the
<application>mdadm</application> command, permissions are 0600,
and there are no hooks for SELinux.
When the Debian installer builds a system with LVM and raid, the
raid volumes have names such as <filename>/dev/md0</filename>,
where there is an assumption about the device minor number in
the name of the file.
For the current Debian mkinitrd, this all works together in
a wonderful manner: devfs creates file names for raid devices,
LVM scans them with the side effect of entering the devices in sysfs,
and after pivot_root <application>udevstart</application> triggers
udev into creating block device files with proper permissions and
SELinux hooks. Later in the processing of <filename>rcS.d</filename>,
<application>mdadm</application> will put a raid device under the
created special file. Convoluted but correct, except for the fact
that of the 256 generated raid device files, up to 255 are unused.
In <application>yaird</application>, we do not use devfs.
Instead, we do a <application>mknod</application> before the
<application>mdadm</application>, taking care to use the same
device number that's in use in the running kernel. We expect
<filename>mdadm.conf</filename> to contain an <code>auto=md</code>
option for any raid device files that need to be created.
This approach should work regardless of whether the fstab uses
<filename>/dev/md\d</filename> or a device-number-independent name.
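A minimal sketch of that mknod step (the <filename>md0</filename>
entry and the <filename>md-root</filename> name are illustrative;
the sysfs <filename>dev</filename> attribute holds the major:minor
pair of the running device):

```shell
#!/bin/sh
# Illustrative: pick up the device number the running kernel uses
# for an active raid set, then create a device file with exactly
# that number under a name of our choosing.
sysfs_dev=/sys/block/md0/dev        # contains e.g. "9:0"
IFS=: read major minor < "$sysfs_dev"

mknod /dev/md-root b "$major" "$minor"
```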