<title>Supporting Raid Devices</title>
This section discusses software raid devices from an initial boot
image perspective: how to get the root device up and running.
There are other aspects to consider, the bootloader for example:
if your root device is on a mirror for reliability, it would be
a disappointment if, after a crash, you still had a long downtime
because the MBR was only available on the crashed disk. Then there's
the issue of managing raid devices in combination with hotplugging:
once the system is operational, how should the raid devices that
the initial image left untouched be brought online?
Raid devices are managed via ioctls (mostly; there is also something
called "autorun" in the kernel). The interface from userland is
simple: mknod a block device file, send an ioctl to it specifying
the device numbers of the underlying block devices and whether you'd
like mirroring or striping, then send a final ioctl to activate the
device. This leaves the managing application free to pick any unused
device (minor) number and makes no assumptions about device file names.
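To make this concrete, here is a hedged sketch (the device names, the
<filename>md-demo</filename> name, and the minor number are invented,
and the commands need root plus real component devices). The
<code>--build</code> mode of <application>mdadm</application> drives
exactly this ioctl path, setting up a legacy array without writing
superblocks:

```shell
# Illustrative only: /dev/sdX1 and /dev/sdY1 are placeholders.
# Major 9 is the md driver; minor 42 is just an unused number we picked.
mknod /dev/md-demo b 9 42

# mdadm --build issues the configuration and activation ioctls
# directly against the node, without touching superblocks.
mdadm --build /dev/md-demo --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1
```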
Devices that take part in a raid set also have a "superblock",
a header at the end of the device that contains a uuid and indicates
how many drives and spares are supposed to take part in the raid set.
The kernel can use this for consistency checking; applications can
use it to scan for all disks belonging to a raid set, even if one of
the component drives is moved to another disk controller.
The fact that the superblock is at the end of a device has an obvious
advantage: if you somehow lose your raid software, the device
underlying a mirror can be mounted directly as a fallback measure.
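To illustrate where that superblock sits: in the version-0.90 format
it occupies the last 64 KiB-aligned 64 KiB block of the device, which
is why the start of the device remains directly mountable. A small
sketch of the offset calculation (the sizes used are examples):

```python
# Sketch for the version-0.90 md superblock location: the last
# 64 KiB-aligned 64 KiB block of the component device.

RESERVED = 64 * 1024  # 64 KiB reserved at the end for the superblock


def sb_offset(device_size: int) -> int:
    """Byte offset of a 0.90 superblock on a device of the given size."""
    # Round the size down to a 64 KiB boundary, then back off one block.
    return (device_size & ~(RESERVED - 1)) - RESERVED


# A 100 MB partition is already 64 KiB aligned, so the superblock
# starts exactly 64 KiB before the end.
print(sb_offset(100 * 1024 * 1024))  # -> 104792064
```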
If raid is compiled into the kernel rather than provided as a module,
the kernel uses superblocks at boot time to find raid sets and make
them available without user interaction. In this case the filename of
the created block device is hardcoded: <filename>/dev/md\d</filename>.
This feature is intended for machines with root on a raid device
that don't use an initial boot image. This autorun feature is
also accessible via an ioctl, but it's not used in management
applications, since it won't work with an initial boot image and
it can be a nuisance if some daemon brings a raid set online just
after the administrator took it offline for replacement.
Finally, by picking a different major device number for the raid device,
the raid device can be made partitionable without the use of LVM.
There are at least three different raid management applications
for Linux: raidtools, the oldest; mdadm, more modern; and EVMS, a
suite of graphical and command line tools that manages not only raid
but also LVM, partitioning and file system formatting. We'll only
consider mdadm for now. The use of mdadm is simple:
There's an option to create a new device from components,
building the superblock.
Another option assembles a raid device from components,
assuming the superblocks are already available.
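As a sketch of these two modes (device names are placeholders; the
commands need root and real disks):

```shell
# Create: write fresh superblocks on the components and start a mirror.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1

# Assemble: read the existing superblocks back and restart the
# array, e.g. after a reboot.
mdadm --assemble /dev/md0 /dev/sdX1 /dev/sdY1
```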
Optionally, a configuration file can be used, specifying which
components make up a device, whether a device file should
be created or is assumed to exist, whether it's a stripe or a
mirror, and the uuid. Also, a wildcard pattern can be given:
disks matching this pattern will be searched for superblocks.
Information given in the configuration file can be omitted
from the command line. If there's a wildcard, you don't even
have to specify the component devices of the raid device.
A typical command is <code>mdadm --assemble /dev/md-root
--auto=md --uuid=...</code>, which translates to "create
<filename>/dev/md-root</filename> with some unused minor number,
and put the components with the matching uuid in it."
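A configuration file matching the description above might look like
this (the device names are illustrative and the uuid is a placeholder):

```
# Scan these disks for raid superblocks.
DEVICE /dev/sd*[0-9]

# Assemble the mirror with this uuid under /dev/md-root,
# creating the device file if it doesn't exist yet.
ARRAY /dev/md-root auto=md level=raid1 num-devices=2 UUID=...
```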
So far, raid devices look fairly simple to use; the complications
arise when you have to play nicely with all the other software
on the box. It turns out there are quite a lot of packages that
interact with raid devices:
When the md module is loaded, it registers 256 block devices
with <application>devfs</application>. These devices
are not actually allocated; they're just names set up to
allocate the underlying device when opened. These names in
<application>devfs</application> have no counterpart in sysfs.
When the LVM <application>vgchange</application> is started,
it opens all md devices to scan for headers, only to find that the
raid devices have no underlying components and return
no data. In the process, all these stillborn md devices get
registered with sysfs.
When <application>udevstart</application> is executed
at boot time, it walks over the sysfs tree and lets
<application>udev</application> create block device files for
every block device it finds in sysfs. The name and permissions
of the created file are configurable, and there is a hook to
initialise SELinux access controls.
When <application>mdadm</application> is invoked with the auto
option, it will create a block device file with an unused
device number and put the requested raid volume under it.
The created device file is owned by whoever executed the
<application>mdadm</application> command, permissions are 0600,
and there are no hooks for SELinux.
When the Debian installer builds a system with LVM and raid, the
raid volumes have names such as <filename>/dev/md0</filename>,
where there is an assumption about the device minor number in
the name of the file.
For the current Debian mkinitrd, this all works together in
a wonderful manner: devfs creates file names for raid devices,
LVM scans them with the side effect of entering the devices in sysfs,
and after pivot_root <application>udevstart</application> triggers
udev into creating block device files with proper permissions and
SELinux hooks. Later in the processing of <filename>rcS.d</filename>,
<application>mdadm</application> will put a raid device under the
created special file. Convoluted but correct, except for the fact
that of the 256 generated raid device files, up to 255 are unused.
In <application>yaird</application>, we do not use devfs.
Instead, we do a <application>mknod</application> before the
<application>mdadm</application>, taking care to use the same
device number that's in use in the running kernel. We expect
<filename>mdadm.conf</filename> to contain an <code>auto=md</code>
option for any raid device files that need to be created.
This approach should work regardless of whether the fstab uses
<filename>/dev/md\d</filename> or a device-number-independent name.
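A minimal sketch of that mknod step (the <filename>md0</filename>
entry and the <filename>md-root</filename> name are illustrative;
the sysfs <filename>dev</filename> attribute holds the major:minor
pair of the running device):

```shell
#!/bin/sh
# Illustrative: pick up the device number the running kernel uses
# for an active raid set, then create a device file with exactly
# that number under a name of our choosing.
sysfs_dev=/sys/block/md0/dev        # contains e.g. "9:0"
IFS=: read major minor < "$sysfs_dev"

mknod /dev/md-root b "$major" "$minor"
```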