<title>The interface between kernel and image</title>
The initial boot image is supposed to load enough modules to let
the real root device be mounted cleanly. It starts up in a
<emphasis>very</emphasis> bare environment and it has to do tricky
stuff like juggling root filesystems; to pull that off successfully
it makes sense to take a close look at the environment that the
kernel creates for the image and what the kernel expects it to do.
This section contains raw design notes based on kernel 2.6.8.
The processing of the image starts even before the kernel is
activated. The bootloader, grub or lilo for example, reads two
files from the boot file system into RAM: the kernel and the image.
The bootloader somehow manages to set two variables in the kernel:
<code>initrd_start</code> and <code>initrd_end</code>; these variables
point to the copy of the image in RAM. The bootloader now
hands over control to the kernel.
During setup, the kernel creates a special file system, rootfs.
This mostly reuses ramfs code, but there are a few twists: it can
never be mounted from userspace, there's only one copy, and it's not
mounted on top of anything else. The existence of rootfs means that
the rest of the kernel can always assume there's a place to mount
other file systems. It is also a place where temporary files can
be created during the boot sequence.
In <code>initramfs.c:populate_rootfs()</code>, there are two
possibilities. If the image looks like a cpio.gz file, it is
unpacked into rootfs. If the file <filename>/init</filename> is
among the files unpacked from the cpio file, the initramfs model
is used; otherwise we get a more complex interaction between kernel
and initrd, discussed in <xref linkend="initrd"/>.
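As a rough illustration of the first test, the sketch below classifies an image file by its leading bytes. It is an approximation written for these notes, not the kernel's code: the function name <code>image_kind</code> is made up, and the magic values (0x1f 0x8b for gzip, the ASCII strings 070701 and 070702 for the newc cpio variants) are the commonly documented ones rather than something copied from <code>populate_rootfs()</code>.

```shell
#!/bin/sh
# Hypothetical helper, an approximation of the kernel's check.
# image_kind FILE prints "gzip", "cpio" or "other" based on the leading bytes.
image_kind() {
    # First six bytes of the file as lowercase hex, without separators.
    magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        1f8b*) echo gzip ;;                      # gzip header: 0x1f 0x8b
        303730373031|303730373032) echo cpio ;;  # ASCII "070701" or "070702"
        *) echo other ;;
    esac
}
```

Note that "gzip" alone does not settle the matter: an old-style initrd may be a gzipped file system image, so a real check would have to decompress the data and look for the cpio magic as well.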
<title>Booting with Initramfs</title>
If the image was a cpio file, and it contains a file
<filename>/init</filename>, the initramfs model is used.
The kernel does some basic setup and hands over control to
<filename>/init</filename>; it is then up to
<filename>/init</filename> to make a real root available and to
transfer control to the <filename>/sbin/init</filename> command
on the real root.
The tricky part is to do that in such a way that there
is no way for user processes to gain access to the rootfs
filesystem; and in such a way that rootfs remains empty and
hidden under the user root file system. This is best done
using some C code; <application>yaird</application> uses
<application>run_init</application>, a small tool based on
<application>klibc</application>.
# invoked as last command in /init, with no other processes running,
# exec run_init /newroot /sbin/init "$@"
# following after lots of sanity checks and not across mounts:
- exec /sbin/init "$@"
<simplesect id="initrd">
<title>Booting with initrd</title>
If the image was not a cpio file, the kernel copies the
initrd image from wherever the boot loader left it to
<filename>rootfs:/initrd.image</filename>, and frees the RAM used
by the bootloader for the initrd image.
After reading initrd, the kernel does more setup to the point where
we have:
working CPU and memory management
working process management
compiled-in drivers activated
a number of support processes such as ksoftirqd are created.
(These processes have the rootfs as root; they can get a new
root when the <code>pivot_root()</code> system call is used.)
something like a console. <code>console_init()</code> is
called before PCI or USB probes, so expect only compiled-in
console devices to work.
At this point, in <code>do_mounts.c:prepare_namespace()</code>,
the kernel looks for a root file system to mount. That root file
system can come from a number of places: NFS, a RAID device, a plain
disk or an initrd. If it's an initrd, the sequence is as follows
(where devfs can fail if it's not compiled into the kernel):
- mount -t devfs devfs /dev
- mount -t devfs devfs /dev
Once that returns, in <code>init/main.c:init()</code>,
initialisation memory is freed and <filename>/sbin/init</filename>
is executed with <code>/dev/console</code> as file descriptors 0, 1
and 2. <filename>/sbin/init</filename> can be overridden with
an <code>init=/usr/bin/firefox</code> parameter passed to the
boot loader; if <filename>/sbin/init</filename> is not found,
<filename>/etc/init</filename> and a number of other fallbacks
are tried. We're in business.
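The fallback logic can be mimicked in a few lines of shell. This is only a sketch: the function name <code>pick_init</code> and the idea of probing under an arbitrary root directory are made up for illustration, and the candidate list after <filename>/etc/init</filename> (<filename>/bin/init</filename>, then <filename>/bin/sh</filename>) is quoted from memory of <code>init/main.c</code>, not from the text above. The kernel actually tries to execute each candidate in turn; the sketch only tests for an executable file.

```shell
#!/bin/sh
# Hypothetical helper: pick_init ROOT [REQUESTED]
# Prints the first candidate that exists as an executable file under ROOT.
# REQUESTED stands for the value of an init= parameter, if any.
pick_init() {
    root=$1
    # The candidates after /etc/init are an assumption, see the lead-in.
    for candidate in ${2:+"$2"} /sbin/init /etc/init /bin/init /bin/sh; do
        if [ -x "$root$candidate" ] && [ ! -d "$root$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    # Nothing usable found; the kernel would give up at this point.
    return 1
}
```

For example, <code>pick_init /mnt /usr/bin/firefox</code> prints <filename>/usr/bin/firefox</filename> if that file is executable under <filename>/mnt</filename>, and otherwise falls through to the default candidates.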
The processing of initrd starts in
<code>do_mounts_initrd.c:initrd_load()</code>. It creates
<filename>rootfs:/dev/ram</filename>, then copies
<filename>rootfs:/initrd.image</filename> there and unlinks
<filename>rootfs:/initrd.image</filename>. Now we have the initrd
image in a block device, which is good for mounting. It calls
<code>handle_initrd()</code>, which does:
# make another block special file for ram0
- mknod /dev/root.old b 1 0
# try mounting initrd with all known file systems,
# optionally read-only
- mount -t xxx /dev/root.old /root
- mount -t devfs devfs /dev
- system ("/linuxrc");
- umount rootfs:/old/dev
So <filename>initrd:/linuxrc</filename> runs in an environment where
initrd is the root, with devfs mounted if available, and rootfs is
invisible (except that there are open file handles to directories
in rootfs, needed to change back to the old environment).
Now the idea seems to have been that <filename>/linuxrc</filename>
would mount the real root and <code>pivot_root</code> into it, then start
<filename>/sbin/init</filename>. Thus, linuxrc would never return.
However, <code>main.c:init()</code> does some useful stuff only
after linuxrc returns: freeing init memory segments and starting NUMA
policy, so in e.g. Debian and Fedora, <filename>/linuxrc</filename>
will end, and <filename>/sbin/init</filename>
is started by <code>main.c:init()</code>.
After linuxrc returns, the variable <code>real_root_dev</code>
determines what happens. This variable can be read and written
via <filename>/proc/sys/kernel/real-root-dev</filename>. If it
is 0x0100 (the device number of <filename>/dev/ram0</filename>)
or something equivalent, <code>handle_initrd()</code> will change
directory to <filename>/old</filename> and return. If it is
something else, <code>handle_initrd()</code> will decode it, mount
it as root, mount initrd as <filename>/root/initrd</filename>,
and again start <filename>/sbin/init</filename>. (If mounting as
<filename>/root/initrd</filename> fails, the block device is freed.)
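The value 0x0100 can be decoded by hand. A minimal sketch, assuming the classic encoding with the major number in the high byte and the minor number in the low byte; that is only good for majors and minors below 256, and the function names <code>decode_dev</code> and <code>is_ram0</code> are made up here:

```shell
#!/bin/sh
# Hypothetical helpers, assuming the classic 8-bit major / 8-bit minor layout.
# decode_dev NUMBER prints "major:minor"; NUMBER may be decimal or 0x-prefixed hex.
decode_dev() {
    dev=$(( $1 ))
    echo "$(( dev >> 8 )):$(( dev & 255 ))"
}

# is_ram0 NUMBER succeeds if NUMBER names /dev/ram0 (major 1, minor 0).
is_ram0() {
    [ "$(decode_dev "$1")" = "1:0" ]
}
```

So <code>decode_dev 0x0100</code> and <code>decode_dev 256</code> both print <code>1:0</code>, which is <filename>/dev/ram0</filename>, while <code>decode_dev 0x0301</code> prints <code>3:1</code>.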
Remember <code>handle_initrd()</code> was called via
<code>initrd_load()</code> from <code>prepare_namespace()</code>,
and <code>prepare_namespace()</code> ends by chrooting into the
current directory: <filename>rootfs:/old</filename>.
Note that <filename>rootfs:/old</filename> was move-mounted
from '/' after <filename>/linuxrc</filename> returned.
When <filename>/linuxrc</filename> started, the root was
initrd, but <filename>/linuxrc</filename> may have done a
<code>pivot_root()</code>, replacing the root with a real root,
say <filename>/dev/hda1</filename>.
<filename>/linuxrc</filename> is started with initrd
as root.
There is working memory management, processes, compiled-in
drivers, and stdin/out/err are connected to a console,
if the relevant drivers are compiled in.
Devfs may be mounted on <filename>/dev</filename>.
<filename>/linuxrc</filename> can <code>pivot_root</code>.
If you echo 0x0100 to
<filename>/proc/sys/kernel/real-root-dev</filename>,
the <code>pivot_root</code> will remain in effect after
<filename>/linuxrc</filename> ends.
After <filename>/linuxrc</filename> returns,
<filename>/dev</filename> may be unmounted and replaced
Thus a good strategy for <filename>/linuxrc</filename> is to
do as little as possible, and defer the real initialisation
to <filename>/sbin/init</filename> on the initrd; this
<filename>/sbin/init</filename> can then <code>pivot_root</code>
into the real root device.
mount -nt proc proc /proc
# root=$(cat /proc/sys/kernel/real-root-dev)
# 256 is 0x0100, the device number of /dev/ram0
echo 256 > /proc/sys/kernel/real-root-dev
<title>Kernel command line parameters</title>
The kernel passes more information than just an initial file system
to the initrd or initramfs image; there also are the kernel boot
parameters. The bootloader passes these to the kernel, and the kernel
in turn passes them on via <filename>/proc/cmdline</filename>.
An old version of these parameters is documented in the
<!-- Sometimes I think docbook is overdoing this markup thing -->
<refentrytitle>bootparam</refentrytitle>
<manvolnum>7</manvolnum>
</citerefentry> manual page; more recent information is in the kernel
documentation file <citetitle>kernel-parameters.txt</citetitle>.
Mostly, these parameters are used to configure non-modular drivers,
and are thus not very interesting to <application>yaird</application>.
Then there are parameters such as <code>noapic</code>, which are
interpreted by the kernel core and also irrelevant to
<application>yaird</application>.
Finally there are a few parameters which are used by the kernel
to determine how to mount the root file system.
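To make the last category concrete, here is a minimal sketch of how a script on the image could pick such parameters out of the command line. The function names <code>cmdline_value</code> and <code>cmdline_has</code> are invented for this example, and letting the last occurrence of a parameter win is an assumption; values containing spaces or shell wildcards are not handled.

```shell
#!/bin/sh
# Hypothetical helpers for picking words out of a kernel command line.

# cmdline_value NAME CMDLINE
# Prints the value of the last NAME=value word; fails if there is none.
cmdline_value() {
    name=$1
    value=
    found=1
    for word in $2; do
        case "$word" in
            "$name"=*) value=${word#"$name"=}; found=0 ;;
        esac
    done
    [ "$found" -eq 0 ] && echo "$value"
    return "$found"
}

# cmdline_has FLAG CMDLINE
# Succeeds if the bare word FLAG (for example ro) is present.
cmdline_has() {
    for word in $2; do
        [ "$word" = "$1" ] && return 0
    done
    return 1
}
```

On a running system this would be used as <code>cmdline_value root "$(cat /proc/cmdline)"</code>, with <filename>/proc</filename> mounted first.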
Whether the initial image should emulate these options or ignore them
is open to discussion; you can make a case that the flexibility these
options offer has become irrelevant now that initrd/initramfs offers
far more fine-grained control over the way in which the system
is booted.
Support for these options is mostly a matter of tuning the
distribution specific templates, but it is possible that the
templates need an occasional hint from the planner.
To find out just how much "mostly" is, we'll try to implement
full support for these options and see where we run into
problems.
An inventory of the relevant options:
These are options for the modular ide-core driver.
This could be supported by adding an attribute
"isIdeCore" to insmod actions, and expanding the ide
kernel options only for insmod actions where that
attribute is set.
It seems cleaner to support the options from
<filename>/etc/modprobe.conf</filename>.
The first program to be started on the definitive root device,
default <filename>/sbin/init</filename>. Supported.
Mount the definitive root device read-only,
so that it can be submitted to <application>fsck</application>.
Supported; this is the default behaviour.
The opposite: mount the definitive root device read-write. Supported.
Which device (not) to use for software suspend.
The device to mount as root. This is a nasty one:
the planner by default only creates device nodes
that are needed to mount the root device, and even
if you were to put hotplug on the initial image
to create all possible device nodes, there's still
the matter of putting support for the proper file system
on the initial image.
We could make an option to
<application>yaird</application> to specify a list
of possible root devices and load the necessary
modules for all of them.
Unsupported until there's a clear need for it.
Flags to use while mounting root file system.
Implement together with root option.
File system type for root file system.
Implement together with root option.
For diskless booting.
Unclear whether we need this. NFS booting is desirable,
but I guess that will mostly be done under control of
DHCP. Unsupported for now.
More diskless booting.