%%
%%

\chapter{What To Do When Bacula Crashes (Kaboom)}
\label{KaboomChapter}
\index[general]{Kaboom!What To Do When Bacula Crashes }
\index[general]{What To Do When Bacula Crashes (Kaboom) }

If you are running on a Linux system, and you have a set of working
configuration files, it is very unlikely that {\bf Bacula} will crash. As with
all software, however, it may crash someday,
particularly if you are running on another operating system or using a new or
unusual feature.

This chapter explains what you should do if one of the three {\bf Bacula}
daemons (Director, File, Storage) crashes.  When we speak of crashing, we
mean that the daemon terminates abnormally because of an error.  There are
many cases where Bacula detects errors (such as PIPE errors) and will fail
a job. These are not considered crashes.  In addition, under certain
conditions, Bacula will detect a fatal error in the configuration, such as
lack of permission to read/write the working directory. In that case,
Bacula will force itself to crash with a SEGFAULT. However, before
crashing, Bacula will normally display a message indicating why.
For more details, please read on.


\section{Traceback}
\index[general]{Traceback}

Each of the three Bacula daemons has a built-in exception handler which, in
case of an error, will attempt to produce a traceback. If successful, the
traceback will be emailed to you.

For this to work, you need to ensure that a few things are set up correctly on
your system:

\begin{enumerate}
\item You must have an installed copy of {\bf gdb} (the GNU debugger), and it
   must be on {\bf Bacula's} path. On some systems such as Solaris, {\bf
   gdb} may be replaced by {\bf dbx}.

\item The Bacula installed script file {\bf btraceback} must be in the same
   directory as the daemon which dies, and it must be marked as executable.

\item The script file {\bf btraceback.gdb} must have the correct path to it
   specified in the {\bf btraceback} file.

\item You must have a {\bf mail} program which is on {\bf Bacula's} path.
   By default, this {\bf mail} program is set to {\bf bsmtp}, so it must
   be correctly configured.

\item If you run either the Director or Storage daemon under a non-root
   userid, you will most likely need to modify the {\bf btraceback} file
   to do something like {\bf sudo} (raise to root privileges) for the
   call to {\bf gdb} so that it has the proper permissions to debug
   Bacula.
\end{enumerate}
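
The conditions above can be checked by hand before a crash occurs. The
following is a minimal sketch and not part of Bacula; the directory {\bf
/opt/bacula/bin} is a hypothetical install location, so substitute your own.
Run it as the userid the daemon runs under, and keep in mind that the path
seen by a daemon started at boot time may differ from that of an
interactive shell:

\footnotesize
\begin{verbatim}
# BINDIR is an assumed location; use the directory holding your daemons.
BINDIR=/opt/bacula/bin

# The debugger and the mail program must be found on the path.
command -v gdb   || echo "gdb is not on the path"
command -v bsmtp || echo "bsmtp is not on the path"

# btraceback must sit next to the daemon and be executable.
test -x "$BINDIR/btraceback" || echo "btraceback missing or not executable"

# Show which btraceback.gdb path the script refers to.
grep btraceback.gdb "$BINDIR/btraceback"
\end{verbatim}
\normalsize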

If all the above conditions are met, the daemon that crashes will produce a
traceback report and email it to you. If the above conditions are not met,
you can either run the debugger by hand as described below, or you may be able
to correct the problems by editing the {\bf btraceback} file. I recommend not
spending too much time on trying to get the traceback to work as it can be
very difficult.

The changes that might be needed are to add a correct path to the {\bf gdb}
program, correct the path to the {\bf btraceback.gdb} file, change the {\bf
mail} program or its path, or change your email address. The key line in the
{\bf btraceback} file is:

\footnotesize
\begin{verbatim}
gdb -quiet -batch -x /home/kern/bacula/bin/btraceback.gdb \
     $1 $2 2>&1 | bsmtp -s "Bacula traceback" your-address@xxx.com
\end{verbatim}
\normalsize

Since each daemon has the same traceback code, a single btraceback file is
sufficient if you are running more than one daemon on a machine.
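
If a daemon runs under a non-root userid, the key line might be changed as
in the following sketch. It assumes that {\bf sudo} is installed and
configured to let the daemon's userid run {\bf gdb} without a password and
without a terminal; that configuration is site-specific and is not shown
here:

\footnotesize
\begin{verbatim}
sudo gdb -quiet -batch -x /home/kern/bacula/bin/btraceback.gdb \
     $1 $2 2>&1 | bsmtp -s "Bacula traceback" your-address@xxx.com
\end{verbatim}
\normalsize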

\section{Testing The Traceback}
\index[general]{Traceback!Testing The }
\index[general]{Testing The Traceback }

To ``manually'' test the traceback feature, start {\bf Bacula}, then
obtain the {\bf PID} of the main daemon thread (there are multiple threads).
The output of {\bf ps} will look different depending on your OS and
kernel version. In the listing below, each command line has been split
across two lines to fit on the page:

\footnotesize
\begin{verbatim}
[kern@rufus kern]$ ps fax --columns 132 | grep bacula-dir
 2103 ?        S      0:00 /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2104 ?        S      0:00  \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2106 ?        S      0:00      \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2105 ?        S      0:00      \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
\end{verbatim}
\normalsize

The main thread is at the top of the process tree, so in this case the {\bf
PID} is 2103. Then, while Bacula is running, call the {\bf btraceback}
program, giving it the path to the Bacula executable and the {\bf PID}. In
this case, it is:

\footnotesize
\begin{verbatim}
./btraceback /home/kern/bacula/k/src/dird/bacula-dir 2103
\end{verbatim}
\normalsize

It should produce an email showing you the current state of the daemon (in
this case the Director), and then exit leaving {\bf Bacula} running as if
nothing happened. If this is not the case, you will need to correct the
problem by modifying the {\bf btraceback} script.

Typical problems might be that {\bf gdb} (or {\bf dbx} on Solaris) is not on
the default path.  Fix this by specifying the full path to it in the {\bf
btraceback} file.  Another common problem is that you haven't modified the
script so that the {\bf bsmtp} program has an appropriate SMTP server or
the proper syntax for your SMTP server.  If you use the {\bf mail} program
and it is not on the default path, it will also fail.  On some systems, it
is preferable to use {\bf Mail} rather than {\bf mail}.
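
The two manual steps can also be combined. The following sketch assumes
that the {\bf pgrep} utility is available and that the main daemon thread is
the oldest process named {\bf bacula-dir}, as in the listing above; verify
the PID against the {\bf ps} output if in doubt:

\footnotesize
\begin{verbatim}
# -o selects the oldest matching process (assumed to be the main thread).
PID=$(pgrep -o bacula-dir)
./btraceback /home/kern/bacula/k/src/dird/bacula-dir "$PID"
\end{verbatim}
\normalsize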

\section{Getting A Traceback On Other Systems}
\index[general]{Getting A Traceback On Other Systems}
\index[general]{Systems!Getting A Traceback On Other}

It should be possible to produce a similar traceback on systems other than
Linux, either using {\bf gdb} or some other debugger. Solaris with {\bf dbx}
loaded works quite well. On other systems, you will need to modify the {\bf
btraceback} program to invoke the correct debugger, and possibly correct the
{\bf btraceback.gdb} script to have appropriate commands for your debugger. If
anyone succeeds in making this work with another debugger, please send us a
copy of what you modified. Please keep in mind that for any debugger to
work, it will most likely need to run as root, so you may need to modify
the {\bf btraceback} script accordingly.

\label{ManuallyDebugging}
\section{Manually Running Bacula Under The Debugger}
\index[general]{Manually Running Bacula Under The Debugger}
\index[general]{Debugger!Manually Running Bacula Under The}

If for some reason you cannot get the automatic traceback, or if you want to
interactively examine the variable contents after a crash, you can run Bacula
under the debugger. Assuming you want to run the Storage daemon under the
debugger (the technique is the same for the other daemons, only the name
changes), you would do the following:

\begin{enumerate}
\item Start the Director and the File daemon. If the Storage daemon also
   starts, you will need to find its PID as shown above ({\tt ps fax | grep
   bacula-sd}) and kill it with a command like the following:

\footnotesize
\begin{verbatim}
      kill -15 PID
\end{verbatim}
\normalsize

where you replace {\bf PID} by the actual value.

\item At this point, the Director and the File daemon should be running but
   the Storage daemon should not.

\item Change ({\bf cd}) to the directory containing the Storage daemon.

\item Start the Storage daemon under the debugger:

   \footnotesize
\begin{verbatim}
    gdb ./bacula-sd
\end{verbatim}
\normalsize

\item Run the Storage daemon:

   \footnotesize
\begin{verbatim}
     run -s -f -c ./bacula-sd.conf
\end{verbatim}
\normalsize

You may replace the {\bf ./bacula-sd.conf} with the full path to the Storage
daemon's configuration file.

\item At this point, Bacula will be fully operational.

\item In another shell command window, start the Console program and do what
   is necessary to cause Bacula to die.

\item When Bacula crashes, the {\bf gdb} shell window will become active and
   {\bf gdb} will show you the error that occurred.

\item To get a general traceback of all threads, issue the following command:

\footnotesize
\begin{verbatim}
       thread apply all bt
\end{verbatim}
\normalsize

After that you can issue any debugging command.
\end{enumerate}
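
The debugger steps above can be condensed into a single session. This is a
sketch of the same commands using the \verb+--args+ option of {\bf gdb},
which hands the remaining arguments to the program when {\bf run} is
issued. The {\bf (gdb)} prefix is the debugger prompt and is not typed, and
the middle line stands for whatever you do in the Console to trigger the
crash:

\footnotesize
\begin{verbatim}
gdb --args ./bacula-sd -s -f -c ./bacula-sd.conf
(gdb) run
   ... reproduce the crash from the Console ...
(gdb) thread apply all bt
\end{verbatim}
\normalsize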

\section{Getting Debug Output from Bacula}
\index[general]{Getting Debug Output from Bacula }
Each of the daemons normally has debug code compiled into the program, but
disabled. There are two ways to enable the debug output. One is to add the
{\bf -d nnn} option on the command line when starting the daemon. The {\bf
nnn} is the debug level, and generally anything between 50 and 200 is
reasonable. The higher the number, the more output is produced. The output is
written to standard output.
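
For example, to capture level 100 debug output from the Storage daemon in a
file, something like the following might be used. It is a sketch: the {\bf
-f} (foreground) option is taken from the debugger example earlier in this
chapter, on the assumption that a daemon kept in the foreground keeps its
standard output, and the output file name is arbitrary:

\footnotesize
\begin{verbatim}
./bacula-sd -f -d 100 -c ./bacula-sd.conf > bacula-sd.debug 2>&1
\end{verbatim}
\normalsize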

The second way of getting debug output is to turn it on dynamically from the
Console with the {\bf setdebug} command. The full syntax of the command is:

\footnotesize
\begin{verbatim}
 setdebug level=nnn client=client-name storage=storage-name dir
\end{verbatim}
\normalsize

If none of the options are given, the command will prompt you. You can
selectively turn on/off debugging in any or all of the daemons (i.e. it is not
necessary to specify all the components of the above command).
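
For instance, the following Console commands follow the syntax above. The
client and storage names ({\bf rufus-fd} and {\bf File}) are hypothetical
and must match resources defined in your own configuration; setting the
level to 0 is assumed to turn debug output off again:

\footnotesize
\begin{verbatim}
 setdebug level=100 dir
 setdebug level=150 client=rufus-fd
 setdebug level=200 storage=File
 setdebug level=0 dir
\end{verbatim}
\normalsize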