\chapter{Migration}
\label{MigrationChapter}
\index[general]{Migration}

The term Migration, as used in the context of Bacula, means moving data from
one Volume to another.  In particular, it refers to a Job (similar to a backup
job) that reads data that was previously backed up to a Volume and writes
it to another Volume.  As part of this process, the File catalog records
associated with the first backup job are purged.  In other words, Migration
moves Bacula Job data from one Volume to another by reading the Job data
from the Volume it is stored on, writing it to a different Volume in a
different Pool, and then purging the database records for the first Job.

The selection process for which Job or Jobs are migrated
can be based on quite a number of different criteria such as:
\begin{itemize}
\item a single previous Job
\item a Volume
\item a Client
\item a regular expression matching a Job, Volume, or Client name
\item the time a Job has been on a Volume
\item high and low water marks (usage or occupation) of a Pool
\item Volume size
\end{itemize}

The details of these selection criteria are described below.

To run a Migration job, you must first define a Job resource very similar
to a Backup Job but with {\bf Type = Migrate} instead of {\bf Type =
Backup}.  One of the key points to remember is that the Pool that is
specified for the migration job is the only pool from which jobs will
be migrated, with one exception noted below. In addition, the Pool to
which the selected Job or Jobs will be migrated is defined by the {\bf
Next Pool = ...} directive in the Pool resource specified for the
Migration Job.

Bacula permits pools to contain Volumes with different Media Types.
However, when doing migration, this is a very undesirable condition.  For
migration to work properly, you should use pools containing only Volumes of
the same Media Type for all migration jobs.

The migration job normally is either manually started or starts
from a Schedule much like a backup job. It searches
for a previous backup Job or Jobs that match the parameters you have
specified in the migration Job resource, primarily a {\bf Selection Type}
(detailed a bit later).  Then for
each previous backup JobId found, the Migration Job will run a new Job which
copies the old Job data from the previous Volume to a new Volume in
the Migration Pool.  It is possible that no prior Jobs are found for
migration, in which case the Migration job will simply terminate having
done nothing, but normally at a minimum, three jobs are involved during a
migration:

\begin{itemize}
\item The currently running Migration control Job. This is only
      a control job for starting the migration child jobs.
\item The previous Backup Job (already run). The File records
      for this Job are purged if the Migration job successfully
      terminates.  The original data remains on the Volume until
      it is recycled and rewritten.
\item A new Migration Backup Job that moves the data from the
      previous Backup job to the new Volume.  If you subsequently
      do a restore, the data will be read from this Job.
\end{itemize}

If the Migration control job finds a number of JobIds to migrate (e.g.
it is asked to migrate one or more Volumes), it will start one new
migration backup job for each JobId found on the specified Volumes.
Please note that Migration doesn't scale too well since Migrations are
done on a Job by Job basis. Thus, if you select a very large Volume or
a number of Volumes for migration, a large number of Jobs may start.
Because each job must read the same Volume, they will
run consecutively (not simultaneously).

\section{Migration Job Resource Directives}

The following directives can appear in a Director's Job resource, and they
are used to define a Migration job.

\begin{description}
\item [Pool = \lt{}Pool-name\gt{}] The Pool specified in the Migration
   control Job is not a new directive for the Job resource, but it is
   particularly important because it determines what Pool will be examined for
   finding JobIds to migrate.  The exception to this is when {\bf Selection
   Type = SQLQuery}, in which case no Pool is used, unless you
   specifically include it in the SQL query. Note, the Pool resource
   referenced must contain a {\bf Next Pool = ...} directive to define
   the Pool to which the data will be migrated.

\item [Type = Migrate]
   {\bf Migrate} is a new type that defines the job that is run as being a
   Migration Job.  A Migration Job is a sort of control job and does not have
   any Files associated with it, and in that sense it is more or less like
   an Admin job.  Migration jobs simply check to see if there is anything to
   migrate, then possibly start and control new Backup jobs to migrate the data
   from the specified Pool to another Pool.

\item [Selection Type = \lt{}Selection-type-keyword\gt{}]
  The \lt{}Selection-type-keyword\gt{} determines how the migration job
  will go about selecting what JobIds to migrate. In most cases, it is
  used in conjunction with a {\bf Selection Pattern} to give you fine
  control over exactly what JobIds are selected.  The possible values
  for \lt{}Selection-type-keyword\gt{} are:
  \begin{description}
  \item [SmallestVolume] This selection keyword selects the volume with the
        fewest bytes from the Pool to be migrated.  The Pool to be migrated
        is the Pool defined in the Migration Job resource.  The migration
        control job will then start and run one migration backup job for
        each of the Jobs found on this Volume.  The Selection Pattern, if
        specified, is not used.

  \item [OldestVolume] This selection keyword selects the volume with the
        oldest last write time in the Pool to be migrated.  The Pool to be
        migrated is the Pool defined in the Migration Job resource.  The
        migration control job will then start and run one migration backup
        job for each of the Jobs found on this Volume.  The Selection
        Pattern, if specified, is not used.

  \item [Client] The Client selection type first selects all the Clients
        that have been backed up in the Pool specified by the Migration
        Job resource, then it applies the {\bf Selection Pattern} (defined
        below) as a regular expression to the list of Client names, giving
        a filtered Client name list.  All jobs that were backed up for those
        filtered (regexed) Clients will be migrated.
        The migration control job will then start and run one migration
        backup job for each of the JobIds found for those filtered Clients.

  \item [Volume] The Volume selection type first selects all the Volumes
        that have been written in the Pool specified by the Migration
        Job resource, then it applies the {\bf Selection Pattern} (defined
        below) as a regular expression to the list of Volume names, giving
        a filtered Volume list.  All JobIds stored on those
        filtered (regexed) Volumes will be migrated.
        The migration control job will then start and run one migration
        backup job for each of the JobIds found on those filtered Volumes.

  \item [Job] The Job selection type first selects all the Jobs (as
        defined on the {\bf Name} directive in a Job resource)
        that have been backed up in the Pool specified by the Migration
        Job resource, then it applies the {\bf Selection Pattern} (defined
        below) as a regular expression to the list of Job names, giving
        a filtered Job name list.  All JobIds that were run for those
        filtered (regexed) Job names will be migrated.  Note that for a given
        Job name, there can be many jobs (JobIds) that ran.
        The migration control job will then start and run one migration
        backup job for each of the Jobs found.

  \item [SQLQuery] The SQLQuery selection type uses the {\bf Selection
        Pattern} as an SQL query to obtain the JobIds to be migrated.
        The Selection Pattern must be a valid SELECT SQL statement for your
        SQL engine, and it must return the JobId as the first field
        of the SELECT.

  \item [PoolOccupancy] This selection type will cause the Migration job
        to compute the total size of the specified pool for all Media Types
        combined. If it exceeds the {\bf Migration High Bytes} defined in
        the Pool, the Migration job will migrate all JobIds beginning with
        the oldest Volume in the pool (determined by Last Write time) until
        the Pool bytes drop below the {\bf Migration Low Bytes} defined in the
        Pool. This calculation should be considered rather approximate because
        it is made once by the Migration job before migration is begun, and
        thus does not take into account additional data written into the Pool
        during the migration.  In addition, the calculation of the total Pool
        byte size is based on the Volume bytes saved in the Volume (Media)
        database entries. The number of bytes calculated for migration is
        based on the value stored in the Job records of the Jobs to be
        migrated. These do not include the Storage daemon overhead that is
        included in the total Pool size. As a consequence, normally, the
        migration will migrate more bytes than strictly necessary.

  \item [PoolTime] The PoolTime selection type will cause the Migration job to
        look at the time each JobId has been in the Pool since the job ended.
        All Jobs in the Pool longer than the time specified on the
        {\bf Migration Time} directive in the Pool resource will be migrated.
  \end{description}

\item [Selection Pattern = \lt{}Quoted-string\gt{}]
  The Selection Patterns permitted for each Selection-type-keyword are
  described above.

  For the OldestVolume and SmallestVolume, this
  Selection pattern is not used (ignored).

  For the Client, Volume, and Job
  keywords, this pattern must be a valid regular expression that will filter
  the appropriate item names found in the Pool.

  For the SQLQuery keyword, this pattern must be a valid SELECT SQL statement
  that returns JobIds.

\end{description}
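
As an illustration of the SQLQuery selection type, the following sketch shows
a Migration Job whose Selection Pattern selects the successfully terminated
backup jobs of one pool.  The job name {\bf migrate-sql} and the query itself
are illustrative assumptions rather than a tested configuration.  The query
relies on the JobId, PoolId, Type, and JobStatus columns of the catalog Job
table and on the PoolId and Name columns of the Pool table, and it should be
checked against your own catalog and SQL engine before use.  The pattern is
written on a single line.

\footnotesize
\begin{verbatim}
# Hypothetical example: migrate all successful backup jobs found in Pool Default
Job {
  Name = "migrate-sql"          # illustrative name
  Type = Migrate
  Level = Full                  # must be defined, but ignored by migration
  Client = rufus-fd             # must be defined, but ignored by migration
  FileSet = "Full Set"          # must be defined, but ignored by migration
  Messages = Standard
  Pool = Default                # its Next Pool names the destination Pool
  Selection Type = SQLQuery
  # JobId must be the first field returned by the SELECT
  Selection Pattern = "SELECT Job.JobId FROM Job, Pool WHERE Pool.Name = 'Default' AND Job.PoolId = Pool.PoolId AND Job.Type = 'B' AND Job.JobStatus = 'T' ORDER BY Job.JobId"
}
\end{verbatim}
\normalsize

Because no Pool is used for selection with SQLQuery, the restriction to the
Default pool is written into the query itself.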

\section{Migration Pool Resource Directives}

The following directives can appear in a Director's Pool resource, and they
are used to define a Migration job.

\begin{description}
\item [Migration Time = \lt{}time-specification\gt{}]
   If a PoolTime migration is done, the time specified here in seconds (time
   modifiers such as hours are permitted) will be used. If the
   previous Backup Job or Jobs selected have been in the Pool longer than
   the specified Migration Time, then they will be migrated.

\item [Migration High Bytes = \lt{}byte-specification\gt{}]
   This directive specifies the number of bytes in the Pool which will
   trigger a migration if a {\bf PoolOccupancy} migration selection
   type has been specified. The fact that the Pool
   usage goes above this level does not automatically trigger a migration
   job. However, if a migration job runs and has the PoolOccupancy selection
   type set, the Migration High Bytes will be applied.  Bacula does not
   currently restrict a pool to have only a single Media Type, so you
   must keep in mind that if you mix Media Types in a Pool, the results
   may not be what you want, as the Pool count of all bytes will be
   for all Media Types combined.

\item [Migration Low Bytes = \lt{}byte-specification\gt{}]
   This directive specifies the number of bytes in the Pool which will
   stop a migration if a {\bf PoolOccupancy} migration selection
   type has been specified and triggered by more than Migration High
   Bytes being in the pool. In other words, once a migration job
   is started with {\bf PoolOccupancy} migration selection and it
   determines that there are more than Migration High Bytes, the
   migration job will continue to run jobs until the number of
   bytes in the Pool drops to or below Migration Low Bytes.

\item [Next Pool = \lt{}pool-specification\gt{}]
   The Next Pool directive specifies the pool to which Jobs will be
   migrated. This directive is required to define the Pool into which
   the data will be migrated. Without this directive, the migration job
   will terminate in error.

\item [Storage = \lt{}storage-specification\gt{}]
   The Storage directive specifies what Storage resource will be used
   for all Jobs that use this Pool. It takes precedence over any other
   Storage specifications that may have been given such as in the
   Schedule Run directive, or in the Job resource.  We highly recommend
   that you define the Storage resource to be used in the Pool rather
   than elsewhere (job, schedule run, ...).
\end{description}
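
To illustrate how these directives fit together, here is a minimal sketch of
a Pool resource prepared for both PoolTime and PoolOccupancy migrations.  The
pool and storage names and all the numeric values are assumptions chosen for
the example only.

\footnotesize
\begin{verbatim}
# Hypothetical source Pool for migration; all values are illustrative
Pool {
  Name = Default
  Pool Type = Backup
  Storage = File                # Storage used by all Jobs that use this Pool
  Next Pool = Tape              # Pool into which the data will be migrated
  Migration Time = 30 days      # used when Selection Type = PoolTime
  Migration High Bytes = 50G    # PoolOccupancy: migrate if the Pool exceeds this
  Migration Low Bytes = 20G     # PoolOccupancy: stop at or below this level
}
\end{verbatim}
\normalsize

With these values, a PoolOccupancy migration job that finds 60 GB in the Pool
would migrate Jobs, starting with the oldest Volume, until about 20 GB or less
remains, whereas one that finds 40 GB would migrate nothing.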

\section{Important Migration Considerations}
\index[general]{Important Migration Considerations}
\begin{itemize}
\item Each Pool into which you migrate Jobs or Volumes {\bf must}
      contain Volumes of only one Media Type.

\item Migration takes place on a JobId by JobId basis. That is,
      each JobId is migrated in its entirety and independently
      of other JobIds. Once the Job is migrated, it will be
      on the new medium in the new Pool, but for the most part,
      aside from having a new JobId, it will appear with all the
      same characteristics of the original job (start, end time, ...).
      The column RealEndTime in the catalog Job table will contain the
      time and date that the Migration terminated, and by comparing
      it with the EndTime column you can tell whether or not the
      job was migrated.  The original job is purged of its File
      records, and its Type field is changed from ``B'' to ``M'' to
      indicate that the job was migrated.

\item Jobs on Volumes will be migrated only if the Volume is
      marked Full, Used, or Error.  Volumes that are still
      marked Append will not be considered for migration. This
      prevents Bacula from attempting to read the Volume at
      the same time it is writing it. It also reduces other deadlock
      situations, and avoids the problem of migrating a
      Volume and later finding new files appended to that Volume.

\item As noted above, for the Migration High Bytes, the calculation
      of the bytes to migrate is somewhat approximate.

\item If you keep Volumes of different Media Types in the same Pool,
      it is not clear how well migration will work.  We recommend only
      one Media Type per pool.

\item It is possible to get into a resource deadlock where Bacula does
      not find enough drives to simultaneously read and write all the
      Volumes needed to do Migrations. For the moment, you must take
      care, as not all the resource deadlock algorithms are implemented yet.

\item Migration is done only when you run a Migration job. If you set a
      Migration High Bytes and that number of bytes is exceeded in the Pool,
      no migration job will automatically start.  You must schedule the
      migration jobs, and they must run for any migration to take place.

\item If you migrate a number of Volumes, a very large number of Migration
      jobs may start.

\item Figuring out what jobs will actually be migrated can be a bit complicated
      due to the flexibility provided by the regex patterns and the number of
      different options.  Turning on a debug level of 100 or more will provide
      a limited amount of debug information about the migration selection
      process.

\item Bacula currently does only minimal Storage conflict resolution, so you
      must take care to ensure that you don't try to read and write to the
      same device or Bacula may block waiting to reserve a drive that it
      will never find. In general, ensure that all your migration
      pools contain only one Media Type, and that you always
      migrate to pools with different Media Types.

\item The {\bf Next Pool = ...} directive must be defined in the Pool
     referenced in the Migration Job to define the Pool into which the
     data will be migrated.

\item Pay particular attention to the fact that data is migrated on a Job
     by Job basis, and for any particular Volume, only one Job can read
     that Volume at a time (no simultaneous read), so migration jobs that
     all reference the same Volume will run sequentially.  This can be a
     potential bottleneck and does not scale very well to large numbers
     of jobs.

\item Only the Job and Volume Selection Types have
     been carefully tested. All the other migration methods (time,
     occupancy, smallest, oldest, ...) need additional testing.
\end{itemize}


\section{Example Migration Jobs}
\index[general]{Example Migration Jobs}

When you specify a Migration Job, you must specify all the standard
directives as for a Job.  However, certain directives such as the Level,
Client, and FileSet, though they must be defined, are ignored by the
Migration job because the values from the original job are used instead.

As an example, suppose you have the following Job that
you run every night. Note the following: there is no Storage directive in
the Job resource; there is a Storage directive in each of the Pool
resources; and the Pool to be migrated (Default, which uses File storage)
contains a Next Pool directive that defines the output Pool (where the
data is written by the migration job).

\footnotesize
\begin{verbatim}
# Define the backup Job
Job {
  Name = "NightlySave"
  Type = Backup
  Level = Incremental                 # default
  Client = rufus-fd
  FileSet = "Full Set"
  Schedule = "WeeklyCycle"
  Messages = Standard
  Pool = Default
}

# Default pool definition
Pool {
  Name = Default
  Pool Type = Backup
  AutoPrune = yes
  Recycle = yes
  Next Pool = Tape
  Storage = File
  LabelFormat = "File"
}

# Tape pool definition
Pool {
  Name = Tape
  Pool Type = Backup
  AutoPrune = yes
  Recycle = yes
  Storage = DLTDrive
}

# Definition of File storage device
Storage {
  Name = File
  Address = rufus
  Password = "ccV3lVTsQRsdIUGyab0N4sMDavui2hOBkmpBU0aQKOr9"
  Device = "File"          # same as Device in Storage daemon
  Media Type = File        # same as MediaType in Storage daemon
}

# Definition of DLT tape storage device
Storage {
  Name = DLTDrive
  Address = rufus
  Password = "ccV3lVTsQRsdIUGyab0N4sMDavui2hOBkmpBU0aQKOr9"
  Device = "HP DLT 80"      # same as Device in Storage daemon
  Media Type = DLT8000      # same as MediaType in Storage daemon
}
\end{verbatim}
\normalsize

We have included only the essential information here; the
Director, FileSet, Catalog, Client, Schedule, and Messages resources are
omitted.

As you can see, by running the NightlySave Job, the data will be backed up
to File storage using the Default pool to specify the Storage as File.

Now, if we add the following Job resource to this conf file

\footnotesize
\begin{verbatim}
Job {
  Name = "migrate-volume"
  Type = Migrate
  Level = Full
  Client = rufus-fd
  FileSet = "Full Set"
  Messages = Standard
  Pool = Default
  Maximum Concurrent Jobs = 4
  Selection Type = Volume
  Selection Pattern = "File"
}
\end{verbatim}
\normalsize

and then run the job named {\bf migrate-volume}, all volumes in the Pool
named Default (as specified in the migrate-volume Job) that match the
regular expression pattern {\bf File} will be migrated to tape storage
DLTDrive because the {\bf Next Pool} in the Default Pool specifies that
Migrations should go to the pool named {\bf Tape}, which uses
Storage {\bf DLTDrive}.

If, instead, we use a Job resource as follows:

\footnotesize
\begin{verbatim}
Job {
  Name = "migrate"
  Type = Migrate
  Level = Full
  Client = rufus-fd
  FileSet = "Full Set"
  Messages = Standard
  Pool = Default
  Maximum Concurrent Jobs = 4
  Selection Type = Job
  Selection Pattern = ".*Save"
}
\end{verbatim}
\normalsize

All jobs whose names end in Save will be migrated from the Default Pool to
the Tape Pool, that is, from File storage to tape storage.
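
The other selection types follow the same pattern.  As a sketch only, the
following Job would migrate every Job that has been in the Default Pool
longer than that Pool's Migration Time.  It assumes that a {\bf Migration
Time} directive, which is not part of the Pool resource shown earlier, has
been added to the Default Pool, and the job name {\bf migrate-old} is
hypothetical.  No Selection Pattern is shown, since the PoolTime criterion
is taken from the Pool resource.

\footnotesize
\begin{verbatim}
# Hypothetical example: time-based migration out of the Default Pool
Job {
  Name = "migrate-old"          # illustrative name
  Type = Migrate
  Level = Full
  Client = rufus-fd
  FileSet = "Full Set"
  Messages = Standard
  Pool = Default                # assumed to contain e.g. Migration Time = 30 days
  Selection Type = PoolTime
}
\end{verbatim}
\normalsize

As with the other examples, nothing is migrated until this Job is actually
run, either manually or from a Schedule.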