   1
   2 \chapter{Migration and Copy}
   3 \label{MigrationChapter}
   4 \index[general]{Migration}
   5 \index[general]{Copy}
   6
   7 The term Migration, as used in the context of Bacula, means moving data from
   8 one Volume to another.  In particular it refers to a Job (similar to a backup
   9 job) that reads data that was previously backed up to a Volume and writes
  10 it to another Volume.  As part of this process, the File catalog records
  11 associated with the first backup job are purged.  In other words, Migration
  12 moves Bacula Job data from one Volume to another by reading the Job data
  13 from the Volume it is stored on, writing it to a different Volume in a
  14 different Pool, and then purging the database records for the first Job.
  15
The Copy process is essentially identical to the Migration feature,
with the exception that the Job that is copied is left unchanged.  This
creates two identical copies of the same backup.  The Copy Job runs
without using the File daemon by copying the data from the old backup Volume to
a different Volume in a different Pool.
  21
The selection process that determines which Job or Jobs are migrated
can be based on a number of different criteria, such as:
  24 \begin{itemize}
  25 \item a single previous Job
  26 \item a Volume
  27 \item a Client
  28 \item a regular expression matching a Job, Volume, or Client name
  29 \item the time a Job has been on a Volume
  30 \item high and low water marks (usage or occupation) of a Pool
  31 \item Volume size
  32 \end{itemize}
  33
  34 The details of these selection criteria will be defined below.
  35
  36 To run a Migration job, you must first define a Job resource very similar
  37 to a Backup Job but with {\bf Type = Migrate} instead of {\bf Type =
  38 Backup}.  One of the key points to remember is that the Pool that is
  39 specified for the migration job is the only pool from which jobs will
  40 be migrated, with one exception noted below. In addition, the Pool to
  41 which the selected Job or Jobs will be migrated is defined by the {\bf
  42 Next Pool = ...} in the Pool resource specified for the Migration Job.
  43
  44 Bacula permits pools to contain Volumes with different Media Types.
  45 However, when doing migration, this is a very undesirable condition.  For
  46 migration to work properly, you should use pools containing only Volumes of
  47 the same Media Type for all migration jobs.
  48
  49 The migration job normally is either manually started or starts
  50 from a Schedule much like a backup job. It searches
  51 for a previous backup Job or Jobs that match the parameters you have
  52 specified in the migration Job resource, primarily a {\bf Selection Type}
  53 (detailed a bit later).  Then for
  54 each previous backup JobId found, the Migration Job will run a new Job which
  55 copies the old Job data from the previous Volume to a new Volume in
  56 the Migration Pool.  It is possible that no prior Jobs are found for
  57 migration, in which case, the Migration job will simply terminate having
  58 done nothing, but normally at a minimum, three jobs are involved during a
  59 migration:
  60
  61 \begin{itemize}
  62 \item The currently running Migration control Job. This is only
  63       a control job for starting the migration child jobs.
  64 \item The previous Backup Job (already run). The File records
  65       for this Job are purged if the Migration job successfully
  66       terminates.  The original data remains on the Volume until
  67       it is recycled and rewritten.
  68 \item A new Migration Backup Job that moves the data from the
  69       previous Backup job to the new Volume.  If you subsequently
  70       do a restore, the data will be read from this Job.
  71 \end{itemize}
  72
  73 If the Migration control job finds a number of JobIds to migrate (e.g.
  74 it is asked to migrate one or more Volumes), it will start one new
  75 migration backup job for each JobId found on the specified Volumes.
Please note that Migration doesn't scale too well since Migrations are
done on a Job by Job basis. Thus, if you select a very large volume or
a number of volumes for migration, you may have a large number of
Jobs that start. Because each job must read the same Volume, they will
run consecutively (not simultaneously).
  81
  82 \section{Migration and Copy Job Resource Directives}
  83
  84 The following directives can appear in a Director's Job resource, and they
  85 are used to define a Migration job.
  86
  87 \begin{description}
  88 \item [Pool = \lt{}Pool-name\gt{}] The Pool specified in the Migration
  89    control Job is not a new directive for the Job resource, but it is
  90    particularly important because it determines what Pool will be examined for
  91    finding JobIds to migrate.  The exception to this is when {\bf Selection
  92    Type = SQLQuery}, in which case no Pool is used, unless you
  93    specifically include it in the SQL query. Note, the Pool resource
  94    referenced must contain a {\bf Next Pool = ...} directive to define
  95    the Pool to which the data will be migrated.
  96
  97 \item [Type = Migrate]
   {\bf Migrate} is a new type that defines the job that is run as being a
   Migration Job.  A Migration Job is a sort of control job and does not have
   any Files associated with it, and in that sense it is more or less like
   an Admin job.  Migration jobs simply check to see if there is anything to
   migrate, then possibly start and control new Backup jobs to migrate the data
   from the specified Pool to another Pool.
 104
 105 \item [Type = Copy]
   {\bf Copy} is a new type that defines the job that is run as being a
   Copy Job.  A Copy Job is a sort of control job and does not have
   any Files associated with it, and in that sense it is more or less like
   an Admin job.  Copy jobs simply check to see if there is anything to
   copy, then possibly start and control new Backup jobs to copy the data
   from the specified Pool to another Pool.
 112
 113 \item [Selection Type = \lt{}Selection-type-keyword\gt{}]
 114   The \lt{}Selection-type-keyword\gt{} determines how the migration job
 115   will go about selecting what JobIds to migrate. In most cases, it is
 116   used in conjunction with a {\bf Selection Pattern} to give you fine
 117   control over exactly what JobIds are selected.  The possible values
 118   for \lt{}Selection-type-keyword\gt{} are:
 119   \begin{description}
 120   \item [SmallestVolume] This selection keyword selects the volume with the
 121         fewest bytes from the Pool to be migrated.  The Pool to be migrated
 122         is the Pool defined in the Migration Job resource.  The migration
 123         control job will then start and run one migration backup job for
 124         each of the Jobs found on this Volume.  The Selection Pattern, if
 125         specified, is not used.
 126
 127   \item [OldestVolume] This selection keyword selects the volume with the
 128         oldest last write time in the Pool to be migrated.  The Pool to be
 129         migrated is the Pool defined in the Migration Job resource.  The
 130         migration control job will then start and run one migration backup
 131         job for each of the Jobs found on this Volume.  The Selection
 132         Pattern, if specified, is not used.
 133
  \item [Client] The Client selection type first selects all the Clients
 135         that have been backed up in the Pool specified by the Migration
 136         Job resource, then it applies the {\bf Selection Pattern} (defined
 137         below) as a regular expression to the list of Client names, giving
 138         a filtered Client name list.  All jobs that were backed up for those
 139         filtered (regexed) Clients will be migrated.
 140         The migration control job will then start and run one migration
 141         backup job for each of the JobIds found for those filtered Clients.
 142
  \item [Volume] The Volume selection type first selects all the Volumes
 144         that have been backed up in the Pool specified by the Migration
 145         Job resource, then it applies the {\bf Selection Pattern} (defined
 146         below) as a regular expression to the list of Volume names, giving
 147         a filtered Volume list.  All JobIds that were backed up for those
 148         filtered (regexed) Volumes will be migrated.
 149         The migration control job will then start and run one migration
 150         backup job for each of the JobIds found on those filtered Volumes.
 151
  \item [Job] The Job selection type first selects all the Jobs (as
        defined on the {\bf Name} directive in a Job resource)
        that have been backed up in the Pool specified by the Migration
        Job resource, then it applies the {\bf Selection Pattern} (defined
        below) as a regular expression to the list of Job names, giving
        a filtered Job name list.  All JobIds that were run for those
        filtered (regexed) Job names will be migrated.  Note that for a given
        Job name, there can be many jobs (JobIds) that ran.
        The migration control job will then start and run one migration
        backup job for each of the Jobs found.
 162
  \item [SQLQuery] The SQLQuery selection type uses the {\bf Selection
        Pattern} as an SQL query to obtain the JobIds to be migrated.
        The Selection Pattern must be a valid SELECT SQL statement for your
        SQL engine, and it must return the JobId as the first field
        of the SELECT.
 168
 169   \item [PoolOccupancy] This selection type will cause the Migration job
 170         to compute the total size of the specified pool for all Media Types
 171         combined. If it exceeds the {\bf Migration High Bytes} defined in
 172         the Pool, the Migration job will migrate all JobIds beginning with
 173         the oldest Volume in the pool (determined by Last Write time) until
 174         the Pool bytes drop below the {\bf Migration Low Bytes} defined in the
        Pool. This calculation should be considered rather approximate because
        it is made once by the Migration job before migration is begun, and
        thus does not take into account additional data written into the Pool
        during the migration.  In addition, the calculation of the total Pool
        byte size is based on the Volume bytes saved in the Volume (Media)
        database entries. The bytes calculated for Migration are based on the
        value stored in the Job records of the Jobs to be migrated. These do
        not include the Storage daemon overhead that is included in the total
        Pool size. As a consequence, normally, the migration will migrate more
        bytes than strictly necessary.
 185
 186   \item [PoolTime] The PoolTime selection type will cause the Migration job to
 187         look at the time each JobId has been in the Pool since the job ended.
 188         All Jobs in the Pool longer than the time specified on {\bf Migration Time}
 189         directive in the Pool resource will be migrated.
 190   \end{description}
 191
 192 \item [Selection Pattern = \lt{}Quoted-string\gt{}]
 193   The Selection Patterns permitted for each Selection-type-keyword are
 194   described above.
 195
 196   For the OldestVolume and SmallestVolume, this
 197   Selection pattern is not used (ignored).
 198
 199   For the Client, Volume, and Job
 200   keywords, this pattern must be a valid regular expression that will filter
 201   the appropriate item names found in the Pool.
 202
 203   For the SQLQuery keyword, this pattern must be a valid SELECT SQL statement
 204   that returns JobIds.
 205
 206 \end{description}
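
As a hedged sketch of the SQLQuery selection type, the Job below selects
normally terminated backup Jobs stored in the Default pool.  The Job name
{\bf migrate-sql} is hypothetical, and the query assumes the standard catalog
layout (Job and Pool tables joined on PoolId, with Type 'B' for backups and
JobStatus 'T' for normal termination); check it against your own catalog
before relying on it.  Because no Pool is applied to the selection for
SQLQuery, the query names the pool explicitly.  The Pool directive is still
needed so that its {\bf Next Pool} defines where the data is written.

\footnotesize
\begin{verbatim}
# Hypothetical example: the Job name and the query are illustrative assumptions
Job {
  Name = "migrate-sql"
  Type = Migrate
  Level = Full
  Client = rufus-fd
  FileSet = "Full Set"
  Messages = Standard
  Pool = Default                  # its Next Pool defines the destination
  Selection Type = SQLQuery
  # JobId must be the first field returned by the SELECT
  Selection Pattern = "SELECT Job.JobId FROM Job, Pool WHERE Job.PoolId = Pool.PoolId AND Pool.Name = 'Default' AND Job.Type = 'B' AND Job.JobStatus = 'T'"
}
\end{verbatim}
\normalsize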
 207
 208 \section{Migration Pool Resource Directives}
 209
 210 The following directives can appear in a Director's Pool resource, and they
 211 are used to define a Migration job.
 212
 213 \begin{description}
 214 \item [Migration Time = \lt{}time-specification\gt{}]
   If a PoolTime migration is done, the time specified here in seconds (time
   modifiers such as hours are permitted) will be used. If the
   previous Backup Job or Jobs selected have been in the Pool longer than
   the time specified here, then they will be migrated.
 219
 220 \item [Migration High Bytes =  \lt{}byte-specification\gt{}]
 221    This directive specifies the number of bytes in the Pool which will
 222    trigger a migration if a {\bf PoolOccupancy} migration selection
 223    type has been specified. The fact that the Pool
 224    usage goes above this level does not automatically trigger a migration
 225    job. However, if a migration job runs and has the PoolOccupancy selection
 226    type set, the Migration High Bytes will be applied.  Bacula does not
 227    currently restrict a pool to have only a single Media Type, so you
 228    must keep in mind that if you mix Media Types in a Pool, the results
 229    may not be what you want, as the Pool count of all bytes will be
 230    for all Media Types combined.
 231
 232 \item [Migration Low Bytes = \lt{}byte-specification\gt{}]
 233    This directive specifies the number of bytes in the Pool which will
 234    stop a migration if a {\bf PoolOccupancy} migration selection
 235    type has been specified and triggered by more than Migration High
 236    Bytes being in the pool. In other words, once a migration job
 237    is started with {\bf PoolOccupancy} migration selection and it
 238    determines that there are more than Migration High Bytes, the
   migration job will continue to run jobs until the number of
   bytes in the Pool drops to or below Migration Low Bytes.
 241
 242 \item [Next Pool = \lt{}pool-specification\gt{}]
 243    The Next Pool directive specifies the pool to which Jobs will be
 244    migrated. This directive is required to define the Pool into which
 245    the data will be migrated. Without this directive, the migration job
 246    will terminate in error.
 247
 248 \item [Storage = \lt{}storage-specification\gt{}]
 249    The Storage directive specifies what Storage resource will be used
 250    for all Jobs that use this Pool. It takes precedence over any other
 251    Storage specifications that may have been given such as in the
 252    Schedule Run directive, or in the Job resource.  We highly recommend
 253    that you define the Storage resource to be used in the Pool rather
 254    than elsewhere (job, schedule run, ...).
 255 \end{description}
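
As a minimal sketch of how these directives fit together, the following Pool
and Job resources set up an occupancy-based migration.  The Job name
{\bf migrate-occupancy} and the byte and time values are hypothetical, chosen
only for illustration; the Default and Tape pools, the File storage, and the
remaining resource names are those used in the examples later in this chapter.

\footnotesize
\begin{verbatim}
# Pool to be migrated; the values below are illustrative assumptions
Pool {
  Name = Default
  Pool Type = Backup
  Storage = File
  Next Pool = Tape                # destination Pool for migrated Jobs
  Migration High Bytes = 50G      # PoolOccupancy: migrate when above this
  Migration Low Bytes = 30G       # PoolOccupancy: stop at or below this
  Migration Time = 30 days        # PoolTime: migrate Jobs older than this
}

# Control job; migration happens only when this job actually runs
Job {
  Name = "migrate-occupancy"
  Type = Migrate
  Level = Full
  Client = rufus-fd
  FileSet = "Full Set"
  Messages = Standard
  Schedule = "WeeklyCycle"
  Pool = Default
  Selection Type = PoolOccupancy
}
\end{verbatim}
\normalsize

Changing the Selection Type to {\bf PoolTime} would make the same Job use
the Migration Time value instead of the two byte thresholds.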
 256
 257 \section{Important Migration Considerations}
 258 \index[general]{Important Migration Considerations}
 259 \begin{itemize}
 260 \item Each Pool into which you migrate Jobs or Volumes {\bf must}
 261       contain Volumes of only one Media Type.
 262
 263 \item Migration takes place on a JobId by JobId basis. That is
 264       each JobId is migrated in its entirety and independently
 265       of other JobIds. Once the Job is migrated, it will be
 266       on the new medium in the new Pool, but for the most part,
 267       aside from having a new JobId, it will appear with all the
 268       same characteristics of the original job (start, end time, ...).
 269       The column RealEndTime in the catalog Job table will contain the
 270       time and date that the Migration terminated, and by comparing
 271       it with the EndTime column you can tell whether or not the
 272       job was migrated.  The original job is purged of its File
 273       records, and its Type field is changed from "B" to "M" to
 274       indicate that the job was migrated.
 275
\item Jobs on Volumes will be migrated only if the Volume is
      marked Full, Used, or Error.  Volumes that are still
      marked Append will not be considered for migration. This
      prevents Bacula from attempting to read the Volume at
      the same time it is writing it. It also reduces other deadlock
      situations, and avoids the problem of migrating a
      Volume and later finding new files appended to that Volume.
 283
 284 \item As noted above, for the Migration High Bytes, the calculation
 285       of the bytes to migrate is somewhat approximate.
 286
 287 \item If you keep Volumes of different Media Types in the same Pool,
 288       it is not clear how well migration will work.  We recommend only
 289       one Media Type per pool.
 290
 291 \item It is possible to get into a resource deadlock where Bacula does
 292       not find enough drives to simultaneously read and write all the
 293       Volumes needed to do Migrations. For the moment, you must take
 294       care as all the resource deadlock algorithms are not yet implemented.
 295
 296 \item Migration is done only when you run a Migration job. If you set a
 297       Migration High Bytes and that number of bytes is exceeded in the Pool
 298       no migration job will automatically start.  You must schedule the
 299       migration jobs, and they must run for any migration to take place.
 300
 301 \item If you migrate a number of Volumes, a very large number of Migration
 302       jobs may start.
 303
 304 \item Figuring out what jobs will actually be migrated can be a bit complicated
 305       due to the flexibility provided by the regex patterns and the number of
 306       different options.  Turning on a debug level of 100 or more will provide
 307       a limited amount of debug information about the migration selection
 308       process.
 309
 310 \item Bacula currently does only minimal Storage conflict resolution, so you
 311       must take care to ensure that you don't try to read and write to the
 312       same device or Bacula may block waiting to reserve a drive that it
 313       will never find. In general, ensure that all your migration
 314       pools contain only one Media Type, and that you always
 315       migrate to pools with different Media Types.
 316
 317 \item The {\bf Next Pool = ...} directive must be defined in the Pool
 318      referenced in the Migration Job to define the Pool into which the
 319      data will be migrated.
 320
 321 \item Pay particular attention to the fact that data is migrated on a Job
 322      by Job basis, and for any particular Volume, only one Job can read
 323      that Volume at a time (no simultaneous read), so migration jobs that
     all reference the same Volume will run sequentially.  This can be a
     potential bottleneck and does not scale very well to large numbers
     of jobs.
 327
 328 \item Only migration of Selection Types of Job and Volume have
 329      been carefully tested. All the other migration methods (time,
 330      occupancy, smallest, oldest, ...) need additional testing.
 331
 332 \item Migration is only implemented for a single Storage daemon.  You
 333      cannot read on one Storage daemon and write on another.
 334 \end{itemize}
 335
 336
 337 \section{Example Migration Jobs}
 338 \index[general]{Example Migration Jobs}
 339
When you specify a Migration Job, you must specify all the standard
directives as for a Job.  However, certain directives, such as the Level,
Client, and FileSet, though they must be defined, are ignored by the
Migration job because the values from the original job are used instead.

As an example, suppose you have the following Job that
you run every night. Note that there is no Storage directive in the
Job resource; there is a Storage directive in each of the Pool
resources; and the Pool to be migrated (Default, which uses File storage)
contains a Next Pool directive that defines the output Pool (where the
data is written by the migration job).
 351
\footnotesize
\begin{verbatim}
# Define the backup Job
Job {
  Name = "NightlySave"
  Type = Backup
  Level = Incremental                 # default
  Client = rufus-fd
  FileSet = "Full Set"
  Schedule = "WeeklyCycle"
  Messages = Standard
  Pool = Default
}

# Default pool definition
Pool {
  Name = Default
  Pool Type = Backup
  AutoPrune = yes
  Recycle = yes
  Next Pool = Tape
  Storage = File
  LabelFormat = "File"
}

# Tape pool definition
Pool {
  Name = Tape
  Pool Type = Backup
  AutoPrune = yes
  Recycle = yes
  Storage = DLTDrive
}

# Definition of File storage device
Storage {
  Name = File
  Address = rufus
  Password = "ccV3lVTsQRsdIUGyab0N4sMDavui2hOBkmpBU0aQKOr9"
  Device = "File"          # same as Device in Storage daemon
  Media Type = File        # same as MediaType in Storage daemon
}

# Definition of DLT tape storage device
Storage {
  Name = DLTDrive
  Address = rufus
  Password = "ccV3lVTsQRsdIUGyab0N4sMDavui2hOBkmpBU0aQKOr9"
  Device = "HP DLT 80"      # same as Device in Storage daemon
  Media Type = DLT8000      # same as MediaType in Storage daemon
}
\end{verbatim}
\normalsize
 406
 407 Where we have included only the essential information -- i.e. the
 408 Director, FileSet, Catalog, Client, Schedule, and Messages resources are
 409 omitted.
 410
 411 As you can see, by running the NightlySave Job, the data will be backed up
 412 to File storage using the Default pool to specify the Storage as File.
 413
Now, if we add the following Job resource to this conf file:
 415
\footnotesize
\begin{verbatim}
Job {
  Name = "migrate-volume"
  Type = Migrate
  Level = Full
  Client = rufus-fd
  FileSet = "Full Set"
  Messages = Standard
  Pool = Default
  Maximum Concurrent Jobs = 4
  Selection Type = Volume
  Selection Pattern = "File"
}
\end{verbatim}
\normalsize
 432
and then run the job named {\bf migrate-volume}, all volumes in the Pool
named Default (as specified in the migrate-volume Job) that match the
regular expression pattern {\bf File} will be migrated to tape storage
DLTDrive because the {\bf Next Pool} in the Default Pool specifies that
Migrations should go to the pool named {\bf Tape}, which uses
Storage {\bf DLTDrive}.
 439
 440 If instead, we use a Job resource as follows:
 441
\footnotesize
\begin{verbatim}
Job {
  Name = "migrate"
  Type = Migrate
  Level = Full
  Client = rufus-fd
  FileSet = "Full Set"
  Messages = Standard
  Pool = Default
  Maximum Concurrent Jobs = 4
  Selection Type = Job
  Selection Pattern = ".*Save"
}
\end{verbatim}
\normalsize
 458
All jobs ending with the name Save will be migrated from the Default Pool to
the Tape Pool, that is, from File storage to Tape storage.
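
Since the Copy process is described above as essentially identical to
Migration, a Copy job can be sketched by changing only the Type.  The
following is a minimal sketch that assumes the same Pool and Storage
resources as the examples above; the Job name {\bf copy-save} is
hypothetical.

\footnotesize
\begin{verbatim}
# Hypothetical example: same selection as the "migrate" Job, but Type = Copy
Job {
  Name = "copy-save"
  Type = Copy                     # the original Job is left unchanged
  Level = Full
  Client = rufus-fd
  FileSet = "Full Set"
  Messages = Standard
  Pool = Default                  # copies are written to its Next Pool (Tape)
  Selection Type = Job
  Selection Pattern = ".*Save"
}
\end{verbatim}
\normalsize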