
   1
   2 \chapter{Migration and Copy}
   3 \label{MigrationChapter}
   4 \index[general]{Migration}
   5 \index[general]{Copy}
   6
   7 The term Migration, as used in the context of Bacula, means moving data from
   8 one Volume to another.  In particular it refers to a Job (similar to a backup
   9 job) that reads data that was previously backed up to a Volume and writes
  10 it to another Volume.  As part of this process, the File catalog records
  11 associated with the first backup job are purged.  In other words, Migration
  12 moves Bacula Job data from one Volume to another by reading the Job data
  13 from the Volume it is stored on, writing it to a different Volume in a
  14 different Pool, and then purging the database records for the first Job.
  15
  16 The Copy process is essentially identical to the Migration feature with the
  17 exception that the Job that is copied is left unchanged.  This essentially
  18 creates two identical copies of the same backup. However, the copy is treated
  19 as a copy rather than a backup job, and hence is not directly available for
   20 restore. If Bacula finds a copy when a job record is purged (deleted) from the
   21 catalog, it will promote the copy to a \textsl{real} backup and will make it
  22 available for automatic restore.
  23
  24 The Copy and the Migration jobs run without using the File daemon by copying
  25 the data from the old backup Volume to a different Volume in a different Pool.
  26
  27 The selection process for which Job or Jobs are migrated
  28 can be based on quite a number of different criteria such as:
  29 \begin{itemize}
  30 \item a single previous Job
  31 \item a Volume
  32 \item a Client
  33 \item a regular expression matching a Job, Volume, or Client name
  34 \item the time a Job has been on a Volume
  35 \item high and low water marks (usage or occupation) of a Pool
  36 \item Volume size
  37 \end{itemize}
  38
  39 The details of these selection criteria will be defined below.
  40
  41 To run a Migration job, you must first define a Job resource very similar
  42 to a Backup Job but with {\bf Type = Migrate} instead of {\bf Type =
  43 Backup}.  One of the key points to remember is that the Pool that is
  44 specified for the migration job is the only pool from which jobs will
  45 be migrated, with one exception noted below. In addition, the Pool to
  46 which the selected Job or Jobs will be migrated is defined by the {\bf
  47 Next Pool = ...} in the Pool resource specified for the Migration Job.
  48
  49 Bacula permits Pools to contain Volumes with different Media Types.
  50 However, when doing migration, this is a very undesirable condition.  For
  51 migration to work properly, you should use Pools containing only Volumes of
  52 the same Media Type for all migration jobs.
  53
  54 The migration job normally is either manually started or starts
  55 from a Schedule much like a backup job. It searches
  56 for a previous backup Job or Jobs that match the parameters you have
  57 specified in the migration Job resource, primarily a {\bf Selection Type}
  58 (detailed a bit later).  Then for
  59 each previous backup JobId found, the Migration Job will run a new Job which
  60 copies the old Job data from the previous Volume to a new Volume in
  61 the Migration Pool.  It is possible that no prior Jobs are found for
  62 migration, in which case, the Migration job will simply terminate having
  63 done nothing, but normally at a minimum, three jobs are involved during a
  64 migration:
  65
  66 \begin{itemize}
  67 \item The currently running Migration control Job. This is only
  68       a control job for starting the migration child jobs.
  69 \item The previous Backup Job (already run). The File records
  70       for this Job are purged if the Migration job successfully
  71       terminates.  The original data remains on the Volume until
  72       it is recycled and rewritten.
  73 \item A new Migration Backup Job that moves the data from the
  74       previous Backup job to the new Volume.  If you subsequently
  75       do a restore, the data will be read from this Job.
  76 \end{itemize}
  77
  78 If the Migration control job finds a number of JobIds to migrate (e.g.
  79 it is asked to migrate one or more Volumes), it will start one new
  80 migration backup job for each JobId found on the specified Volumes.
  81 Please note that Migration doesn't scale too well since Migrations are
   82 done on a Job by Job basis. Thus, if you select a very large volume or
  83 a number of volumes for migration, you may have a large number of
  84 Jobs that start. Because each job must read the same Volume, they will
  85 run consecutively (not simultaneously).
  86
  87 \section{Migration and Copy Job Resource Directives}
  88
  89 The following directives can appear in a Director's Job resource, and they
  90 are used to define a Migration job.
  91
  92 \begin{description}
  93 \item [Pool = \lt{}Pool-name\gt{}] The Pool specified in the Migration
  94    control Job is not a new directive for the Job resource, but it is
  95    particularly important because it determines what Pool will be examined
  96    for finding JobIds to migrate.  The exception to this is when {\bf
  97    Selection Type = SQLQuery}, and although a Pool directive must still be
  98    specified, no Pool is used, unless you specifically include it in the
  99    SQL query.  Note, in any case, the Pool resource defined by the Pool
  100    directive must contain a {\bf Next Pool = ...} directive to define the
 101    Pool to which the data will be migrated.
 102
 103 \item [Type = Migrate]
 104    {\bf Migrate} is a new type that defines the job that is run as being a
 105    Migration Job.  A Migration Job is a sort of control job and does not have
  106    any Files associated with it, and in that sense it is more or less like
  107    an Admin job.  Migration jobs simply check to see if there is anything to
  108    migrate, then possibly start and control new Backup jobs to migrate the data
  109    from the specified Pool to another Pool.  Note, any original JobId that
  110    is migrated will be marked as having been migrated, and the original
  111    JobId can no longer be used for restores; all restores will be done from
 112    the new migrated Job.
 113
 114
 115 \item [Type = Copy]
 116    {\bf Copy} is a new type that defines the job that is run as being a
 117    Copy Job.  A Copy Job is a sort of control job and does not have
  118    any Files associated with it, and in that sense it is more or less like
  119    an Admin job.  Copy jobs simply check to see if there is anything to
  120    copy, then possibly start and control new Backup jobs to copy the data
  121    from the specified Pool to another Pool.  Note that when a copy is
  122    made, the original JobIds are left unchanged. The new copies cannot
 123    be used for restoration unless you specifically choose them by JobId.
 124    If you subsequently delete a JobId that has a copy, the copy will be
 125    automatically upgraded to a Backup rather than a Copy, and it will
 126    subsequently be used for restoration.
 127
 128 \item [Selection Type = \lt{}Selection-type-keyword\gt{}]
 129   The \lt{}Selection-type-keyword\gt{} determines how the migration job
 130   will go about selecting what JobIds to migrate. In most cases, it is
 131   used in conjunction with a {\bf Selection Pattern} to give you fine
 132   control over exactly what JobIds are selected.  The possible values
 133   for \lt{}Selection-type-keyword\gt{} are:
 134   \begin{description}
 135   \item [SmallestVolume] This selection keyword selects the volume with the
 136         fewest bytes from the Pool to be migrated.  The Pool to be migrated
 137         is the Pool defined in the Migration Job resource.  The migration
 138         control job will then start and run one migration backup job for
 139         each of the Jobs found on this Volume.  The Selection Pattern, if
 140         specified, is not used.
 141
 142   \item [OldestVolume] This selection keyword selects the volume with the
 143         oldest last write time in the Pool to be migrated.  The Pool to be
 144         migrated is the Pool defined in the Migration Job resource.  The
 145         migration control job will then start and run one migration backup
 146         job for each of the Jobs found on this Volume.  The Selection
 147         Pattern, if specified, is not used.
 148
  149   \item [Client] The Client selection type first selects all the Clients
 150         that have been backed up in the Pool specified by the Migration
 151         Job resource, then it applies the {\bf Selection Pattern} (defined
 152         below) as a regular expression to the list of Client names, giving
 153         a filtered Client name list.  All jobs that were backed up for those
 154         filtered (regexed) Clients will be migrated.
 155         The migration control job will then start and run one migration
 156         backup job for each of the JobIds found for those filtered Clients.
 157
  158   \item [Volume] The Volume selection type first selects all the Volumes
 159         that have been backed up in the Pool specified by the Migration
 160         Job resource, then it applies the {\bf Selection Pattern} (defined
 161         below) as a regular expression to the list of Volume names, giving
 162         a filtered Volume list.  All JobIds that were backed up for those
 163         filtered (regexed) Volumes will be migrated.
 164         The migration control job will then start and run one migration
 165         backup job for each of the JobIds found on those filtered Volumes.
 166
  167   \item [Job] The Job selection type first selects all the Jobs (as
 168         defined on the {\bf Name} directive in a Job resource)
 169         that have been backed up in the Pool specified by the Migration
 170         Job resource, then it applies the {\bf Selection Pattern} (defined
 171         below) as a regular expression to the list of Job names, giving
 172         a filtered Job name list.  All JobIds that were run for those
 173         filtered (regexed) Job names will be migrated.  Note, for a given
  174         Job name, there can be many jobs (JobIds) that ran.
 175         The migration control job will then start and run one migration
 176         backup job for each of the Jobs found.
 177
  178   \item [SQLQuery] The SQLQuery selection type uses the {\bf Selection
 179         Pattern} as an SQL query to obtain the JobIds to be migrated.
 180         The Selection Pattern must be a valid SELECT SQL statement for your
 181         SQL engine, and it must return the JobId as the first field
 182         of the SELECT.
 183
 184   \item [PoolOccupancy] This selection type will cause the Migration job
 185         to compute the total size of the specified pool for all Media Types
 186         combined. If it exceeds the {\bf Migration High Bytes} defined in
 187         the Pool, the Migration job will migrate all JobIds beginning with
 188         the oldest Volume in the pool (determined by Last Write time) until
 189         the Pool bytes drop below the {\bf Migration Low Bytes} defined in the
  190         Pool. This calculation should be considered rather approximate because
  191         it is made once by the Migration job before migration is begun, and
  192         thus does not take into account additional data written into the Pool
  193         during the migration.  In addition, the calculation of the total Pool
  194         byte size is based on the Volume bytes saved in the Volume (Media)
  195         database
  196         entries. The bytes calculated for Migration are based on the value stored
  197         in the Job records of the Jobs to be migrated. These do not include the
  198         Storage daemon overhead that is included in the total Pool size. As a consequence,
 199         normally, the migration will migrate more bytes than strictly necessary.
 200
 201   \item [PoolTime] The PoolTime selection type will cause the Migration job to
 202         look at the time each JobId has been in the Pool since the job ended.
 203         All Jobs in the Pool longer than the time specified on {\bf Migration Time}
 204         directive in the Pool resource will be migrated.
 205
  206   \item [PoolUncopiedJobs] This selection type copies all Jobs in a Pool that
  207         have not been copied before to another Pool. It is available only for Copy Jobs.
 208
 209   \end{description}
 210
 211 \item [Selection Pattern = \lt{}Quoted-string\gt{}]
 212   The Selection Patterns permitted for each Selection-type-keyword are
 213   described above.
 214
 215   For the OldestVolume and SmallestVolume, this
 216   Selection pattern is not used (ignored).
 217
 218   For the Client, Volume, and Job
 219   keywords, this pattern must be a valid regular expression that will filter
 220   the appropriate item names found in the Pool.
 221
 222   For the SQLQuery keyword, this pattern must be a valid SELECT SQL statement
 223   that returns JobIds.
 224
  225 \item [Purge Migrated Job = \lt{}yes/no\gt{}]
  226   This directive may be added to the Migration Job definition in the Director
  227   configuration file to purge the migrated job at the end of a migration.
 228
 229 \end{description}
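
As an illustration of how these Job directives fit together, here is a
minimal sketch of a Migration Job that selects JobIds with an SQL query.
It is not a tested configuration: the Job name {\bf migrate-sql} is
hypothetical, the Client, FileSet, Messages, and Pool names are
placeholders, and the query assumes a catalog in which the Job table has
JobId, PoolId, Type, and JobStatus columns, the Pool table has PoolId and
Name columns, and a JobStatus of T marks a job that terminated normally.
Please check the query against your own catalog before relying on it.

\footnotesize
\begin{verbatim}
# Sketch only: resource names and the query are assumptions
Job {
  Name = "migrate-sql"
  Type = Migrate
  Level = Full                  # must be defined
  Client = rufus-fd             # must be defined
  FileSet = "Full Set"          # must be defined
  Messages = Standard
  Pool = Default                # still required; its Next Pool is the destination
  Selection Type = SQLQuery
  # JobId must be the first field returned by the SELECT.
  # The Pool is named explicitly because SQLQuery does not use the Pool directive.
  Selection Pattern = "SELECT Job.JobId FROM Job, Pool WHERE Pool.Name = 'Default' AND Job.PoolId = Pool.PoolId AND Job.Type = 'B' AND Job.JobStatus = 'T'"
  Purge Migrated Job = yes      # purge the migrated job at the end of the migration
}
\end{verbatim}
\normalsize

For the Client, Volume, and Job selection types, the Selection Pattern
would instead be a regular expression, and for OldestVolume and
SmallestVolume it would be omitted.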
 230
 231 \section{Migration Pool Resource Directives}
 232
 233 The following directives can appear in a Director's Pool resource, and they
 234 are used to define a Migration job.
 235
 236 \begin{description}
 237 \item [Migration Time = \lt{}time-specification\gt{}]
 238    If a PoolTime migration is done, the time specified here in seconds (time
 239    modifiers are permitted -- e.g. hours, ...) will be used. If the
 240    previous Backup Job or Jobs selected have been in the Pool longer than
  241    the specified Migration Time, then they will be migrated.
 242
 243 \item [Migration High Bytes =  \lt{}byte-specification\gt{}]
 244    This directive specifies the number of bytes in the Pool which will
 245    trigger a migration if a {\bf PoolOccupancy} migration selection
 246    type has been specified. The fact that the Pool
 247    usage goes above this level does not automatically trigger a migration
 248    job. However, if a migration job runs and has the PoolOccupancy selection
 249    type set, the Migration High Bytes will be applied.  Bacula does not
 250    currently restrict a pool to have only a single Media Type, so you
 251    must keep in mind that if you mix Media Types in a Pool, the results
 252    may not be what you want, as the Pool count of all bytes will be
 253    for all Media Types combined.
 254
 255 \item [Migration Low Bytes = \lt{}byte-specification\gt{}]
 256    This directive specifies the number of bytes in the Pool which will
 257    stop a migration if a {\bf PoolOccupancy} migration selection
 258    type has been specified and triggered by more than Migration High
 259    Bytes being in the pool. In other words, once a migration job
 260    is started with {\bf PoolOccupancy} migration selection and it
 261    determines that there are more than Migration High Bytes, the
 262    migration job will continue to run jobs until the number of
  263    bytes in the Pool drops to or below Migration Low Bytes.
 264
 265 \item [Next Pool = \lt{}pool-specification\gt{}]
 266    The Next Pool directive specifies the pool to which Jobs will be
 267    migrated. This directive is required to define the Pool into which
 268    the data will be migrated. Without this directive, the migration job
 269    will terminate in error.
 270
 271 \item [Storage = \lt{}storage-specification\gt{}]
 272    The Storage directive specifies what Storage resource will be used
 273    for all Jobs that use this Pool. It takes precedence over any other
 274    Storage specifications that may have been given such as in the
 275    Schedule Run directive, or in the Job resource.  We highly recommend
 276    that you define the Storage resource to be used in the Pool rather
 277    than elsewhere (job, schedule run, ...).
 278 \end{description}
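
The Pool directives above might be combined as in the following sketch.
The Pool names {\bf DiskPool} and {\bf TapePool}, the Storage name, the
Job name, and all sizes and times are hypothetical values chosen for
illustration; TapePool is assumed to be defined elsewhere with its own
Storage directive.

\footnotesize
\begin{verbatim}
# Sketch only: names, sizes, and times are placeholders
Pool {
  Name = DiskPool
  Pool Type = Backup
  Storage = File                 # used by all Jobs that use this Pool
  Next Pool = TapePool           # Pool into which Jobs are migrated
  Migration Time = 30 days       # used when Selection Type = PoolTime
  Migration High Bytes = 200G    # PoolOccupancy: migrate if the Pool exceeds this
  Migration Low Bytes = 100G     # PoolOccupancy: stop at or below this
}

Job {
  Name = "migrate-occupancy"
  Type = Migrate
  Level = Full
  Client = rufus-fd
  FileSet = "Full Set"
  Messages = Standard
  Pool = DiskPool                # Pool examined for JobIds to migrate
  Selection Type = PoolOccupancy
}
\end{verbatim}
\normalsize

With such a configuration, nothing happens when DiskPool grows past
200G by itself; the High and Low Bytes values are only evaluated when
the migrate-occupancy Job actually runs.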
 279
 280 \section{Important Migration Considerations}
 281 \index[general]{Important Migration Considerations}
 282 \begin{itemize}
 283 \item Each Pool into which you migrate Jobs or Volumes {\bf must}
 284       contain Volumes of only one Media Type.
 285
 286 \item Migration takes place on a JobId by JobId basis. That is
 287       each JobId is migrated in its entirety and independently
 288       of other JobIds. Once the Job is migrated, it will be
 289       on the new medium in the new Pool, but for the most part,
 290       aside from having a new JobId, it will appear with all the
 291       same characteristics of the original job (start, end time, ...).
 292       The column RealEndTime in the catalog Job table will contain the
 293       time and date that the Migration terminated, and by comparing
 294       it with the EndTime column you can tell whether or not the
 295       job was migrated.  The original job is purged of its File
 296       records, and its Type field is changed from "B" to "M" to
 297       indicate that the job was migrated.
 298
  299 \item Jobs on Volumes will be migrated only if the Volume is
  300       marked Full, Used, or Error.  Volumes that are still
 301       marked Append will not be considered for migration. This
 302       prevents Bacula from attempting to read the Volume at
 303       the same time it is writing it. It also reduces other deadlock
 304       situations, as well as avoids the problem that you migrate a
 305       Volume and later find new files appended to that Volume.
 306
 307 \item As noted above, for the Migration High Bytes, the calculation
 308       of the bytes to migrate is somewhat approximate.
 309
 310 \item If you keep Volumes of different Media Types in the same Pool,
 311       it is not clear how well migration will work.  We recommend only
 312       one Media Type per pool.
 313
 314 \item It is possible to get into a resource deadlock where Bacula does
 315       not find enough drives to simultaneously read and write all the
 316       Volumes needed to do Migrations. For the moment, you must take
 317       care as all the resource deadlock algorithms are not yet implemented.
 318
 319 \item Migration is done only when you run a Migration job. If you set a
 320       Migration High Bytes and that number of bytes is exceeded in the Pool
 321       no migration job will automatically start.  You must schedule the
 322       migration jobs, and they must run for any migration to take place.
 323
 324 \item If you migrate a number of Volumes, a very large number of Migration
 325       jobs may start.
 326
 327 \item Figuring out what jobs will actually be migrated can be a bit complicated
 328       due to the flexibility provided by the regex patterns and the number of
 329       different options.  Turning on a debug level of 100 or more will provide
 330       a limited amount of debug information about the migration selection
 331       process.
 332
 333 \item Bacula currently does only minimal Storage conflict resolution, so you
 334       must take care to ensure that you don't try to read and write to the
 335       same device or Bacula may block waiting to reserve a drive that it
 336       will never find. In general, ensure that all your migration
 337       pools contain only one Media Type, and that you always
 338       migrate to pools with different Media Types.
 339
 340 \item The {\bf Next Pool = ...} directive must be defined in the Pool
 341      referenced in the Migration Job to define the Pool into which the
 342      data will be migrated.
 343
 344 \item Pay particular attention to the fact that data is migrated on a Job
 345      by Job basis, and for any particular Volume, only one Job can read
 346      that Volume at a time (no simultaneous read), so migration jobs that
 347      all reference the same Volume will run sequentially.  This can be a
  348      potential bottleneck and does not scale very well to large numbers
 349      of jobs.
 350
  351 \item Only migrations with Selection Types of Job and Volume have
 352      been carefully tested. All the other migration methods (time,
 353      occupancy, smallest, oldest, ...) need additional testing.
 354
 355 \item Migration is only implemented for a single Storage daemon.  You
 356      cannot read on one Storage daemon and write on another.
 357 \end{itemize}
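
Because a migration takes place only when a Migration job runs, one way
to arrange this is to attach a Schedule to the Migration job, as in the
sketch below. The Schedule name {\bf MigrationCycle}, the Job name, and
the run time are hypothetical; the Pool named Default is assumed to
define both {\bf Migration Time} and {\bf Next Pool}. The Level given on
the Run line is only there to follow the usual form of the directive.

\footnotesize
\begin{verbatim}
# Sketch only: names and times are placeholders
Schedule {
  Name = "MigrationCycle"
  Run = Full sun-sat at 5:05     # every day, after the nightly backups
}

Job {
  Name = "migrate-scheduled"
  Type = Migrate
  Level = Full
  Client = rufus-fd
  FileSet = "Full Set"
  Messages = Standard
  Pool = Default
  Schedule = "MigrationCycle"
  Selection Type = PoolTime      # uses Migration Time from the Pool
}
\end{verbatim}
\normalsize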
 358
 359
 360 \section{Example Migration Jobs}
 361 \index[general]{Example Migration Jobs}
 362
 363 When you specify a Migration Job, you must specify all the standard
  364 directives as for a Job.  However, certain directives, such as Level, Client, and
  365 FileSet, though they must be defined, are ignored by the Migration job
  366 because the values from the original job are used instead.
 367
 368 As an example, suppose you have the following Job that
  369 you run every night. Note the following: there is no Storage directive in the
 370 Job resource; there is a Storage directive in each of the Pool
 371 resources; the Pool to be migrated (File) contains a Next Pool
 372 directive that defines the output Pool (where the data is written
 373 by the migration job).
 374
 375 \footnotesize
 376 \begin{verbatim}
 377 # Define the backup Job
 378 Job {
 379   Name = "NightlySave"
 380   Type = Backup
 381   Level = Incremental                 # default
 382   Client=rufus-fd
 383   FileSet="Full Set"
 384   Schedule = "WeeklyCycle"
 385   Messages = Standard
 386   Pool = Default
 387 }
 388
 389 # Default pool definition
 390 Pool {
 391   Name = Default
 392   Pool Type = Backup
 393   AutoPrune = yes
 394   Recycle = yes
 395   Next Pool = Tape
 396   Storage = File
 397   LabelFormat = "File"
 398 }
 399
 400 # Tape pool definition
 401 Pool {
 402   Name = Tape
 403   Pool Type = Backup
 404   AutoPrune = yes
 405   Recycle = yes
 406   Storage = DLTDrive
 407 }
 408
 409 # Definition of File storage device
 410 Storage {
 411   Name = File
 412   Address = rufus
 413   Password = "ccV3lVTsQRsdIUGyab0N4sMDavui2hOBkmpBU0aQKOr9"
 414   Device = "File"          # same as Device in Storage daemon
 415   Media Type = File        # same as MediaType in Storage daemon
 416 }
 417
 418 # Definition of DLT tape storage device
 419 Storage {
 420   Name = DLTDrive
 421   Address = rufus
 422   Password = "ccV3lVTsQRsdIUGyab0N4sMDavui2hOBkmpBU0aQKOr9"
 423   Device = "HP DLT 80"      # same as Device in Storage daemon
 424   Media Type = DLT8000      # same as MediaType in Storage daemon
 425 }
 426
 427 \end{verbatim}
 428 \normalsize
 429
 430 Where we have included only the essential information -- i.e. the
 431 Director, FileSet, Catalog, Client, Schedule, and Messages resources are
 432 omitted.
 433
 434 As you can see, by running the NightlySave Job, the data will be backed up
 435 to File storage using the Default pool to specify the Storage as File.
 436
  437 Now, if we add the following Job resource to this configuration file:
 438
 439 \footnotesize
 440 \begin{verbatim}
 441 Job {
 442   Name = "migrate-volume"
 443   Type = Migrate
 444   Level = Full
 445   Client = rufus-fd
 446   FileSet = "Full Set"
 447   Messages = Standard
 448   Pool = Default
 449   Maximum Concurrent Jobs = 4
 450   Selection Type = Volume
 451   Selection Pattern = "File"
 452 }
 453 \end{verbatim}
 454 \normalsize
 455
 456 and then run the job named {\bf migrate-volume}, all volumes in the Pool
  457 named Default (as specified in the migrate-volume Job) that match the
 458 regular expression pattern {\bf File} will be migrated to tape storage
 459 DLTDrive because the {\bf Next Pool} in the Default Pool specifies that
 460 Migrations should go to the pool named {\bf Tape}, which uses
 461 Storage {\bf DLTDrive}.
 462
 463 If instead, we use a Job resource as follows:
 464
 465 \footnotesize
 466 \begin{verbatim}
 467 Job {
 468   Name = "migrate"
 469   Type = Migrate
 470   Level = Full
 471   Client = rufus-fd
 472   FileSet="Full Set"
 473   Messages = Standard
 474   Pool = Default
 475   Maximum Concurrent Jobs = 4
 476   Selection Type = Job
 477   Selection Pattern = ".*Save"
 478 }
 479 \end{verbatim}
 480 \normalsize
 481
  482 All jobs with names ending in Save will be migrated from the Default Pool to
  483 the Tape Pool, that is, from File storage to Tape storage.
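
A Copy job can be sketched along the same lines. The following resource
is an illustration only, with the hypothetical name {\bf copy-uncopied};
it assumes the Default and Tape Pools shown earlier and uses the
PoolUncopiedJobs selection type, which is available only for Copy jobs.

\footnotesize
\begin{verbatim}
# Sketch only: the Job name is a placeholder
Job {
  Name = "copy-uncopied"
  Type = Copy
  Level = Full
  Client = rufus-fd
  FileSet = "Full Set"
  Messages = Standard
  Pool = Default                       # Pool whose Jobs are copied
  Maximum Concurrent Jobs = 4
  Selection Type = PoolUncopiedJobs    # Jobs in the Pool not copied before
}
\end{verbatim}
\normalsize

Unlike the migration examples, the original Jobs in the Default Pool are
left unchanged, and the copies are used for restores only if you select
them by JobId or if the original JobId is later deleted.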