git.sur5r.net Git - bacula/bacula/blob - bacula/projects

   1
   2 Projects:
   3                      Bacula Projects Roadmap
   4                     Status updated 18 August 2007
   5                   After removing items completed in version
   6                        2.2.0 and renumbering
   7
   8 Items Completed:
   9
  10 Summary:
  11 Item  1:  Accurate restoration of renamed/deleted files
  12 Item  2:  Allow FD to initiate a backup
  13 Item  3:  Merge multiple backups (Synthetic Backup or Consolidation)
  14 Item  4:  Implement Catalog directive for Pool resource in Director
  15 Item  5:  Add an item to the restore option where you can select a Pool
  16 Item  6:  Deletion of disk Volumes when pruned
  17 Item  7:  Implement Base jobs
  18 Item  8:  Implement Copy pools
  19 Item  9:  Scheduling syntax that permits more flexibility and options
  20 Item 10:  Message mailing based on backup types
  21 Item 11:  Cause daemons to use a specific IP address to source communications
  22 Item 12:  Add Plug-ins to the FileSet Include statements.
  23 Item 13:  Restore only file attributes (permissions, ACL, owner, group...)
  24 Item 14:  Add an override in Schedule for Pools based on backup types
  25 Item 15:  Implement more Python events and functions
  26 Item 16:  Allow inclusion/exclusion of files in a fileset by creation/mod times
  27 Item 17:  Automatic promotion of backup levels based on backup size
  28 Item 18:  Better control over Job execution
  29 Item 19:  Automatic disabling of devices
  30 Item 20:  An option to operate on all pools with update vol parameters
  31 Item 21:  Include timestamp of job launch in "stat clients" output
  32 Item 22:  Implement Storage daemon compression
  33 Item 23:  Improve Bacula's tape and drive usage and cleaning management
  34 Item 24:  Multiple threads in file daemon for the same job
  35 Item 25:  Archival (removal) of User Files to Tape
  36
  37
  38 Item  1:  Accurate restoration of renamed/deleted files
  39   Date:   28 November 2005
  40   Origin: Martin Simmons (martin at lispworks dot com)
  41   Status: Robert Nelson will implement this
  42
  43   What:   When restoring a fileset for a specified date (including "most
  44           recent"), Bacula should give you exactly the files and directories
  45           that existed at the time of the last backup prior to that date.
  46
  47           Currently this only works if the last backup was a Full backup.
  48           When the last backup was Incremental/Differential, files and
  49           directories that have been renamed or deleted since the last Full
  50           backup are not currently restored correctly.  Ditto for files with
  51           extra/fewer hard links than at the time of the last Full backup.
  52
  53   Why:    Incremental/Differential would be much more useful if this worked.
  54
  55   Notes:  Merging of multiple backups into a single one seems to
  56           rely on this working, otherwise the merged backups will not be
  57           truly equivalent to a Full backup.
  58
  59   Note:   Kern: notes shortened. This can be done without the need for
  60           inodes. It is essentially the same as the current Verify job,
  61           but one additional database record must be written, which does
  62           not need any database change.
  63
  64   Notes:  Kern: see if we can correct restoration of directories if
  65           replace=ifnewer is set.  Currently, if the directory does not
  66           exist, a "dummy" directory is created, then when all the files
  67           are updated, the dummy directory is newer so the real values
  68           are not updated.
  69
  70 Item  2:  Allow FD to initiate a backup
  71   Origin: Frank Volf (frank at deze dot org)
  72   Date:   17 November 2005
  73   Status:
  74
  75    What:  Provide some means, possibly by a restricted console that
  76           allows a FD to initiate a backup, and that uses the connection
  77           established by the FD to the Director for the backup so that
  78           a Director that is firewalled can do the backup.
  79
  80    Why:   Makes backup of laptops much easier.
  81
  82
  83 Item  3:  Merge multiple backups (Synthetic Backup or Consolidation)
  84   Origin: Marc Cousin and Eric Bollengier
  85   Date:   15 November 2005
  86   Status:
  87
  88   What:   A merged backup is a backup made without connecting to the Client.
  89           It would be a Merge of existing backups into a single backup.
  90           In effect, it is like a restore but to the backup medium.
  91
  92           For instance, say that last Sunday we made a full backup.  Then
  93           all week long, we created incremental backups, in order to do
  94           them fast.  Now comes Sunday again, and we need another full.
  95           The merged backup makes it possible to do instead an incremental
  96           backup (during the night for instance), and then create a merged
  97           backup during the day, by using the full and incrementals from
  98           the week.  The merged backup will be exactly like a full made
  99           Sunday night on the tape, but the production interruption on the
 100           Client will be minimal, as the Client will only have to send
 101           incrementals.
 102
 103           In fact, if it's done correctly, you could merge all the
 104           Incrementals into single Incremental, or all the Incrementals
 105           and the last Differential into a new Differential, or the Full,
 106           last differential and all the Incrementals into a new Full
 107           backup.  And there is no need to involve the Client.
 108
 109   Why:    The benefit is that :
 110           - the Client just does an incremental ;
 111           - the merged backup on tape is just as a single full backup,
 112             and can be restored very fast.
 113
 114           This is also a way of reducing the backup data since the old
 115           data can then be pruned (or not) from the catalog, possibly
 116           allowing older volumes to be recycled
 117
 118 Item  4:  Implement Catalog directive for Pool resource in Director
 119   Origin: Alan Davis adavis@ruckus.com
 120   Date:   6 March 2007
 121   Status: Submitted
 122
 123   What:   The current behavior is for the director to create all pools
 124           found in the configuration file in all catalogs.  Add a
 125           Catalog directive to the Pool resource to specify which
 126           catalog to use for each pool definition.
 127
 128   Why:    This allows different catalogs to have different pool
 129           attributes and eliminates the side-effect of adding
 130           pools to catalogs that don't need/use them.
 131
 132   Notes:  Kern: I think this is relatively easy to do, and it is really
 133           a pre-requisite to a number of the Copy pool, ... projects
 134           that are listed here.
 135
 136 Item  5:  Add an item to the restore option where you can select a Pool
 137   Origin: kshatriyak at gmail dot com
 138     Date: 1/1/2006
 139   Status:
 140
 141     What: In the restore option (Select the most recent backup for a
 142           client) it would be useful to add an option where you can limit
 143           the selection to a certain pool.
 144
 145      Why: When using cloned jobs, most of the time you have 2 pools - a
 146           disk pool and a tape pool.  People who have 2 pools would like to
 147           select the most recent backup from disk, not from tape (tape
 148           would be only needed in emergency).  However, the most recent
 149           backup (which may just differ a second from the disk backup) may
 150           be on tape and would be selected.  The problem becomes bigger if
 151           you have a full and differential - the most "recent" full backup
 152           may be on disk, while the most recent differential may be on tape
 153           (though the differential on disk may differ even only a second or
 154           so).  Bacula will complain that the backups reside on different
 155           media then.  For now the only solution now when restoring things
 156           when you have 2 pools is to manually search for the right
 157           job-id's and enter them by hand, which is a bit fault tolerant.
 158
 159   Notes:  Kern: This is a nice idea. It could also be the way to support
 160           Jobs that have been Copied (similar to migration, but not yet
 161           implemented).
 162
 163
 164
 165 Item  6:  Deletion of disk Volumes when pruned
 166   Date:   Nov 25, 2005
 167   Origin: Ross Boylan <RossBoylan at stanfordalumni dot org> (edited
 168           by Kern)
 169   Status:
 170
 171    What:  Provide a way for Bacula to automatically remove Volumes
 172           from the filesystem, or optionally to truncate them.
 173           Obviously, the Volume must be pruned prior removal.
 174
 175   Why:    This would allow users more control over their Volumes and
 176           prevent disk based volumes from consuming too much space.
 177
 178   Notes:  The following two directives might do the trick:
 179
 180           Volume Data Retention = <time period>
 181           Remove Volume After = <time period>
 182
 183           The migration project should also remove a Volume that is
 184           migrated. This might also work for tape Volumes.
 185
 186 Item  7:  Implement Base jobs
 187   Date:   28 October 2005
 188   Origin: Kern
 189   Status:
 190
 191   What:   A base job is sort of like a Full save except that you
 192           will want the FileSet to contain only files that are
 193           unlikely to change in the future (i.e.  a snapshot of
 194           most of your system after installing it).  After the
 195           base job has been run, when you are doing a Full save,
 196           you specify one or more Base jobs to be used.  All
 197           files that have been backed up in the Base job/jobs but
 198           not modified will then be excluded from the backup.
 199           During a restore, the Base jobs will be automatically
 200           pulled in where necessary.
 201
 202   Why:    This is something none of the competition does, as far as
 203           we know (except perhaps BackupPC, which is a Perl program that
 204           saves to disk only).  It is big win for the user, it
 205           makes Bacula stand out as offering a unique
 206           optimization that immediately saves time and money.
 207           Basically, imagine that you have 100 nearly identical
 208           Windows or Linux machine containing the OS and user
 209           files.  Now for the OS part, a Base job will be backed
 210           up once, and rather than making 100 copies of the OS,
 211           there will be only one.  If one or more of the systems
 212           have some files updated, no problem, they will be
 213           automatically restored.
 214
 215   Notes:  Huge savings in tape usage even for a single machine.
 216           Will require more resources because the DIR must send
 217           FD a list of files/attribs, and the FD must search the
 218           list and compare it for each file to be saved.
 219
 220
 221 Item  8:  Implement Copy pools
 222   Date:   27 November 2005
 223   Origin: David Boyes (dboyes at sinenomine dot net)
 224   Status:
 225
 226   What:   I would like Bacula to have the capability to write copies
 227           of backed-up data on multiple physical volumes selected
 228           from different pools without transferring the data
 229           multiple times, and to accept any of the copy volumes
 230           as valid for restore.
 231
 232   Why:    In many cases, businesses are required to keep offsite
 233           copies of backup volumes, or just wish for simple
 234           protection against a human operator dropping a storage
 235           volume and damaging it. The ability to generate multiple
 236           volumes in the course of a single backup job allows
 237           customers to simple check out one copy and send it
 238           offsite, marking it as out of changer or otherwise
 239           unavailable. Currently, the library and magazine
 240           management capability in Bacula does not make this process
 241           simple.
 242
 243           Restores would use the copy of the data on the first
 244           available volume, in order of Copy pool chain definition.
 245
 246           This is also a major scalability issue -- as the number of
 247           clients increases beyond several thousand, and the volume
 248           of data increases, transferring the data multiple times to
 249           produce additional copies of the backups will become
 250           physically impossible due to transfer speed
 251           issues. Generating multiple copies at server side will
 252           become the only practical option.
 253
 254   How:    I suspect that this will require adding a multiplexing
 255           SD that appears to be a SD to a specific FD, but 1-n FDs
 256           to the specific back end SDs managing the primary and copy
 257           pools.  Storage pools will also need to acquire parameters
 258           to define the pools to be used for copies.
 259
 260   Notes:  I would commit some of my developers' time if we can agree
 261           on the design and behavior.
 262
 263   Notes:  Additional notes from David:
 264           I think there's two areas where new configuration would be needed.
 265
 266           1) Identify a "SD mux" SD (specify it in the config just like a normal
 267           SD. The SD configuration would need something like a "Daemon Type =
 268           Normal/Mux" keyword to identify it as a multiplexor. (The director code
 269           would need modification to add the ability to do the multiple session
 270           setup, but the impact of the change would be new code that was invoked
 271           only when a SDmux is needed).
 272
 273           2) Additional keywords in the Pool definition to identify the need to
 274           create copies. Each pool would acquire a Copypool= attribute (may be
 275           repeated to generate more than one copy. 3 is about the practical limit,
 276           but no point in hardcoding that).
 277
 278           Example:
 279           Pool {
 280             Name = Primary
 281             Pool Type = Backup
 282             Copypool = Copy1
 283             Copypool = OffsiteCopy2
 284           }
 285
 286           where Copy1 and OffsiteCopy2 are valid pools.
 287
 288           In terms of function (shorthand):
 289           Backup job X is defined normally, specifying pool Primary as the pool to
 290           use. Job gets scheduled, and Bacula starts scheduling resources.
 291           Scheduler looks at pool definition for Primary, sees that there are a
 292           non-zero number of copypool keywords. The director then connects to an
 293           available SDmux, passes it the pool ids for Primary, Copy1, and
 294           OffsiteCopy2 and waits. SDmux then goes out and reserves devices and
 295           volumes in the normal SDs that serve Primary, Copy1 and OffsiteCopy2.
 296           When all are ready, the SDmux signals ready back to the director, and
 297           the FD is given the address of the SDmux as the SD to communicate with.
 298           Backup proceeds normally, with the SDmux duplicating blocks to each
 299           connected normal SD, and returning ready when all defined copies have
 300           been written. At EOJ, FD shuts down connection with SDmux, which closes
 301           down the normal SD connections and goes back to an idle state.
 302           SDmux does not update database; normal SDs do (noting that file is
 303           present on each volume it has been written to).
 304
 305           On restore, director looks for the volume containing the file in pool
 306           Primary first, then Copy1, then OffsiteCopy2. If the volume holding the
 307           file in pool Primary is missing or busy (being written in another job,
 308           etc), or one of the volumes from the copypool list that have the file in
 309           question is already mounted and ready for some reason, use it to do the
 310           restore, else mount one of the copypool volumes and proceed.
 311
 312
 313 Item  9:  Scheduling syntax that permits more flexibility and options
 314    Date:  15 December 2006
 315   Origin: Gregory Brauer (greg at wildbrain dot com) and
 316           Florian Schnabel <florian.schnabel at docufy dot de>
 317   Status:
 318
 319    What:  Currently, Bacula only understands how to deal with weeks of the
 320           month or weeks of the year in schedules.  This makes it impossible
 321           to do a true weekly rotation of tapes.  There will always be a
 322           discontinuity that will require disruptive manual intervention at
 323           least monthly or yearly because week boundaries never align with
 324           month or year boundaries.
 325
 326           A solution would be to add a new syntax that defines (at least)
 327           a start timestamp, and repetition period.
 328
 329           An easy option to skip a certain job  on a certain date.
 330
 331
 332      Why: Rotated backups done at weekly intervals are useful, and Bacula
 333           cannot currently do them without extensive hacking.
 334
 335           You could then easily skip tape backups on holidays.  Especially
 336           if you got no autochanger and can only fit one backup on a tape
 337           that would be really handy, other jobs could proceed normally
 338           and you won't get errors that way.
 339
 340
 341    Notes: Here is an example syntax showing a 3-week rotation where full
 342           Backups would be performed every week on Saturday, and an
 343           incremental would be performed every week on Tuesday.  Each
 344           set of tapes could be removed from the loader for the following
 345           two cycles before coming back and being reused on the third
 346           week.  Since the execution times are determined by intervals
 347           from a given point in time, there will never be any issues with
 348           having to adjust to any sort of arbitrary time boundary.  In
 349           the example provided, I even define the starting schedule
 350           as crossing both a year and a month boundary, but the run times
 351           would be based on the "Repeat" value and would therefore happen
 352           weekly as desired.
 353
 354
 355           Schedule {
 356               Name = "Week 1 Rotation"
 357               #Saturday.  Would run Dec 30, Jan 20, Feb 10, etc.
 358               Run {
 359                   Options {
 360                       Type   = Full
 361                       Start  = 2006-12-30 01:00
 362                       Repeat = 3w
 363                   }
 364               }
 365               #Tuesday.  Would run Jan 2, Jan 23, Feb 13, etc.
 366               Run {
 367                   Options {
 368                       Type   = Incremental
 369                       Start  = 2007-01-02 01:00
 370                       Repeat = 3w
 371                   }
 372               }
 373           }
 374
 375           Schedule {
 376               Name = "Week 2 Rotation"
 377               #Saturday.  Would run Jan 6, Jan 27, Feb 17, etc.
 378               Run {
 379                   Options {
 380                       Type   = Full
 381                       Start  = 2007-01-06 01:00
 382                       Repeat = 3w
 383                   }
 384               }
 385               #Tuesday.  Would run Jan 9, Jan 30, Feb 20, etc.
 386               Run {
 387                   Options {
 388                       Type   = Incremental
 389                       Start  = 2007-01-09 01:00
 390                       Repeat = 3w
 391                   }
 392               }
 393           }
 394
 395           Schedule {
 396               Name = "Week 3 Rotation"
 397               #Saturday.  Would run Jan 13, Feb 3, Feb 24, etc.
 398               Run {
 399                   Options {
 400                       Type   = Full
 401                       Start  = 2007-01-13 01:00
 402                       Repeat = 3w
 403                   }
 404               }
 405               #Tuesday.  Would run Jan 16, Feb 6, Feb 27, etc.
 406               Run {
 407                   Options {
 408                       Type   = Incremental
 409                       Start  = 2007-01-16 01:00
 410                       Repeat = 3w
 411                   }
 412               }
 413           }
 414
 415    Notes: Kern: I have merged the previously separate project of skipping
 416           jobs (via Schedule syntax) into this.
 417
 418
 419 Item 10:  Message mailing based on backup types
 420  Origin:  Evan Kaufman <evan.kaufman@gmail.com>
 421    Date:  January 6, 2006
 422  Status:
 423
 424    What:  In the "Messages" resource definitions, allowing messages
 425           to be mailed based on the type (backup, restore, etc.) and level
 426           (full, differential, etc) of job that created the originating
 427           message(s).
 428
 429  Why:     It would, for example, allow someone's boss to be emailed
 430           automatically only when a Full Backup job runs, so he can
 431           retrieve the tapes for offsite storage, even if the IT dept.
 432           doesn't (or can't) explicitly notify him.  At the same time, his
 433           mailbox wouldnt be filled by notifications of Verifies, Restores,
 434           or Incremental/Differential Backups (which would likely be kept
 435           onsite).
 436
 437  Notes:   One way this could be done is through additional message types, for example:
 438
 439    Messages {
 440      # email the boss only on full system backups
 441      Mail = boss@mycompany.com = full, !incremental, !differential, !restore,
 442             !verify, !admin
 443      # email us only when something breaks
 444      MailOnError = itdept@mycompany.com = all
 445    }
 446
 447    Notes: Kern: This should be rather trivial to implement.
 448
 449
 450 Item 11:  Cause daemons to use a specific IP address to source communications
 451  Origin:  Bill Moran <wmoran@collaborativefusion.com>
 452  Date:    18 Dec 2006
 453  Status:
 454  What:    Cause Bacula daemons (dir, fd, sd) to always use the ip address
 455           specified in the [DIR|DF|SD]Addr directive as the source IP
 456           for initiating communication.
 457  Why:     On complex networks, as well as extremely secure networks, it's
 458           not unusual to have multiple possible routes through the network.
 459           Often, each of these routes is secured by different policies
 460           (effectively, firewalls allow or deny different traffic depending
 461           on the source address)
 462           Unfortunately, it can sometimes be difficult or impossible to
 463           represent this in a system routing table, as the result is
 464           excessive subnetting that quickly exhausts available IP space.
 465           The best available workaround is to provide multiple IPs to
 466           a single machine that are all on the same subnet.  In order
 467           for this to work properly, applications must support the ability
 468           to bind outgoing connections to a specified address, otherwise
 469           the operating system will always choose the first IP that
 470           matches the required route.
 471  Notes:   Many other programs support this.  For example, the following
 472           can be configured in BIND:
 473           query-source address 10.0.0.1;
 474           transfer-source 10.0.0.2;
 475           Which means queries from this server will always come from
 476           10.0.0.1 and zone transfers will always originate from
 477           10.0.0.2.
 478
 479
 480 Item 12:  Add Plug-ins to the FileSet Include statements.
 481   Date:   28 October 2005
 482   Origin: Kern
 483   Status: Partially coded in 1.37 -- much more to do.
 484
 485   What:   Allow users to specify wild-card and/or regular
 486           expressions to be matched in both the Include and
 487           Exclude directives in a FileSet.  At the same time,
 488           allow users to define plug-ins to be called (based on
 489           regular expression/wild-card matching).
 490
 491   Why:    This would give the users the ultimate ability to control
 492           how files are backed up/restored.  A user could write a
 493           plug-in knows how to backup his Oracle database without
 494           stopping/starting it, for example.
 495
 496
 497 Item 13:  Restore only file attributes (permissions, ACL, owner, group...)
 498   Origin: Eric Bollengier
 499   Date:   30/12/2006
 500   Status:
 501
 502   What:   The goal of this project is to be able to restore only rights
 503           and attributes of files without crushing them.
 504
 505   Why:    Who have never had to repair a chmod -R 777, or a wild update
 506           of recursive right under Windows? At this time, you must have
 507           enough space to restore data, dump attributes (easy with acl,
 508           more complex with unix/windows rights) and apply them to your
 509           broken tree. With this options, it will be very easy to compare
 510           right or ACL over the time.
 511
 512   Notes:  If the file is here, we skip restore and we change rights.
 513           If the file isn't here, we can create an empty one and apply
 514           rights or do nothing.
 515
 516
 517
 518 Item 14:  Add an override in Schedule for Pools based on backup types
 519 Date:     19 Jan 2005
 520 Origin:   Chad Slater <chad.slater@clickfox.com>
 521 Status:
 522
 523   What:   Adding a FullStorage=BigTapeLibrary in the Schedule resource
 524           would help those of us who use different storage devices for different
 525           backup levels cope with the "auto-upgrade" of a backup.
 526
 527   Why:    Assume I add several new devices to be backed up, i.e. several
 528           hosts with 1TB RAID.  To avoid tape switching hassles, incrementals are
 529           stored in a disk set on a 2TB RAID.  If you add these devices in the
 530           middle of the month, the incrementals are upgraded to "full" backups,
 531           but they try to use the same storage device as requested in the
 532           incremental job, filling up the RAID holding the differentials.  If we
 533           could override the Storage parameter for full and/or differential
 534           backups, then the Full job would use the proper Storage device, which
 535           has more capacity (i.e. a 8TB tape library.
 536
 537
 538 Item 15:  Implement more Python events and functions
 539   Date:   28 October 2005
 540   Origin: Kern
 541   Status:
 542
 543   What:   Allow Python scripts to be called at more places
 544           within Bacula and provide additional access to Bacula
 545           internal variables.
 546
 547           Implement an interface for Python scripts to access the
 548           catalog through Bacula.
 549
 550   Why:    This will permit users to customize Bacula through
 551           Python scripts.
 552
 553   Notes:  Recycle event
 554           Scratch pool event
 555           NeedVolume event
 556           MediaFull event
 557
 558           Also add a way to get a listing of currently running
 559           jobs (possibly also scheduled jobs).
 560
 561
 562           to start the appropriate job.
 563
 564
 565 Item 16:  Allow inclusion/exclusion of files in a fileset by creation/mod times
 566   Origin: Evan Kaufman <evan.kaufman@gmail.com>
 567   Date:   January 11, 2006
 568   Status:
 569
 570   What:   In the vein of the Wild and Regex directives in a Fileset's
 571           Options, it would be helpful to allow a user to include or exclude
 572           files and directories by creation or modification times.
 573
 574           You could factor the Exclude=yes|no option in much the same way it
 575           affects the Wild and Regex directives.  For example, you could exclude
 576           all files modified before a certain date:
 577
 578    Options {
 579      Exclude = yes
 580      Modified Before = ####
 581    }
 582
 583            Or you could exclude all files created/modified since a certain date:
 584
 585    Options {
 586       Exclude = yes
 587      Created Modified Since = ####
 588    }
 589
 590            The format of the time/date could be done several ways, say the number
 591            of seconds since the epoch:
 592            1137008553 = Jan 11 2006, 1:42:33PM   # result of `date +%s`
 593
 594            Or a human readable date in a cryptic form:
 595            20060111134233 = Jan 11 2006, 1:42:33PM   # YYYYMMDDhhmmss
 596
 597   Why:    I imagine a feature like this could have many uses. It would
 598           allow a user to do a full backup while excluding the base operating
 599           system files, so if I installed a Linux snapshot from a CD yesterday,
 600           I'll *exclude* all files modified *before* today.  If I need to
 601           recover the system, I use the CD I already have, plus the tape backup.
 602           Or if, say, a Windows client is hit by a particularly corrosive
 603           virus, and I need to *exclude* any files created/modified *since* the
 604           time of infection.
 605
 606   Notes:  Of course, this feature would work in concert with other
 607           in/exclude rules, and wouldnt override them (or each other).
 608
 609   Notes:  The directives I'd imagine would be along the lines of
 610           "[Created] [Modified] [Before|Since] = <date>".
 611           So one could compare against 'ctime' and/or 'mtime', but ONLY 'before'
 612            or 'since'.
 613
 614
 615 Item 17:  Automatic promotion of backup levels based on backup size
 616    Date:  19 January 2006
 617   Origin: Adam Thornton <athornton@sinenomine.net>
 618   Status:
 619
 620     What: Amanda has a feature whereby it estimates the space that a
 621           differential, incremental, and full backup would take.  If the
 622           difference in space required between the scheduled level and the next
 623           level up is beneath some user-defined critical threshold, the backup
 624           level is bumped to the next type.  Doing this minimizes the number of
 625           volumes necessary during a restore, with a fairly minimal cost in
 626           backup media space.
 627
 628     Why:  I know at least one (quite sophisticated and smart) user
 629           for whom the absence of this feature is a deal-breaker in terms of
 630           using Bacula; if we had it it would eliminate the one cool thing
 631           Amanda can do and we can't (at least, the one cool thing I know of).
 632
 633
 634 Item 18:  Better control over Job execution
 635    Date:  18 August 2007
 636   Origin: Kern
 637   Status:
 638
 639     What: Bacula needs a few extra features for better Job execution:
 640           1. A way to prevent multiple Jobs of the same name from
 641              being scheduled at the same time (ususally happens when
 642              a job is missed because a client is down).
 643           2. Directives that permit easier upgrading of Job types
 644              based on a period of time. I.e. "do a Full at least
 645              once every 2 weeks", or "do a differential at least
 646              once a week". If a lower level job is scheduled when
 647              it begins to run it will be upgraded depending on
 648              the specified criteria.
 649
 650     Why:  Obvious.
 651
 652
 653 Item 19:  Automatic disabling of devices
 654    Date:  2005-11-11
 655   Origin: Peter Eriksson <peter at ifm.liu dot se>
 656   Status:
 657
 658    What:  After a configurable amount of fatal errors with a tape drive
 659           Bacula should automatically disable further use of a certain
 660           tape drive. There should also be "disable"/"enable" commands in
 661           the "bconsole" tool.
 662
 663    Why:   On a multi-drive jukebox there is a possibility of tape drives
 664           going bad during large backups (needing a cleaning tape run,
 665           tapes getting stuck). It would be advantageous if Bacula would
 666           automatically disable further use of a problematic tape drive
 667           after a configurable amount of errors has occurred.
 668
 669           An example: I have a multi-drive jukebox (6 drives, 380+ slots)
 670           where tapes occasionally get stuck inside the drive. Bacula will
 671           notice that the "mtx-changer" command will fail and then fail
 672           any backup jobs trying to use that drive. However, it will still
 673           keep on trying to run new jobs using that drive and fail -
 674           forever, and thus failing lots and lots of jobs... Since we have
 675           many drives Bacula could have just automatically disabled
 676           further use of that drive and used one of the other ones
 677           instead.
 678
 679 Item 20:  An option to operate on all pools with update vol parameters
 680   Origin: Dmitriy Pinchukov <absh@bossdev.kiev.ua>
 681    Date:  16 August 2006
 682   Status:
 683
 684    What:  When I do update -> Volume parameters -> All Volumes
 685           from Pool, then I have to select pools one by one.  I'd like
 686           console to have an option like "0: All Pools" in the list of
 687           defined pools.
 688
 689    Why:   I have many pools and therefore unhappy with manually
 690           updating each of them using update -> Volume parameters -> All
 691           Volumes from Pool -> pool #.
 692
 693
 694 Item 21:  Include timestamp of job launch in "stat clients" output
 695   Origin: Mark Bergman <mark.bergman@uphs.upenn.edu>
 696   Date:   Tue Aug 22 17:13:39 EDT 2006
 697   Status:
 698
 699   What:   The "stat clients" command doesn't include any detail on when
 700           the active backup jobs were launched.
 701
 702   Why:    Including the timestamp would make it much easier to decide whether
 703           a job is running properly.
 704
 705   Notes:  It may be helpful to have the output from "stat clients" formatted
 706           more like that from "stat dir" (and other commands), in a column
 707           format. The per-client information that's currently shown (level,
 708           client name, JobId, Volume, pool, device, Files, etc.) is good, but
 709           somewhat hard to parse (both programmatically and visually),
 710           particularly when there are many active clients.
 711
 712
 713
 714 Item 22:  Implement Storage daemon compression
 715   Date:   18 December 2006
 716   Origin: Vadim A. Umanski , e-mail umanski@ext.ru
 717   Status:
 718   What:   The ability to compress backup data on the SD receiving data
 719           instead of doing that on client sending data.
 720   Why:    The need is practical. I've got some machines that can send
 721           data to the network 4 or 5 times faster than compressing
 722           them (I've measured that). They're using fast enough SCSI/FC
 723           disk subsystems but rather slow CPUs (ex. UltraSPARC II).
 724           And the backup server has got a quite fast CPUs (ex. Dual P4
 725           Xeons) and quite a low load. When you have 20, 50 or 100 GB
 726           of raw data - running a job 4 to 5 times faster - that
 727           really matters. On the other hand, the data can be
 728           compressed 50% or better - so losing twice more space for
 729           disk backup is not good at all. And the network is all mine
 730           (I have a dedicated management/provisioning network) and I
 731           can get as high bandwidth as I need - 100Mbps, 1000Mbps...
 732           That's why the server-side compression feature is needed!
 733   Notes:
 734
 735 Item 23:  Improve Bacula's tape and drive usage and cleaning management
 736   Date:   8 November 2005, November 11, 2005
 737   Origin: Adam Thornton <athornton at sinenomine dot net>,
 738           Arno Lehmann <al at its-lehmann dot de>
 739   Status:
 740
 741   What:   Make Bacula manage tape life cycle information, tape reuse
 742           times and drive cleaning cycles.
 743
 744   Why:    All three parts of this project are important when operating
 745           backups.
 746           We need to know which tapes need replacement, and we need to
 747           make sure the drives are cleaned when necessary.  While many
 748           tape libraries and even autoloaders can handle all this
 749           automatically, support by Bacula can be helpful for smaller
 750           (older) libraries and single drives.  Limiting the number of
 751           times a tape is used might prevent tape errors when using
 752           tapes until the drives can't read it any more.  Also, checking
 753           drive status during operation can prevent some failures (as I
 754           [Arno] had to learn the hard way...)
 755
 756   Notes:  First, Bacula could (and even does, to some limited extent)
 757           record tape and drive usage.  For tapes, the number of mounts,
 758           the amount of data, and the time the tape has actually been
 759           running could be recorded.  Data fields for Read and Write
 760           time and Number of mounts already exist in the catalog (I'm
 761           not sure if VolBytes is the sum of all bytes ever written to
 762           that volume by Bacula).  This information can be important
 763           when determining which media to replace.  The ability to mark
 764           Volumes as "used up" after a given number of write cycles
 765           should also be implemented so that a tape is never actually
 766           worn out.  For the tape drives known to Bacula, similar
 767           information is interesting to determine the device status and
 768           expected life time: Time it's been Reading and Writing, number
 769           of tape Loads / Unloads / Errors.  This information is not yet
 770           recorded as far as I [Arno] know.  A new volume status would
 771           be necessary for the new state, like "Used up" or "Worn out".
 772           Volumes with this state could be used for restores, but not
 773           for writing. These volumes should be migrated first (assuming
 774           migration is implemented) and, once they are no longer needed,
 775           could be moved to a Trash pool.
 776
 777           The next step would be to implement a drive cleaning setup.
 778           Bacula already has knowledge about cleaning tapes.  Once it
 779           has some information about cleaning cycles (measured in drive
 780           run time, number of tapes used, or calender days, for example)
 781           it can automatically execute tape cleaning (with an
 782           autochanger, obviously) or ask for operator assistance loading
 783           a cleaning tape.
 784
 785           The final step would be to implement TAPEALERT checks not only
 786           when changing tapes and only sending the information to the
 787           administrator, but rather checking after each tape error,
 788           checking on a regular basis (for example after each tape
 789           file), and also before unloading and after loading a new tape.
 790           Then, depending on the drives TAPEALERT state and the known
 791           drive cleaning state Bacula could automatically schedule later
 792           cleaning, clean immediately, or inform the operator.
 793
 794           Implementing this would perhaps require another catalog change
 795           and perhaps major changes in SD code and the DIR-SD protocol,
 796           so I'd only consider this worth implementing if it would
 797           actually be used or even needed by many people.
 798
 799           Implementation of these projects could happen in three distinct
 800           sub-projects: Measuring Tape and Drive usage, retiring
 801           volumes, and handling drive cleaning and TAPEALERTs.
 802
 803 Item 24:  Multiple threads in file daemon for the same job
 804   Date:   27 November 2005
 805   Origin: Ove Risberg (Ove.Risberg at octocode dot com)
 806   Status:
 807
 808   What:   I want the file daemon to start multiple threads for a backup
 809           job so the fastest possible backup can be made.
 810
 811           The file daemon could parse the FileSet information and start
 812           one thread for each File entry located on a separate
 813           filesystem.
 814
 815           A confiuration option in the job section should be used to
 816           enable or disable this feature. The confgutration option could
 817           specify the maximum number of threads in the file daemon.
 818
 819           If the theads could spool the data to separate spool files
 820           the restore process will not be much slower.
 821
 822   Why:    Multiple concurrent backups of a large fileserver with many
 823           disks and controllers will be much faster.
 824
 825 Item 25:  Archival (removal) of User Files to Tape
 826   Date:   Nov. 24/2005
 827   Origin: Ray Pengelly [ray at biomed dot queensu dot ca
 828   Status:
 829
 830   What:   The ability to archive data to storage based on certain parameters
 831           such as age, size, or location.  Once the data has been written to
 832           storage and logged it is then pruned from the originating
 833           filesystem. Note! We are talking about user's files and not
 834           Bacula Volumes.
 835
 836   Why:    This would allow fully automatic storage management which becomes
 837           useful for large datastores.  It would also allow for auto-staging
 838           from one media type to another.
 839
 840           Example 1) Medical imaging needs to store large amounts of data.
 841           They decide to keep data on their servers for 6 months and then put
 842           it away for long term storage.  The server then finds all files
 843           older than 6 months writes them to tape.  The files are then removed
 844           from the server.
 845
 846           Example 2) All data that hasn't been accessed in 2 months could be
 847           moved from high-cost, fibre-channel disk storage to a low-cost
 848           large-capacity SATA disk storage pool which doesn't have as quick of
 849           access time.  Then after another 6 months (or possibly as one
 850           storage pool gets full) data is migrated to Tape.
 851
 852
 853
 854
 855 ========== Items on put hold by Kern ============================
 856
 857 Item h1:  Split documentation
 858   Origin: Maxx <maxxatworkat gmail dot com>
 859   Date:   27th July 2006
 860   Status: Approved, awaiting implementation
 861
 862   What:   Split documentation in several books
 863
 864   Why:    Bacula manual has now more than 600 pages, and looking for
 865           implementation details is getting complicated.  I think
 866           it would be good to split the single volume in two or
 867           maybe three parts:
 868
 869           1) Introduction, requirements and tutorial, typically
 870              are useful only until first installation time
 871
 872           2) Basic installation and configuration, with all the
 873              gory details about the directives supported 3)
 874              Advanced Bacula: testing, troubleshooting, GUI and
 875              ancillary programs, security managements, scripting,
 876              etc.
 877
 878   Notes:  This is a project that needs to be done, and will be implemented,
 879           but it is really a developer issue of timing, and does not
 880           needed to be included in the voting.
 881
 882
 883 Item h2:  Implement support for stacking arbitrary stream filters, sinks.
 884 Date:     23 November 2006
 885 Origin:   Landon Fuller <landonf@threerings.net>
 886 Status:   Planning. Assigned to landonf.
 887
 888   What:   Implement support for the following:
 889           - Stacking arbitrary stream filters (eg, encryption, compression,
 890             sparse data handling))
 891           - Attaching file sinks to terminate stream filters (ie, write out
 892             the resultant data to a file)
 893           - Refactor the restoration state machine accordingly
 894
 895    Why:   The existing stream implementation suffers from the following:
 896            - All state (compression, encryption, stream restoration), is
 897              global across the entire restore process, for all streams. There are
 898              multiple entry and exit points in the restoration state machine, and
 899              thus multiple places where state must be allocated, deallocated,
 900              initialized, or reinitialized. This results in exceptional complexity
 901              for the author of a stream filter.
 902            - The developer must enumerate all possible combinations of filters
 903              and stream types (ie, win32 data with encryption, without encryption,
 904              with encryption AND compression, etc).
 905
 906   Notes:  This feature request only covers implementing the stream filters/
 907           sinks, and refactoring the file daemon's restoration implementation
 908           accordingly. If I have extra time, I will also rewrite the backup
 909           implementation. My intent in implementing the restoration first is to
 910           solve pressing bugs in the restoration handling, and to ensure that
 911           the new restore implementation handles existing backups correctly.
 912
 913           I do not plan on changing the network or tape data structures to
 914           support defining arbitrary stream filters, but supporting that
 915           functionality is the ultimate goal.
 916
 917           Assistance with either code or testing would be fantastic.
 918
 919   Notes:  Kern: this project has a lot of merit, and we need to do it, but
 920           it is really an issue for developers rather than a new feature
 921           for users, so I have removed it from the voting list, but kept it
 922           here, but at some point, it will be implemented.
 923
 924 Item h3:  Filesystem watch triggered backup.
 925   Date:   31 August 2006
 926   Origin: Jesper Krogh <jesper@krogh.cc>
 927   Status:
 928
 929   What:   With inotify and similar filesystem triggeret notification
 930           systems is it possible to have the file-daemon to monitor
 931           filesystem changes and initiate backup.
 932
 933   Why:    There are 2 situations where this is nice to have.
 934           1) It is possible to get a much finer-grained backup than
 935              the fixed schedules used now.. A file created and deleted
 936              a few hours later, can automatically be caught.
 937
 938           2) The introduced load on the system will probably be
 939              distributed more even on the system.
 940
 941   Notes:  This can be combined with configration that specifies
 942           something like: "at most every 15 minutes or when changes
 943           consumed XX MB".
 944
 945 Kern Notes: I would rather see this implemented by an external program
 946           that monitors the Filesystem changes, then uses the console
 947
 948
 949 Item h4:  Directive/mode to backup only file changes, not entire file
 950   Date:   11 November 2005
 951   Origin: Joshua Kugler <joshua dot kugler at uaf dot edu>
 952           Marek Bajon <mbajon at bimsplus dot com dot pl>
 953   Status:
 954
 955   What:   Currently when a file changes, the entire file will be backed up in
 956           the next incremental or full backup.  To save space on the tapes
 957           it would be nice to have a mode whereby only the changes to the
 958           file would be backed up when it is changed.
 959
 960   Why:    This would save lots of space when backing up large files such as
 961           logs, mbox files, Outlook PST files and the like.
 962
 963   Notes:  This would require the usage of disk-based volumes as comparing
 964           files would not be feasible using a tape drive.
 965
 966   Notes:  Kern: I don't know how to implement this. Put on hold until someone
 967           provides a detailed implementation plan.
 968
 969
 970 Item h5:  Implement multiple numeric backup levels as supported by dump
 971 Date:     3 April 2006
 972 Origin:   Daniel Rich <drich@employees.org>
 973 Status:
 974 What:     Dump allows specification of backup levels numerically instead of just
 975           "full", "incr", and "diff".  In this system, at any given level, all
 976           files are backed up that were were modified since the last backup of a
 977           higher level (with 0 being the highest and 9 being the lowest).  A
 978           level 0 is therefore equivalent to a full, level 9 an incremental, and
 979           the levels 1 through 8 are varying levels of differentials.  For
 980           bacula's sake, these could be represented as "full", "incr", and
 981           "diff1", "diff2", etc.
 982
 983 Why:      Support of multiple backup levels would provide for more advanced backup
 984           rotation schemes such as "Towers of Hanoi".  This would allow better
 985           flexibility in performing backups, and can lead to shorter recover
 986           times.
 987
 988 Notes:    Legato Networker supports a similar system with full, incr, and 1-9 as
 989           levels.
 990
 991 Notes:    Kern: I don't see the utility of this, and it would be a *huge*
 992           modification to existing code.
 993
 994 Item h6:  Implement NDMP protocol support
 995   Origin: Alan Davis
 996   Date:   06 March 2007
 997   Status:
 998
 999   What:   Network Data Management Protocol is implemented by a number of
1000           NAS filer vendors to enable backups using third-party
1001           software.
1002
1003   Why:    This would allow NAS filer backups in Bacula without incurring
1004           the overhead of NFS or SBM/CIFS.
1005
1006   Notes:  Further information is available:
1007           http://www.ndmp.org
1008           http://www.ndmp.org/wp/wp.shtml
1009           http://www.traakan.com/ndmjob/index.html
1010
1011           There are currently no viable open-source NDMP
1012           implementations.  There is a reference SDK and example
1013           app available from ndmp.org but it has problems
1014           compiling on recent Linux and Solaris OS'.  The ndmjob
1015           reference implementation from Traakan is known to
1016           compile on Solaris 10.
1017
1018   Notes:  Kern: I am not at all in favor of this until NDMP becomes
1019           an Open Standard or until there are Open Source libraries
1020           that interface to it.
1021
1022 Item h7:  Commercial database support
1023   Origin: Russell Howe <russell_howe dot wreckage dot org>
1024   Date:   26 July 2006
1025   Status:
1026
1027   What:   It would be nice for the database backend to support more
1028           databases. I'm thinking of SQL Server at the moment, but I guess Oracle,
1029           DB2, MaxDB, etc are all candidates. SQL Server would presumably be
1030           implemented using FreeTDS or maybe an ODBC library?
1031
1032   Why:    We only really have one database server, which is MS SQL Server
1033           2000. Maintaining a second one for the backup software (we grew out of
1034           SQLite, which I liked, but which didn't work so well with our database
1035           size). We don't really have a machine with the resources to run
1036           postgres, and would rather only maintain a single DBMS. We're stuck with
1037           SQL Server because pretty much all the company's custom applications
1038           (written by consultants) are locked into SQL Server 2000. I can imagine
1039           this scenario is fairly common, and it would be nice to use the existing
1040           properly specced database server for storing Bacula's catalog, rather
1041           than having to run a second DBMS.
1042
1043   Notes:  This might be nice, but someone other than me will probably need to
1044           implement it, and at the moment, proprietary code cannot legally be
1045           mixed with Bacula GPLed code.  This would be possible only providing
1046           the vendors provide GPLed (or OpenSource) interface code.
1047
1048 Item h8:  Incorporation of XACML2/SAML2 parsing
1049    Date:   19 January 2006
1050    Origin: Adam Thornton <athornton@sinenomine.net>
1051    Status: Blue sky
1052
1053    What:   XACML is "eXtensible Access Control Markup Language" and
1054           "SAML is the "Security Assertion Markup Language"--an XML standard
1055           for making statements about identity and authorization.  Having these
1056           would give us a framework to approach ACLs in a generic manner, and
1057           in a way flexible enough to support the four major sorts of ACLs I
1058           see as a concern to Bacula at this point, as well as (probably) to
1059           deal with new sorts of ACLs that may appear in the future.
1060
1061    Why:    Bacula is beginning to need to back up systems with ACLs
1062           that do not map cleanly onto traditional Unix permissions.  I see
1063           four sets of ACLs--in general, mutually incompatible with one
1064           another--that we're going to need to deal with.  These are: NTFS
1065           ACLs, POSIX ACLs, NFSv4 ACLS, and AFS ACLS.  (Some may question the
1066           relevance of AFS; AFS is one of Sine Nomine's core consulting
1067           businesses, and having a reputable file-level backup and restore
1068           technology for it (as Tivoli is probably going to drop AFS support
1069           soon since IBM no longer supports AFS) would be of huge benefit to
1070           our customers; we'd most likely create the AFS support at Sine Nomine
1071           for inclusion into the Bacula (and perhaps some changes to the
1072           OpenAFS volserver) core code.)
1073
1074           Now, obviously, Bacula already handles NTFS just fine.  However, I
1075           think there's a lot of value in implementing a generic ACL model, so
1076           that it's easy to support whatever particular instances of ACLs come
1077           down the pike: POSIX ACLS (think SELinux) and NFSv4 are the obvious
1078           things arriving in the Linux world in a big way in the near future.
1079           XACML, although overcomplicated for our needs, provides this
1080           framework, and we should be able to leverage other people's
1081           implementations to minimize the amount of work *we* have to do to get
1082           a generic ACL framework.  Basically, the costs of implementation are
1083           high, but they're largely both external to Bacula and already sunk.
1084
1085    Notes: As you indicate this is a bit of "blue sky" or in other words,
1086           at the moment, it is a bit esoteric to consider for Bacula.
1087
1088 Item h9:  Archive data
1089   Date:   15/5/2006
1090   Origin: calvin streeting calvin at absentdream dot com
1091   Status:
1092
1093   What:   The abilty to archive to media (dvd/cd) in a uncompressed format
1094           for dead filing (archiving not backing up)
1095
1096     Why:  At work when jobs are finished and moved off of the main file
1097           servers (raid based systems) onto a simple Linux file server (ide based
1098           system) so users can find old information without contacting the IT
1099           dept.
1100
1101           So this data dosn't realy change it only gets added to,
1102           But it also needs backing up.  At the moment it takes
1103           about 8 hours to back up our servers (working data) so
1104           rather than add more time to existing backups i am trying
1105           to implement a system where we backup the acrhive data to
1106           cd/dvd these disks would only need to be appended to
1107           (burn only new/changed files to new disks for off site
1108           storage).  basialy understand the differnce between
1109           achive data and live data.
1110
1111   Notes:  Scan the data and email me when it needs burning divide
1112           into predefined chunks keep a recored of what is on what
1113           disk make me a label (simple php->mysql=>pdf stuff) i
1114           could do this bit ability to save data uncompresed so
1115           it can be read in any other system (future proof data)
1116           save the catalog with the disk as some kind of menu
1117           system
1118
1119    Notes: Kern: I don't understand this item, and in any case, if it
1120           is specific to DVD/CDs, which we do not recommend using,
1121           it is unlikely to be implemented except as a user
1122           submitted patch.
1123
1124
1125 Item h10: Clustered file-daemons
1126   Origin: Alan Brown ajb2 at mssl dot ucl dot ac dot uk
1127   Date:   24 July 2006
1128   Status:
1129   What:   A "virtual" filedaemon, which is actually a cluster of real ones.
1130
1131   Why:    In the case of clustered filesystems (SAN setups, GFS, or OCFS2, etc)
1132           multiple machines may have access to the same set of filesystems
1133
1134           For performance reasons, one may wish to initate backups from
1135           several of these machines simultaneously, instead of just using
1136           one backup source for the common clustered filesystem.
1137
1138           For obvious reasons, normally backups of $A-FD/$PATH and
1139           B-FD/$PATH are treated as different backup sets. In this case
1140           they are the same communal set.
1141
1142           Likewise when restoring, it would be easier to just specify
1143           one of the cluster machines and let bacula decide which to use.
1144
1145           This can be faked to some extent using DNS round robin entries
1146           and a virtual IP address, however it means "status client" will
1147           always give bogus answers. Additionally there is no way of
1148           spreading the load evenly among the servers.
1149
1150           What is required is something similar to the storage daemon
1151           autochanger directives, so that Bacula can keep track of
1152           operating backups/restores and direct new jobs to a "free"
1153           client.
1154
1155    Notes: Kern: I don't understand the request enough to be able to
1156           implement it. A lot more design detail should be presented
1157           before voting on this project.
1158
1159
1160 ========== Already implemented ================================
1161
1162 Item  n:  make changing "spooldata=yes|no" possible for
1163           manual/interactive jobs
1164   Origin: Marc Schiffbauer <marc@schiffbauer.net>
1165   Date:   12 April 2007)
1166   Status: Already implemented by Eric
1167
1168   What:   Make it possible to modify the spooldata option
1169           for a job when being run from within the console.
1170           Currently it is possible to modify the backup level
1171           and the spooldata setting in a Schedule resource.
1172           It is also possible to modify the backup level when using
1173           the "run" command in the console.
1174           But it is currently not possible to to the same
1175           with "spooldata=yes|no" like:
1176
1177           run job=MyJob level=incremental spooldata=yes
1178
1179   Why:    In some situations it would be handy to be able to switch
1180           spooldata on or off for interactive/manual jobs based on
1181           which data the admin expects or how fast the LAN/WAN
1182           connection currently is.
1183
1184   Notes:  ./.
1185
1186 ============= Empty Feature Request form ===========
1187 Item  n:  One line summary ...
1188   Date:   Date submitted
1189   Origin: Name and email of originator.
1190   Status:
1191
1192   What:   More detailed explanation ...
1193
1194   Why:    Why it is important ...
1195
1196   Notes:  Additional notes or features (omit if not used)
1197 ============== End Feature Request form ==============