Projects:
                     Bacula Projects Roadmap 
                       07 December 2005
                    (prioritized by user vote)

Summary:
Item  1:  Implement data encryption (as opposed to comm encryption)
Item  2:  Implement Migration that moves Jobs from one Pool to another.
Item  3:  Accurate restoration of renamed/deleted files from
          Incremental/Differential backups
Item  4:  Implement a Bacula GUI/management tool using Python.
Item  5:  Implement Base jobs.
Item  6:  Allow FD to initiate a backup
Item  7:  Improve Bacula's tape and drive usage and cleaning management.
Item  8:  Implement creation and maintenance of copy pools
Item  9:  Implement new {Client}Run{Before|After}Job feature.
Item 10:  Merge multiple backups (Synthetic Backup or Consolidation).
Item 11:  Deletion of Disk-Based Bacula Volumes
Item 12:  Directive/mode to backup only file changes, not entire file
Item 13:  Multiple threads in file daemon for the same job
Item 14:  Implement red/black binary tree routines.
Item 15:  Add support for FileSets in user directories (CACHEDIR.TAG)
Item 16:  Implement extraction of Win32 BackupWrite data.
Item 17:  Implement a Python interface to the Bacula catalog.
Item 18:  Archival (removal) of User Files to Tape
Item 19:  Add Plug-ins to the FileSet Include statements.
Item 20:  Implement more Python events in Bacula.
Item 21:  Quick release of FD-SD connection after backup.
Item 22:  Permit multiple Media Types in an Autochanger
Item 23:  Allow different autochanger definitions for one autochanger.
Item 24:  Automatic disabling of devices
Item 25:  Implement huge exclude list support using hashing.


Below, you will find more information on future projects:

Item  1:  Implement data encryption (as opposed to comm encryption)
  Date:   28 October 2005
  Origin: Sponsored by Landon and 13 contributors to EFF.
  Status: Landon Fuller is currently implementing this.
                  
  What:   Currently the data that is stored on the Volume is not
          encrypted.  For confidentiality, encrypting the data at
          the File daemon level is essential.  With this feature,
          the File daemon encrypts the data during a backup and
          decrypts it during a restore.

  Why:    Large sites require this.

Item  2:  Implement Migration that moves Jobs from one Pool to another.
  Origin: Sponsored by Riege Software International GmbH. Contact:
          Daniel Holtkamp 
  Date:   28 October 2005
  Status: Partially coded in 1.37 -- much more to do. Assigned to
          Kern.

  What:   The ability to copy, move, or archive data that is on a
          device to another device is very important. 

  Why:    An ISP might want to back up to disk, but after 30 days
          migrate the data to tape and delete it from
          disk.  Bacula should be able to handle this
          automatically.  It needs to know what was put where,
          and when, and what to migrate -- it is a bit like
          retention periods.  Doing so would allow space to be
          freed up for current backups while maintaining older
          data on tape.

  Notes:   Riege Software has asked for the following migration
           triggers:
           Age of Job
           Highwater mark (stopped by Lowwater mark?)
                            
  Notes:  Migration could be additionally triggered by:
           Number of Jobs
           Number of Volumes
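
  Notes:  The triggers above might be expressed as Pool directives
          along the following lines.  The directive names and
          syntax here are purely illustrative -- none of this is
          implemented yet:

          Pool {
            Name = DiskPool
            Next Pool = TapePool            # destination for migrated Jobs
            # Hypothetical migration directives:
            Migration Time = 30 days        # Age of Job trigger
            Migration High Bytes = 200 GB   # start migrating (highwater mark)
            Migration Low Bytes = 100 GB    # stop migrating (lowwater mark)
          }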

Item  3:  Accurate restoration of renamed/deleted files from
          Incremental/Differential backups
  Date:   28 November 2005
  Origin: Martin Simmons (martin at lispworks dot com)
  Status:

  What:   When restoring a fileset for a specified date (including "most
          recent"), Bacula should give you exactly the files and directories
          that existed at the time of the last backup prior to that date.

          Currently this only works if the last backup was a Full backup.
          When the last backup was Incremental/Differential, files and
          directories that have been renamed or deleted since the last Full
          backup are not currently restored correctly.  The same
          applies to files with more or fewer hard links than at
          the time of the last Full backup.

  Why:    Incremental/Differential would be much more useful if this worked.

  Notes:  Item 10 (Merging of multiple backups into a single one) seems to
          rely on this working, otherwise the merged backups will not be
          truly equivalent to a Full backup.  

          Kern: notes shortened. This can be done without the need for 
          inodes. It is essentially the same as the current Verify job,
          but one additional database record must be written, which does 
          not need any database change.

Item  4:  Implement a Bacula GUI/management tool using Python.
  Origin: Kern
  Date:   28 October 2005
  Status: 

  What:   Implement a Bacula console, and management tools
          using Python and Qt or GTK.

  Why:    Don't we already have a wxWidgets GUI?  Yes, but
          it is written in C++ and changes to the user interface
          must be hand-tailored in C++ code.  By developing
          the user interface with Qt Designer, the interface
          can be updated very easily and most of the new Python
          code will be generated automatically.  User interface
          changes become very simple, and only the new features
          must be implemented.  In addition, the code will be in
          Python, which will give many more users easy (or easier)
          access to making additions or modifications.

 Notes:   This is currently being implemented using Python-GTK by
          Lucas Di Pentima.

Item  5:  Implement Base jobs.
  Date:   28 October 2005
  Origin: Kern
  Status: 
  
  What:   A base job is sort of like a Full save except that you 
          will want the FileSet to contain only files that are
          unlikely to change in the future (i.e.  a snapshot of
          most of your system after installing it).  After the
          base job has been run, when you are doing a Full save,
          you specify one or more Base jobs to be used.  All
          files that have been backed up in the Base job/jobs but
          not modified will then be excluded from the backup.
          During a restore, the Base jobs will be automatically
          pulled in where necessary.
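
          A Job definition using Base jobs might look something
          like this (the Base directive shown is hypothetical and
          not yet implemented):

          Job {
            Name = "FullWithBase"
            Type = Backup
            Level = Full
            Client = client1-fd
            FileSet = "Full Set"
            # Hypothetical: unchanged files already saved by the
            # listed Base job(s) would be excluded from this backup
            Base = "BaseJob-OS"
          }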

  Why:    This is something none of the competition does, as far as
          we know (except perhaps BackupPC, which is a Perl program that
          saves to disk only).  It is a big win for the user: it
          makes Bacula stand out as offering a unique
          optimization that immediately saves time and money.
          Basically, imagine that you have 100 nearly identical
          Windows or Linux machines containing the OS and user
          files.  Now for the OS part, a Base job will be backed
          up once, and rather than making 100 copies of the OS,
          there will be only one.  If one or more of the systems
          have some files updated, no problem, they will be
          automatically restored.

  Notes:  Huge savings in tape usage even for a single machine.
          Will require more resources because the DIR must send
          FD a list of files/attribs, and the FD must search the
          list and compare it for each file to be saved.

Item  6:  Allow FD to initiate a backup
  Origin: Frank Volf (frank at deze dot org)
  Date:   17 November 2005
  Status:

   What:  Provide some means, possibly a restricted console, that
          allows an FD to initiate a backup and that uses the
          connection established by the FD to the Director, so
          that a firewalled Director can still perform the backup.

   Why:   Makes backup of laptops much easier.

Item  7:  Improve Bacula's tape and drive usage and cleaning management.
  Date:   8 November 2005 / 11 November 2005
  Origin: Adam Thornton ,
          Arno Lehmann 
  Status:

  What:   Make Bacula manage tape life cycle information, tape reuse
          times and drive cleaning cycles.

  Why:    All three parts of this project are important when operating
          backups.
          We need to know which tapes need replacement, and we need to
          make sure the drives are cleaned when necessary.  While many
          tape libraries and even autoloaders can handle all this
          automatically, support by Bacula can be helpful for smaller
          (older) libraries and single drives.  Limiting the number of
          times a tape is used might prevent the tape errors that
          occur when tapes are used until the drives can no longer
          read them.  Also, checking drive status during operation
          can prevent some failures (as I [Arno] had to learn the
          hard way...).

  Notes:  First, Bacula could (and even does, to some limited extent)
          record tape and drive usage.  For tapes, the number of mounts,
          the amount of data, and the time the tape has actually been
          running could be recorded.  Data fields for Read and Write
          time and Number of mounts already exist in the catalog (I'm
          not sure if VolBytes is the sum of all bytes ever written to
          that volume by Bacula).  This information can be important
          when determining which media to replace.  The ability to mark
          Volumes as "used up" after a given number of write cycles
          should also be implemented so that a tape is never actually
          worn out.  For the tape drives known to Bacula, similar
          information is interesting to determine the device status and
          expected life time: Time it's been Reading and Writing, number
          of tape Loads / Unloads / Errors.  This information is not yet
          recorded as far as I [Arno] know.  A new volume status would
          be necessary for the new state, like "Used up" or "Worn out".
          Volumes with this state could be used for restores, but not
          for writing. These volumes should be migrated first (assuming
          migration is implemented) and, once they are no longer needed,
          could be moved to a Trash pool.

          The next step would be to implement a drive cleaning setup.
          Bacula already has knowledge about cleaning tapes.  Once it
          has some information about cleaning cycles (measured in drive
          run time, number of tapes used, or calendar days, for example)
          it can automatically execute tape cleaning (with an
          autochanger, obviously) or ask for operator assistance loading
          a cleaning tape.

          The final step would be to extend the TAPEALERT checks:
          instead of checking only when changing tapes and merely
          sending the information to the administrator, Bacula
          would check after each tape error, on a regular basis
          (for example after each tape file), and also before
          unloading and after loading a new tape.  Then, depending
          on the drive's TAPEALERT state and the known drive
          cleaning state, Bacula could automatically schedule later
          cleaning, clean immediately, or inform the operator.

          Implementing this would perhaps require another catalog change
          and perhaps major changes in SD code and the DIR-SD protocol,
          so I'd only consider this worth implementing if it would
          actually be used or even needed by many people.

          Implementation of these projects could happen in three distinct
          sub-projects: Measuring Tape and Drive usage, retiring
          volumes, and handling drive cleaning and TAPEALERTs.
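
          As a rough illustration, the usage limits and cleaning
          cycles discussed above might be configured with
          directives like these (all names are hypothetical):

          Pool {
            Name = TapePool
            # Hypothetical: mark a Volume "Used up" after this
            # many write cycles
            Maximum Volume Write Cycles = 200
          }
          Device {
            Name = Drive-1
            # Hypothetical: schedule cleaning after this much
            # drive run time
            Cleaning Interval = 100 hours
          }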

Item  8:  Implement creation and maintenance of copy pools
  Date:   27 November 2005
  Origin: David Boyes (dboyes at sinenomine dot net)
  Status:

  What:   I would like Bacula to have the capability to write copies
          of backed-up data on multiple physical volumes selected
          from different pools without transferring the data
          multiple times, and to accept any of the copy volumes
          as valid for restore.

  Why:    In many cases, businesses are required to keep offsite
          copies of backup volumes, or just wish for simple
          protection against a human operator dropping a storage
          volume and damaging it. The ability to generate multiple
          volumes in the course of a single backup job allows
          customers to simply check out one copy and send it
          offsite, marking it as out of changer or otherwise
          unavailable. Currently, the library and magazine
          management capability in Bacula does not make this process
          simple.

          Restores would use the copy of the data on the first
          available volume, in order of copy pool chain definition.

          This is also a major scalability issue -- as the number of
          clients increases beyond several thousand, and the volume
          of data increases, transferring the data multiple times to
          produce additional copies of the backups will become
          physically impossible due to transfer speed issues.
          Generating multiple copies on the server side will
          become the only practical option.

  How:    I suspect that this will require adding a multiplexing
          SD that appears to be an SD to a specific FD, but as 1-n
          FDs to the specific back-end SDs managing the primary and
          copy pools.  Storage pools will also need to acquire
          parameters to define the pools to be used for copies.
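
          A Pool definition for such a scheme might look roughly
          like this (the Copy Pool directive is hypothetical):

          Pool {
            Name = Primary
            Pool Type = Backup
            # Hypothetical: also write each Job to these Pools,
            # without transferring the data from the FD again
            Copy Pool = OffsiteCopy, LocalCopy
          }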

  Notes:  I would commit some of my developers' time if we can agree
          on the design and behavior. 

Item  9:  Implement new {Client}Run{Before|After}Job feature.
  Date:   26 September 2005
  Origin: Phil Stracchino 
  Status: 

  What:   Some time ago, there was a discussion of RunAfterJob and
          ClientRunAfterJob, and the fact that they do not run after failed
          jobs.  At the time, there was a suggestion to add a
          RunAfterFailedJob directive (and, presumably, a matching
          ClientRunAfterFailedJob directive), but to my knowledge these
          were never implemented.

          An alternate way of approaching the problem has just occurred to
          me.  Suppose the RunBeforeJob and RunAfterJob directives were
          expanded in a manner something like this example:

          RunBeforeJob {
              Command = "/opt/bacula/etc/checkhost %c"
              RunsOnClient = No
              RunsAtJobLevels = All       # All, Full, Diff, Inc
              AbortJobOnError = Yes
          }
          RunBeforeJob {
              Command = c:/bacula/systemstate.bat
              RunsOnClient = yes
              RunsAtJobLevels = All       # All, Full, Diff, Inc
              AbortJobOnError = No
          }

          RunAfterJob {
              Command = c:/bacula/deletestatefile.bat
              RunsOnClient = Yes
              RunsAtJobLevels = All       # All, Full, Diff, Inc
              RunsOnSuccess = Yes
              RunsOnFailure = Yes
          }
          RunAfterJob {
              Command = c:/bacula/somethingelse.bat
              RunsOnClient = Yes
              RunsAtJobLevels = All
              RunsOnSuccess = No
              RunsOnFailure = Yes
          }
          RunAfterJob {
              Command = "/opt/bacula/etc/checkhost -v %c"
              RunsOnClient = No
              RunsAtJobLevels = All
              RunsOnSuccess = No
              RunsOnFailure = Yes
          }


  Why:    It would be a significant change to the structure of the
          directives, but it would allow much more flexibility,
          including RunAfter commands that run regardless of
          whether the job succeeds, and RunBefore tasks that still
          allow the job to run even if that specific RunBefore
          fails.

  Notes:  By Kern: I would prefer to have a single new Resource called
          RunScript. More notes from Phil:

            RunBeforeJob = yes|no
            RunAfterJob = yes|no
            RunsAtJobLevels = All|Full|Diff|Inc

          The AbortJobOnError, RunsOnSuccess and RunsOnFailure directives
          could be optional, and possibly RunsWhen as well.

          AbortJobOnError would be ignored unless RunsWhen was set to Before
          (or RunBeforeJob set to Yes), and would default to Yes if
          omitted.  If AbortJobOnError was set to No, failure of the script
          would still generate a warning.

          RunsOnSuccess would be ignored unless RunsWhen was set to After
          (or RunBeforeJob set to No), and default to Yes.

          RunsOnFailure would be ignored unless RunsWhen was set to After,
          and default to No.

          Allow the before/after status to be passed on the script
          command line so that the same script can be used both
          before and after the job.  (David Boyes)
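
          Combining these suggestions, a single RunScript resource
          might look like this (the syntax is speculative):

          RunScript {
            Command = "/opt/bacula/etc/checkhost %c"
            RunsOnClient = No
            RunsWhen = Before           # Before|After
            RunsAtJobLevels = All       # All, Full, Diff, Inc
            AbortJobOnError = Yes       # meaningful when RunsWhen = Before
            RunsOnSuccess = Yes         # meaningful when RunsWhen = After
            RunsOnFailure = No          # meaningful when RunsWhen = After
          }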

Item 10:  Merge multiple backups (Synthetic Backup or Consolidation).
  Origin: Marc Cousin and Eric Bollengier 
  Date:   15 November 2005
  Status: Depends on first implementing Item 2 (Migration).

  What:   A merged backup is a backup made without connecting to the Client.
          It would be a Merge of existing backups into a single backup.
          In effect, it is like a restore but to the backup medium.

          For instance, say that last Sunday we made a full backup.  Then
          all week long, we created incremental backups, in order to do
          them fast.  Now comes Sunday again, and we need another full.
          The merged backup makes it possible to do instead an incremental
          backup (during the night for instance), and then create a merged
          backup during the day, by using the full and incrementals from
          the week.  The merged backup will be exactly like a full made
          Sunday night on the tape, but the production interruption on the
          Client will be minimal, as the Client will only have to send
          incrementals.

          In fact, if it's done correctly, you could merge all the
          Incrementals into a single Incremental, or all the Incrementals
          and the last Differential into a new Differential, or the Full,
          last differential and all the Incrementals into a new Full
          backup.  And there is no need to involve the Client.
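
          For example, a consolidation might eventually be declared
          as a Job along these lines (Type = Merge is purely
          illustrative):

          Job {
            Name = "WeeklyConsolidation"
            Type = Merge              # hypothetical Job type: runs
                                      # without contacting the Client
            Client = client1-fd       # whose backups to merge
            Level = Full              # produce a new synthetic Full
            Pool = ConsolidatedPool   # where the merged Job is written
          }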

  Why:    The benefits are:
          - the Client just does an incremental;
          - the merged backup on tape is just like a single full
            backup, and can be restored very fast.

          This is also a way of reducing the backup data, since the
          old data can then be pruned (or not) from the catalog,
          possibly allowing older volumes to be recycled.

Item 11:  Deletion of Disk-Based Bacula Volumes
  Date:   Nov 25, 2005
  Origin: Ross Boylan (edited by Kern)
  Status:         

   What:  Provide a way for Bacula to automatically remove Volumes
          from the filesystem, or optionally to truncate them.
          Obviously, the Volume must be pruned prior to removal.

  Why:    This would allow users more control over their Volumes and
          prevent disk based volumes from consuming too much space.

  Notes:  The following two directives might do the trick:

          Volume Data Retention =