Projects:
                     Bacula Projects Roadmap 
                       22 November 2005

Below, you will find more information on future projects:

Item 1:   Implement a Migration job type that will move the job
          data from one device to another.
  Origin: Sponsored by Riege Software International GmbH. Contact:
          Daniel Holtkamp <holtkamp at riege dot com>
  Date:   28 October 2005
  Status: Partially coded in 1.37 -- much more to do. Assigned to
          Kern.

  What:   The ability to copy, move, or archive data that is on a
          device to another device is very important. 

  Why:    An ISP might want to back up to disk, but after 30 days
          migrate the data to tape backup and delete it from
          disk.  Bacula should be able to handle this
          automatically.  It needs to know what was put where,
          and when, and what to migrate -- it is a bit like
          retention periods.  Doing so would allow space to be
          freed up for current backups while maintaining older
          data on tape drives.

  Notes:  Migration could be triggered by:
           Number of Jobs
           Number of Volumes
           Age of Jobs
           Highwater size (keep total size)
           Lowwater mark
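
  Sketch: Two of the triggers above (Age of Jobs and the
          Highwater/Lowwater marks) could be evaluated roughly as
          follows.  This is a minimal Python illustration, not Bacula
          code; the Job record and both function names are
          hypothetical, and selecting the oldest jobs first is an
          assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    # Hypothetical record; not the real Bacula catalog schema.
    job_id: int
    end_time: datetime
    size_bytes: int


def select_by_age(jobs, now, max_age):
    """Return the jobs that finished more than max_age ago."""
    return [job for job in jobs if now - job.end_time > max_age]


def select_by_watermark(jobs, high_water, low_water):
    """If the total size exceeds high_water, pick the oldest jobs
    until the remaining size is at or below low_water."""
    total = sum(job.size_bytes for job in jobs)
    if total <= high_water:
        return []
    selected = []
    for job in sorted(jobs, key=lambda j: j.end_time):
        if total <= low_water:
            break
        selected.append(job)
        total -= job.size_bytes
    return selected
```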

Item 2:   Implement extraction of Win32 BackupWrite data.
  Origin: Thorsten Engel <thorsten.engel at matrix-computer dot com>
  Date:   28 October 2005
  Status: Assigned to Thorsten. Implemented in current CVS

  What:   This provides the Bacula File daemon with code that
          can pick apart the stream output that Microsoft writes
          for BackupWrite data, and thus the data can be read
          and restored on non-Win32 machines.

  Why:    BackupWrite data is the portable=no option in Win32
          FileSets, and in previous Baculas, this data could
          only be extracted using a Win32 FD. With this new code,
          the Windows data can be extracted and restored on
          any OS.


Item 3:   Implement a Bacula GUI/management tool using Python
          and Qt.

  Origin: Kern
  Date:   28 October 2005
  Status: 

  What:   Implement a Bacula console, and management tools
          using Python and Qt.

  Why:    Don't we already have a wxWidgets GUI?  Yes, but
          it is written in C++ and changes to the user interface
          must be hand tailored using C++ code. By developing
          the user interface using Qt designer, the interface
          can be very easily updated and most of the new Python       
          code will be automatically created.  The user interface
          changes become very simple, and only the new features
          must be implemented.  In addition, the code will be in
          Python, which will give many more users easy (or easier)
          access to making additions or modifications.

Item 4:   Implement a Python interface to the Bacula catalog.
  Date:   28 October 2005
  Origin: Kern
  Status: 

  What:   Implement an interface for Python scripts to access
          the catalog through Bacula.

  Why:    This will permit users to customize Bacula through
          Python scripts.

Item 5:   Implement more Python events in Bacula.
  Date:   28 October 2005
  Origin: 
  Status: 

  What:   Allow Python scripts to be called at more places 
          within Bacula and provide additional access to Bacula
          internal variables.

  Why:    This will permit users to customize Bacula through
          Python scripts.

  Notes:  Recycle event
          Scratch pool event
          NeedVolume event


Item 6:   Implement Base jobs.
  Date:   28 October 2005
  Origin: Kern
  Status: 
  
  What:   A base job is sort of like a Full save except that you 
          will want the FileSet to contain only files that are
          unlikely to change in the future (i.e.  a snapshot of
          most of your system after installing it).  After the
          base job has been run, when you are doing a Full save,
          you specify one or more Base jobs to be used.  All
          files that have been backed up in the Base job/jobs but
          not modified will then be excluded from the backup.
          During a restore, the Base jobs will be automatically
          pulled in where necessary.

  Why:    This is something none of the competition does, as far as
          we know (except perhaps BackupPC, which is a Perl program that
          saves to disk only).  It is a big win for the user, and it
          makes Bacula stand out as offering a unique
          optimization that immediately saves time and money.
          Basically, imagine that you have 100 nearly identical
          Windows or Linux machines containing the OS and user
          files.  Now for the OS part, a Base job will be backed
          up once, and rather than making 100 copies of the OS,
          there will be only one.  If one or more of the systems
          have some files updated, no problem, they will be
          automatically restored.

  Notes:  Huge savings in tape usage even for a single machine.
          Will require more resources because the DIR must send
          FD a list of files/attribs, and the FD must search the
          list and compare it for each file to be saved.
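
  Sketch: The comparison the FD would have to make can be pictured
          as below.  This is a Python illustration only; the function
          name is hypothetical, and using (mtime, size) as the
          compared attributes is an assumption, not a statement of
          what Bacula would actually compare.

```python
def files_to_save(current_files, base_files):
    """Return the paths that must go into the Full save.

    Both arguments map path -> (mtime, size).  A file is skipped only
    if the Base job holds an entry with identical attributes.
    """
    return sorted(
        path
        for path, attrs in current_files.items()
        if base_files.get(path) != attrs
    )
```

          A dictionary lookup per file keeps the search cost roughly
          constant even when the list sent by the DIR is large.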

Item 7:   Add Plug-ins to the FileSet Include statements.
  Date:   28 October 2005
  Origin:
  Status: Partially coded in 1.37 -- much more to do.

  What:   Allow users to specify wild-card and/or regular
          expressions to be matched in both the Include and
          Exclude directives in a FileSet.  At the same time,
          allow users to define plug-ins to be called (based on
          regular expression/wild-card matching).

  Why:    This would give the users the ultimate ability to control
          how files are backed up/restored.  A user could write a
          plug-in that knows how to back up his Oracle database without
          stopping/starting it, for example.

Item 8:   Implement huge exclude list support using hashing.
  Date:   28 October 2005
  Origin: Kern
  Status: 

  What:   Allow users to specify very large exclude lists (currently
          more than about 1000 files is too many).

  Why:    This would give the users the ability to exclude all
          files that are loaded with the OS (e.g. using rpms
          or debs). If the user can restore the base OS from
          CDs, there is no need to backup all those files. A
          complete restore would be to restore the base OS, then
          do a Bacula restore. By excluding the base OS files, the
          backup set will be *much* smaller.
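
  Sketch: The idea behind hashing the exclude list, shown in Python
          for illustration (the function names are hypothetical).  A
          Python frozenset is a hash table, so each membership test
          takes roughly constant time regardless of list size.

```python
def build_exclude_set(paths):
    """Hash every excluded path once, up front."""
    return frozenset(paths)


def filter_excluded(candidates, exclude_set):
    """Keep only the candidate paths that are not excluded."""
    return [path for path in candidates if path not in exclude_set]
```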


Item  9:  Implement data encryption (as opposed to communications
          encryption)
  Date:   28 October 2005
  Origin: Sponsored by Landon and 13 contributors to EFF.
  Status: Landon Fuller is currently implementing this.
                  
  What:   Currently the data that is stored on the Volume is not
          encrypted. For confidentiality, encryption of data at
          the File daemon level is essential. 
          Data encryption encrypts the data in the File daemon and
          decrypts the data in the File daemon during a restore.

  Why:    Large sites require this.

Item 10:  Permit multiple Media Types in an Autochanger
  Origin: 
  Status: 

  What:   Modify the Storage daemon so that multiple Media Types
          can be specified in an autochanger. This would be somewhat
          of a simplistic implementation in that each drive would
          still be allowed to have only one Media Type.  However,
          the Storage daemon will ensure that only a drive with
          the Media Type that matches what the Director specifies
          is chosen.

  Why:    This will permit users with several different drive types
          to make full use of their autochangers.

Item 11:  Allow two different autochanger definitions that refer
          to the same autochanger.
  Date:   28 October 2005
  Origin: Kern
  Status: 

  What:   Currently, the autochanger script is locked based on
          the autochanger. That is, if multiple drives are being
          simultaneously used, the Storage daemon ensures that only
          one drive at a time can access the mtx-changer script.
          This change would base the locking on the control device,
          rather than the autochanger. It would then permit two autochanger
          definitions for the same autochanger, but with different 
          drives. Logically, the autochanger could then be "partitioned"
          for different jobs, clients, or class of jobs, and if the locking
          is based on the control device (e.g. /dev/sg0) the mtx-changer
          script will be locked appropriately.

  Why:    This will permit users to partition autochangers for specific
          use. It would also permit implementation of multiple Media
          Types with no changes to the Storage daemon.
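
  Sketch: Locking keyed on the control device rather than on the
          autochanger definition, illustrated in Python.  The
          autochanger names and all function names are hypothetical;
          only /dev/sg0 comes from the text above.

```python
import threading
from collections import defaultdict

# Two hypothetical autochanger definitions sharing one control device.
AUTOCHANGERS = {
    "Library-Full": "/dev/sg0",
    "Library-Incr": "/dev/sg0",
}

_guard = threading.Lock()
_device_locks = defaultdict(threading.Lock)


def lock_for(control_device):
    """Return the single lock that belongs to a control device."""
    with _guard:
        return _device_locks[control_device]


def changer_lock(autochanger_name):
    """Look up the lock through the autochanger's control device."""
    return lock_for(AUTOCHANGERS[autochanger_name])


def run_changer(autochanger_name, action):
    """Run a changer action while holding the control-device lock."""
    with changer_lock(autochanger_name):
        return action()
```

          Because both definitions resolve to the same lock, only one
          of them can drive the changer script at a time.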

Item 12:  Implement red/black binary tree routines.
  Date:   28 October 2005
  Origin: Kern
  Status: 

  What:   Implement a red/black binary tree class. This could 
          then replace the current binary insert/search routines
          used in the restore in memory tree.  This could significantly
          speed up the creation of the in memory restore tree.

  Why:    Performance enhancement.

Item 13:  Let Bacula log tape usage and handle drive cleaning cycles.
  Date:   November 11, 2005
  Origin: Arno Lehmann <al at its-lehmann dot de>
  Status:

  What:   Make Bacula manage tape life cycle information and drive 
          cleaning cycles.

  Why:    Both parts of this project are important when operating backups.
          We need to know which tapes need replacement, and we need to
          make sure the drives are cleaned when necessary.  While many
          tape libraries and even autoloaders can handle all this
          automatically, support by Bacula can be helpful for smaller
          (older) libraries and single drives.  Also, checking drive
          status during operation can prevent some failures (as I had to
          learn the hard way...)

  Notes:  First, Bacula could (and even does, to some limited extent)
          record tape and drive usage.  For tapes, the number of mounts,
          the amount of data, and the time the tape has actually been
          running could be recorded.  Data fields for Read and Write time
          and Number of mounts already exist in the catalog (I'm not sure
          if VolBytes is the sum of all bytes ever written to that volume
          by Bacula).  This information can be important when determining
          which media to replace.  For the tape drives known to Bacula,
          similar information is interesting to determine the device
          status and expected life time: Time it's been Reading and
          Writing, number of tape Loads / Unloads / Errors.  This
          information is not yet recorded as far as I know.

          The next step would be implementing drive cleaning setup.
          Bacula already has knowledge about cleaning tapes.  Once it has
          some information about cleaning cycles (measured in drive run
          time, number of tapes used, or calendar days, for example) it
          can automatically execute tape cleaning (with an autochanger,
          obviously) or ask for operator assistance loading a cleaning
          tape.

          The next step would be to implement TAPEALERT checks not only
          when changing tapes and only sending the information to the
          administrator, but rather checking after each tape error,
          checking on a regular basis (for example after each tape file),
          and also before unloading and after loading a new tape.  Then,
          depending on the drive's TAPEALERT state and the known drive
          cleaning state, Bacula could automatically schedule later
          cleaning, clean immediately, or inform the operator.

          Implementing this would perhaps require another catalog change
          and perhaps major changes in SD code and the DIR-SD protocol,
          so I'd only consider this worth implementing if it would
          actually be used or even needed by many people. 
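
  Sketch: The cleaning decision described above, reduced to a small
          Python function for illustration.  All names are
          hypothetical, the drive's alert is modelled as a plain
          boolean rather than a real TAPEALERT flag, and run hours
          are just one of the possible cycle measures mentioned.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    SCHEDULE_CLEANING = "schedule cleaning"
    CLEAN_NOW = "clean now"
    NOTIFY_OPERATOR = "notify operator"


def cleaning_action(alert_clean_now, run_hours, interval_hours,
                    cleaning_tape_loadable):
    """Decide what to do about drive cleaning.

    alert_clean_now: True if the drive itself requests cleaning.
    run_hours: drive run time since the last cleaning.
    interval_hours: configured cleaning interval.
    cleaning_tape_loadable: True if an autochanger can load a
        cleaning tape without operator help.
    """
    if alert_clean_now:
        if cleaning_tape_loadable:
            return Action.CLEAN_NOW
        return Action.NOTIFY_OPERATOR
    if run_hours >= interval_hours:
        return Action.SCHEDULE_CLEANING
    return Action.NONE
```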

Item 14:  Merging of multiple backups into a single one. (Also called Synthetic
          Backup or Consolidation).

  Origin: Marc Cousin and Eric Bollengier 
  Date:   15 November 2005
  Status: Depends on first implementing project Item 1 (Migration).

  What:   A merged backup is a backup made without connecting to the Client.
          It would be a Merge of existing backups into a single backup.
          In effect, it is like a restore but to the backup medium.

          For instance, say that last Sunday we made a full backup.  Then
          all week long, we created incremental backups, in order to do
          them fast.  Now comes Sunday again, and we need another full.
          The merged backup makes it possible to do instead an incremental
          backup (during the night for instance), and then create a merged
          backup during the day, by using the full and incrementals from
          the week.  The merged backup will be exactly like a full made
          Sunday night on the tape, but the production interruption on the
          Client will be minimal, as the Client will only have to send
          incrementals.

          In fact, if it's done correctly, you could merge all the
          Incrementals into a single Incremental, or all the Incrementals
          and the last Differential into a new Differential, or the Full,
          last differential and all the Incrementals into a new Full
          backup.  And there is no need to involve the Client.

  Why:    The benefits are that:
          - the Client just does an incremental;
          - the merged backup on tape is just like a single full backup,
            and can be restored very fast.

          This is also a way of reducing the backup data since the old
          data can then be pruned (or not) from the catalog, possibly
          allowing older volumes to be recycled.
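
  Sketch: Merging a Full with later Incrementals, illustrated in
          Python.  Each backup is modelled as a mapping from path to
          a file record, and a deleted file is marked with None in an
          Incremental; both the function name and this representation
          are assumptions made for the example.

```python
def merge_backups(full, incrementals):
    """Build a synthetic Full from a Full and its Incrementals.

    incrementals must be given oldest first, so that the newest
    version of each file wins.
    """
    merged = dict(full)
    for incremental in incrementals:
        for path, record in incremental.items():
            if record is None:
                merged.pop(path, None)   # file was deleted
            else:
                merged[path] = record    # file was added or changed
    return merged
```

          No Client is involved: the function only reads data that is
          already on the backup medium.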

Item 15:  Automatic disabling of devices
   Date:   2005-11-11
   Origin: Peter Eriksson <peter at ifm.liu dot se>
   Status:

   What:  After a configurable number of fatal errors with a tape drive,
          Bacula should automatically disable further use of that
          tape drive. There should also be "disable"/"enable" commands in
          the "bconsole" tool.

   Why:   On a multi-drive jukebox there is a possibility of tape drives
          going bad during large backups (needing a cleaning tape run,
          tapes getting stuck). It would be advantageous if Bacula would
          automatically disable further use of a problematic tape drive
          after a configurable number of errors has occurred.

          An example: I have a multi-drive jukebox (6 drives, 380+ slots)
          where tapes occasionally get stuck inside the drive. Bacula will
          notice that the "mtx-changer" command will fail and then fail
          any backup jobs trying to use that drive. However, it will still
          keep on trying to run new jobs using that drive and fail -
          forever, and thus failing lots and lots of jobs... Since we have
          many drives Bacula could have just automatically disabled
          further use of that drive and used one of the other ones
          instead.
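
   Sketch: The requested behaviour, illustrated in Python.  The class
          and function names are hypothetical; enable() and disable()
          stand in for the proposed console commands.

```python
class Drive:
    def __init__(self, name, max_errors):
        self.name = name
        self.max_errors = max_errors   # configurable threshold
        self.errors = 0
        self.enabled = True

    def record_fatal_error(self):
        """Count a fatal error; disable the drive at the threshold."""
        self.errors += 1
        if self.errors >= self.max_errors:
            self.enabled = False

    def enable(self):
        """Operator command: put the drive back into service."""
        self.errors = 0
        self.enabled = True

    def disable(self):
        """Operator command: take the drive out of service."""
        self.enabled = False


def pick_drive(drives):
    """Return the first enabled drive, or None if all are disabled."""
    for drive in drives:
        if drive.enabled:
            return drive
    return None
```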


Item 16:  Directive/mode to backup only file changes, not entire file
  Date:   11 November 2005
  Origin: Joshua Kugler <joshua dot kugler at uaf dot edu>
          Marek Bajon <mbajon at bimsplus dot com dot pl>
  Status: RFC

  What:   Currently when a file changes, the entire file will be backed up in
          the next incremental or full backup.  To save space on the tapes
          it would be nice to have a mode whereby only the changes to the
          file would be backed up when it is changed.

  Why:    This would save lots of space when backing up large files such as 
          logs, mbox files, Outlook PST files and the like.

  Notes:  This would require the usage of disk-based volumes as comparing 
          files would not be feasible using a tape drive.
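
  Sketch: One possible way to find the changed parts of a file,
          illustrated in Python.  It compares per-block digests saved
          from the previous backup instead of reading the old data
          back from the volume, which is a different technique from
          the file comparison assumed in the Notes.  The function
          names and the fixed block size are hypothetical, and the
          sketch does not handle files that shrink.

```python
import hashlib


def block_digests(data, block_size):
    """Return one SHA-1 digest per fixed-size block of data."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_digests, new_data, block_size):
    """Return (block index, block bytes) for new or modified blocks."""
    changes = []
    for index, digest in enumerate(block_digests(new_data, block_size)):
        if index >= len(old_digests) or digest != old_digests[index]:
            start = index * block_size
            changes.append((index, new_data[start:start + block_size]))
    return changes
```

          For append-only files such as logs or mbox files, only the
          last old block and the new tail would be reported.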

Item 17:  Quick release of FD-SD connection
  Origin: Frank Volf (frank at deze dot org)
  Date:   17 November 2005
  Status:

   What:  In the Bacula implementation a backup is finished after all data
          and attributes are successfully written to storage.  When using a
          tape backup it is very annoying that a backup can take a day,
          simply because the current tape (or whatever) is full and the
          administrator has not put a new one in.  During that time the
          system cannot be taken off-line, because there is still an open
          session between the storage daemon and the file daemon on the
          client.

          Although this is a very good strategy for making "safe backups",
          this can be annoying for e.g. laptops, which must remain
          connected until the backup is completed.

          Using a new feature called "migration" it will be possible to
          spool first to harddisk (using a special 'spool' migration
          scheme) and then migrate the backup to tape.

          There is still the problem of getting the attributes committed.
          If it takes a very long time to do, with the current code, the
          job has not terminated, and the File daemon is not freed up.  The
          Storage daemon should release the File daemon as soon as all the
          file data and all the attributes have been sent to it (the SD).
          Currently the SD waits until everything is on tape and all the
          attributes are transmitted to the Director before signalling
          completion to the FD. I don't think I would have any problem
          changing this.  The reason is that even if the FD reports back to
          the Dir that all is OK, the job will not terminate until the SD
          has done the same thing -- so in a way keeping the SD-FD link
          open to the very end is not really very productive ...

   Why:   Makes backup of laptops much easier.


Item 18:  Add support for CACHEDIR.TAG
  Origin: Norbert Kiesel <nkiesel at tbdnetworks dot com>
  Date:   21 November 2005
  Status:

  What:   CACHEDIR.TAG is a proposal for identifying directories which
          should be ignored for archiving/backup.  It works by ignoring
          directory trees which have a file named CACHEDIR.TAG with a
          specific content.  See
          http://www.brynosaurus.com/cachedir/spec.html
          for details.

          From Peter Eriksson:
          I suggest that if this is implemented (I've also asked for this
          feature some year ago) that it is made compatible with Legato
          Networker's ".nsr" files where you can specify a lot of options on
          how to handle files/directories (including denying further
          parsing of .nsr files lower down into the directory trees).  A
          PDF version of the .nsr man page can be viewed at:

          http://www.ifm.liu.se/~peter/nsr.pdf

  Why:    It's a nice alternative to "exclude" patterns for directories
          which don't have regular pathnames.  Also, it allows users to
          control backup for themselves.  Implementation should be pretty
          simple.  GNU tar >= 1.14 or so supports it, too.

  Notes:  I envision this as an optional feature to a fileset
          specification.
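
  Sketch: How a file tree walk could honor the tag, illustrated in
          Python.  The function names are hypothetical.  The
          signature string is the one given in the specification
          linked above; the sketch simplifies the spec by not
          checking that the tag is a regular file rather than a
          symbolic link.

```python
import os

# The tag file must begin with exactly these bytes.
SIGNATURE = b"Signature: 8a477f597d28d172789f06886806bc55"


def is_cache_dir(directory):
    """True if directory holds a CACHEDIR.TAG with the signature."""
    tag = os.path.join(directory, "CACHEDIR.TAG")
    try:
        with open(tag, "rb") as handle:
            return handle.read(len(SIGNATURE)) == SIGNATURE
    except OSError:
        return False


def walk_skipping_caches(top):
    """Yield file paths under top, ignoring tagged directory trees."""
    for dirpath, dirnames, filenames in os.walk(top):
        if is_cache_dir(dirpath):
            dirnames[:] = []   # do not descend any further
            continue
        for name in filenames:
            yield os.path.join(dirpath, name)
```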

Item 19:  Implement new {Client}Run{Before|After}Job feature.
  Date:   26 September 2005
  Origin: Phil Stracchino <phil.stracchino at speakeasy dot net>
  Status: 

  What:   Some time ago, there was a discussion of RunAfterJob and
          ClientRunAfterJob, and the fact that they do not run after failed
          jobs.  At the time, there was a suggestion to add a
          RunAfterFailedJob directive (and, presumably, a matching
          ClientRunAfterFailedJob directive), but to my knowledge these
          were never implemented.

          An alternate way of approaching the problem has just occurred to
          me.  Suppose the RunBeforeJob and RunAfterJob directives were
          expanded in a manner something like this example:

          RunBeforeJob {
              Command = "/opt/bacula/etc/checkhost %c"
              RunsOnClient = No
              RunsAtJobLevels = All       # All, Full, Diff, Inc
              AbortJobOnError = Yes
          }
          RunBeforeJob {
              Command = c:/bacula/systemstate.bat
              RunsOnClient = yes
              RunsAtJobLevels = All       # All, Full, Diff, Inc
              AbortJobOnError = No
          }

          RunAfterJob {
              Command = c:/bacula/deletestatefile.bat
              RunsOnClient = Yes
              RunsAtJobLevels = All       # All, Full, Diff, Inc
              RunsOnSuccess = Yes
              RunsOnFailure = Yes
          }
          RunAfterJob {
              Command = c:/bacula/somethingelse.bat
              RunsOnClient = Yes
              RunsAtJobLevels = All
              RunsOnSuccess = No
              RunsOnFailure = Yes
          }
          RunAfterJob {
              Command = "/opt/bacula/etc/checkhost -v %c"
              RunsOnClient = No
              RunsAtJobLevels = All
              RunsOnSuccess = No
              RunsOnFailure = Yes
          }


  Why:    It would be a significant change to the structure of the
          directives, but allows for a lot more flexibility, including
          RunAfter commands that will run regardless of whether the job
          succeeds, or RunBefore tasks that still allow the job to run even
          if that specific RunBefore fails.

  Notes:  By Kern: I would prefer to have a single new Resource called
          RunScript. More notes from Phil:

            RunsWhen = Before|After
            RunsAtJobLevels = All|Full|Diff|Inc

          The AbortJobOnError, RunsOnSuccess and RunsOnFailure directives
          could be optional, and possibly RunsWhen as well.

          If omitted, RunsWhen would default to Before.

          AbortJobOnError would be ignored unless RunsWhen was set to Before
          (or RunsBeforeJob set to Yes), and would default to Yes if
          omitted.  If AbortJobOnError was set to No, failure of the script
          would still generate a warning.

          RunsOnSuccess would be ignored unless RunsWhen was set to After
          (or RunsBeforeJob set to No), and default to Yes.

          RunsOnFailure would be ignored unless RunsWhen was set to After,
          and default to No.
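
  Sketch: The defaulting rules above, expressed in Python for
          illustration.  The directive names are the ones proposed in
          this item; the function names and the use of a plain
          dictionary for a script definition are hypothetical.

```python
DEFAULTS = {
    "RunsWhen": "Before",
    "AbortJobOnError": True,
    "RunsOnSuccess": True,
    "RunsOnFailure": False,
}


def effective(script):
    """Fill in the default for every directive that was omitted."""
    return {**DEFAULTS, **script}


def aborts_job(script, script_failed):
    """AbortJobOnError only matters for scripts that run Before."""
    s = effective(script)
    return s["RunsWhen"] == "Before" and script_failed and s["AbortJobOnError"]


def runs_after_job(script, job_succeeded):
    """RunsOnSuccess/RunsOnFailure only matter for After scripts."""
    s = effective(script)
    if s["RunsWhen"] != "After":
        return False
    return s["RunsOnSuccess"] if job_succeeded else s["RunsOnFailure"]
```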


============= Empty RFC form ===========
Item n:   One line summary ...
  Date:   Date submitted 
  Origin: Name and email of originator.
  Status: 

  What:   More detailed explanation ...

  Why:    Why it is important ...

  Notes:  Additional notes or features (omit if not used)
============== End RFC form ==============


Items completed for release 1.38.0 -- see kernsdone