Projects:
                Bacula Projects Roadmap
                Prioritized by user vote 07 December 2005
                Status updated 15 December 2006

Summary:
Item  1:  Implement data encryption (as opposed to comm encryption)
Item  2:  Implement Migration that moves Jobs from one Pool to another.
Item  3:  Accurate restoration of renamed/deleted files from
          Incremental/Differential backups
Item  4:  Implement a Bacula GUI/management tool using Python.
Item  5:  Implement Base jobs.
Item  6:  Allow FD to initiate a backup
Item  7:  Improve Bacula's tape and drive usage and cleaning management.
Item  8:  Implement creation and maintenance of copy pools
Item  9:  Implement new {Client}Run{Before|After}Job feature.
Item 10:  Merge multiple backups (Synthetic Backup or Consolidation).
Item 11:  Deletion of Disk-Based Bacula Volumes
Item 12:  Directive/mode to backup only file changes, not entire file
Item 13:  Multiple threads in file daemon for the same job
Item 14:  Implement red/black binary tree routines.
Item 15:  Add support for FileSets in user directories (CACHEDIR.TAG)
Item 16:  Implement extraction of Win32 BackupWrite data.
Item 17:  Implement a Python interface to the Bacula catalog.
Item 18:  Archival (removal) of User Files to Tape
Item 19:  Add Plug-ins to the FileSet Include statements.
Item 20:  Implement more Python events in Bacula.
Item 21:  Quick release of FD-SD connection after backup.
Item 22:  Permit multiple Media Types in an Autochanger
Item 23:  Allow different autochanger definitions for one autochanger.
Item 24:  Automatic disabling of devices
Item 25:  Implement huge exclude list support using hashing.

Items complete and to be released in version 1.40.0:
Item  1:  Implement data encryption (as opposed to comm encryption)
Item  2:  Implement Migration that moves Jobs from one Pool to another.
Item  9:  Implement new {Client}Run{Before|After}Job feature.
Item 16:  Implement extraction of Win32 BackupWrite data.

Items implemented but not tested, so the consequences are unknown:
Item 22:  Permit multiple Media Types in an Autochanger

Below, you will find more information on future projects:

Item  1:  Implement data encryption (as opposed to comm encryption)
  Date:   28 October 2005
  Origin: Sponsored by Landon and 13 contributors to EFF.
  Status: Done: Landon Fuller has implemented this in 1.39.x.

  What:   Currently the data that is stored on the Volume is not
          encrypted. For confidentiality, encryption of data at the File
          daemon level is essential. Data encryption encrypts the data in
          the File daemon and decrypts the data in the File daemon during
          a restore.

  Why:    Large sites require this.

Item  2:  Implement Migration that moves Jobs from one Pool to another.
  Origin: Sponsored by Riege Software International GmbH.
  Contact: Daniel Holtkamp
  Date:   28 October 2005
  Status: Done. Completed in version 1.39.31 by Kern.

  What:   The ability to copy, move, or archive data that is on a device
          to another device is very important.

  Why:    An ISP might want to back up to disk, but after 30 days migrate
          the data to tape and delete it from disk. Bacula should be able
          to handle this automatically. It needs to know what was put
          where, and when, and what to migrate -- it is a bit like
          retention periods. Doing so would free up space for current
          backups while maintaining older data on tape drives.

  Notes:  Riege Software have asked for the following migration triggers:
            Age of Job
            Highwater mark (stopped by Lowwater mark?)

  Notes:  Migration could additionally be triggered by:
            Number of Jobs
            Number of Volumes
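  Notes:  As a rough sketch, the triggers above might surface in the
          configuration along these lines (directive names are
          illustrative, not a reference for the final implementation):

            Pool {
              Name = DiskPool
              Pool Type = Backup
              Next Pool = TapePool           # destination of migrated Jobs
              Migration Time = 30 days       # trigger: age of Job
              Migration High Bytes = 200G    # trigger: highwater mark ...
              Migration Low Bytes = 100G     # ... stop at lowwater mark
            }

            Job {
              Name = "migrate-disk-to-tape"
              Type = Migrate
              Pool = DiskPool                # migrate Jobs found in this Pool
              Selection Type = PoolTime      # or PoolOccupancy for the
                                             # highwater/lowwater trigger
            }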
Item  3:  Accurate restoration of renamed/deleted files from
          Incremental/Differential backups
  Date:   28 November 2005
  Origin: Martin Simmons (martin at lispworks dot com)
  Status:

  What:   When restoring a fileset for a specified date (including "most
          recent"), Bacula should give you exactly the files and
          directories that existed at the time of the last backup prior
          to that date. Currently this works only if the last backup was
          a Full backup. When the last backup was Incremental or
          Differential, files and directories that have been renamed or
          deleted since the last Full backup are not restored correctly.
          Ditto for files with more or fewer hard links than at the time
          of the last Full backup.

  Why:    Incremental/Differential backups would be much more useful if
          this worked.

  Notes:  Item 10 (merging of multiple backups into a single one) seems
          to rely on this working; otherwise the merged backups will not
          be truly equivalent to a Full backup.

          Kern: notes shortened. This can be done without the need for
          inodes. It is essentially the same as the current Verify job,
          but one additional database record must be written, which does
          not require any database schema change.

          Kern: see if we can correct restoration of directories if
          replace=ifnewer is set. Currently, if the directory does not
          exist, a "dummy" directory is created, then when all the files
          are updated, the dummy directory is newer, so the real values
          are not updated.

Item  4:  Implement a Bacula GUI/management tool using Python.
  Origin: Kern
  Date:   28 October 2005
  Status: Lucas is working on this for Python GTK+.

  What:   Implement a Bacula console and management tools using Python
          and Qt or GTK.

  Why:    Don't we already have a wxWidgets GUI? Yes, but it is written
          in C++ and changes to the user interface must be hand tailored
          in C++ code. By developing the user interface with Qt Designer,
          the interface can be updated very easily, and most of the new
          Python code will be generated automatically. User interface
          changes become very simple, and only the new features must be
          implemented. In addition, the code will be in Python, which
          will give many more users easy (or easier) access to making
          additions or modifications.

  Notes:  This is currently being implemented using Python-GTK by
          Lucas Di Pentima

Item  5:  Implement Base jobs.
  Date:   28 October 2005
  Origin: Kern
  Status:

  What:   A base job is sort of like a Full save except that you will
          want the FileSet to contain only files that are unlikely to
          change in the future (i.e. a snapshot of most of your system
          after installing it). After the base job has been run, when you
          are doing a Full save, you specify one or more Base jobs to be
          used. All files that have been backed up in the Base job/jobs
          but not modified will then be excluded from the backup. During
          a restore, the Base jobs will be automatically pulled in where
          necessary.

  Why:    This is something none of the competition does, as far as we
          know (except perhaps BackupPC, which is a Perl program that
          saves to disk only). It is a big win for the user; it makes
          Bacula stand out as offering a unique optimization that
          immediately saves time and money. Basically, imagine that you
          have 100 nearly identical Windows or Linux machines containing
          the OS and user files. For the OS part, a Base job will be
          backed up once, and rather than making 100 copies of the OS,
          there will be only one. If one or more of the systems have some
          files updated, no problem; they will be automatically restored.

  Notes:  Huge savings in tape usage even for a single machine. Will
          require more resources because the DIR must send the FD a list
          of files/attributes, and the FD must search the list and
          compare it for each file to be saved.
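  Notes:  As a sketch only (neither the Base level nor the Base directive
          below is implemented syntax), the feature might be configured
          along these lines:

            Job {
              Name = "base-os"
              JobDefs = "DefaultJob"
              Level = Base                 # one-time snapshot of the rarely
              FileSet = "OSFiles"          # changing OS files
            }

            Job {
              Name = "full-client1"
              JobDefs = "DefaultJob"
              Level = Full
              Base = "base-os"             # files unmodified since base-os
              FileSet = "FullSet"          # are excluded from this Full
            }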
Item  6:  Allow FD to initiate a backup
  Origin: Frank Volf (frank at deze dot org)
  Date:   17 November 2005
  Status:

  What:   Provide some means, possibly via a restricted console, that
          allows an FD to initiate a backup, and that uses the connection
          established by the FD to the Director for the backup, so that a
          Director that is firewalled can do the backup.

  Why:    Makes backup of laptops much easier.

Item  7:  Improve Bacula's tape and drive usage and cleaning management.
  Date:   8 November 2005, November 11, 2005
  Origin: Adam Thornton, Arno Lehmann
  Status:

  What:   Make Bacula manage tape life cycle information, tape reuse
          times, and drive cleaning cycles.

  Why:    All three parts of this project are important when operating
          backups. We need to know which tapes need replacement, and we
          need to make sure the drives are cleaned when necessary. While
          many tape libraries and even autoloaders can handle all this
          automatically, support by Bacula can be helpful for smaller
          (older) libraries and single drives. Limiting the number of
          times a tape is used might prevent the tape errors that occur
          when tapes are used until the drives can no longer read them.
          Also, checking drive status during operation can prevent some
          failures (as I [Arno] had to learn the hard way...)

  Notes:  First, Bacula could (and even does, to some limited extent)
          record tape and drive usage. For tapes, the number of mounts,
          the amount of data, and the time the tape has actually been
          running could be recorded. Data fields for Read and Write time
          and Number of mounts already exist in the catalog (I'm not sure
          if VolBytes is the sum of all bytes ever written to that volume
          by Bacula). This information can be important when determining
          which media to replace. The ability to mark Volumes as "used
          up" after a given number of write cycles should also be
          implemented, so that a tape is never actually worn out.

          For the tape drives known to Bacula, similar information is
          interesting for determining the device status and expected
          lifetime: time spent Reading and Writing, and the number of
          tape Loads / Unloads / Errors. This information is not yet
          recorded as far as I [Arno] know.

          A new volume status would be necessary for the new state, like
          "Used up" or "Worn out". Volumes with this state could be used
          for restores, but not for writing. These volumes should be
          migrated first (assuming migration is implemented) and, once
          they are no longer needed, could be moved to a Trash pool.

          The next step would be to implement a drive cleaning setup.
          Bacula already has knowledge about cleaning tapes. Once it has
          some information about cleaning cycles (measured in drive run
          time, number of tapes used, or calendar days, for example) it
          can automatically execute tape cleaning (with an autochanger,
          obviously) or ask for operator assistance loading a cleaning
          tape.

          The final step would be to implement TAPEALERT checks not only
          when changing tapes and only sending the information to the
          administrator, but rather checking after each tape error,
          checking on a regular basis (for example after each tape file),
          and also before unloading and after loading a new tape. Then,
          depending on the drive's TAPEALERT state and the known drive
          cleaning state, Bacula could automatically schedule later
          cleaning, clean immediately, or inform the operator.

          Implementing this would perhaps require another catalog change
          and perhaps major changes in SD code and the DIR-SD protocol,
          so I'd only consider this worth implementing if it would
          actually be used or even needed by many people.

          Implementation of these projects could happen in three distinct
          sub-projects: measuring tape and drive usage, retiring volumes,
          and handling drive cleaning and TAPEALERTs.
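  Notes:  A sketch of how the "used up" limits could look in a Pool
          resource. Maximum Volume Jobs and Volume Use Duration already
          exist; Maximum Volume Mounts is hypothetical:

            Pool {
              Name = TapePool
              Pool Type = Backup
              Maximum Volume Jobs = 400       # existing: cap Jobs per Volume
              Volume Use Duration = 90 days   # existing: cap the write window
              Maximum Volume Mounts = 250     # hypothetical: set status to
                                              # "Used up" after this many mounts
            }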
Item  8:  Implement creation and maintenance of copy pools
  Date:   27 November 2005
  Origin: David Boyes (dboyes at sinenomine dot net)
  Status:

  What:   I would like Bacula to have the capability to write copies of
          backed-up data on multiple physical volumes selected from
          different pools without transferring the data multiple times,
          and to accept any of the copy volumes as valid for restore.

  Why:    In many cases, businesses are required to keep offsite copies
          of backup volumes, or just wish for simple protection against a
          human operator dropping a storage volume and damaging it. The
          ability to generate multiple volumes in the course of a single
          backup job allows customers to simply check out one copy and
          send it offsite, marking it as out of changer or otherwise
          unavailable. Currently, the library and magazine management
          capability in Bacula does not make this process simple.

          Restores would use the copy of the data on the first available
          volume, in order of the copy pool chain definition.

          This is also a major scalability issue -- as the number of
          clients increases beyond several thousand, and the volume of
          data increases, transferring the data multiple times to produce
          additional copies of the backups will become physically
          impossible due to transfer speed issues. Generating multiple
          copies on the server side will become the only practical
          option.

  How:    I suspect that this will require adding a multiplexing SD that
          appears to be an SD to a specific FD, but 1-n FDs to the
          specific back-end SDs managing the primary and copy pools.
          Storage pools will also need to acquire parameters to define
          the pools to be used for copies.

  Notes:  I would commit some of my developers' time if we can agree on
          the design and behavior.
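  Notes:  A sketch of what a copy pool chain could look like (the Copy
          Pool directive is hypothetical and only illustrates the idea):

            Pool {
              Name = OnsitePool
              Pool Type = Backup
              Copy Pool = OffsitePool     # hypothetical: each Job written to
              Copy Pool = VaultPool       # OnsitePool is duplicated into these
            }                             # pools, in chain order, by the SD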
Item  9:  Implement new {Client}Run{Before|After}Job feature.
  Date:   26 September 2005
  Origin: Phil Stracchino
  Status: Done. This has been implemented by Eric Bollengier

  What:   Some time ago, there was a discussion of RunAfterJob and
          ClientRunAfterJob, and the fact that they do not run after
          failed jobs. At the time, there was a suggestion to add a
          RunAfterFailedJob directive (and, presumably, a matching
          ClientRunAfterFailedJob directive), but to my knowledge these
          were never implemented. The current implementation does not
          make it easy to add new features.

          An alternate way of approaching the problem has just occurred
          to me. Suppose the RunBeforeJob and RunAfterJob directives were
          expanded in a manner like this example:

            RunScript {
              Command = "/opt/bacula/etc/checkhost %c"
              RunsOnClient = No          # default
              AbortJobOnError = Yes      # default
              RunsWhen = Before
            }
            RunScript {
              Command = c:/bacula/systemstate.bat
              RunsOnClient = yes
              AbortJobOnError = No
              RunsWhen = After
              RunsOnFailure = yes
            }
            RunScript {
              Command = c:/bacula/deletestatefile.bat
              Target = rico-fd
              RunsWhen = Always
            }

          It is now possible to specify more than one command per Job
          (you can stop your database and your webserver without a
          script). For example:

            Job {
              Name = "Client1"
              JobDefs = "DefaultJob"
              Write Bootstrap = "/tmp/bacula/var/bacula/working/Client1.bsr"
              FileSet = "Minimal"
              RunBeforeJob = "echo test before ; echo test before2"
              RunBeforeJob = "echo test before (2nd time)"
              RunBeforeJob = "echo test before (3rd time)"
              RunAfterJob = "echo test after"
              ClientRunAfterJob = "echo test after client"
              RunScript {
                Command = "echo test RunScript in error"
                RunsOnClient = yes
                RunsOnSuccess = no
                RunsOnFailure = yes
                RunsWhen = After         # never by default
              }
              RunScript {
                Command = "echo test RunScript on success"
                RunsOnClient = yes
                RunsOnSuccess = yes      # default
                RunsOnFailure = no       # default
                RunsWhen = After
              }
            }

  Why:    It would be a significant change to the structure of the
          directives, but allows for a lot more flexibility, including
          RunAfter commands that will run regardless of whether the job
          succeeds, or RunBefore tasks that still allow the job to run
          even if that specific RunBefore fails.

  Notes:  (More notes from Phil, Kern, David and Eric)
          I would prefer to have a single new Resource called RunScript.

            RunsWhen = After|Before|Always
            RunsAtJobLevels = All|Full|Diff|Inc   # not yet implemented

          The AbortJobOnError, RunsOnSuccess and RunsOnFailure directives
          could be optional, and possibly RunsWhen as well.

          AbortJobOnError would be ignored unless RunsWhen was set to
          Before, and would default to Yes if omitted. If AbortJobOnError
          was set to No, failure of the script would still generate a
          warning.

          RunsOnSuccess would be ignored unless RunsWhen was set to After
          (or RunsBeforeJob set to No), and default to Yes.

          RunsOnFailure would be ignored unless RunsWhen was set to
          After, and default to No.

          Allow having the before/after status on the script command line
          so that the same script can be used both before and after.

Item 10:  Merge multiple backups (Synthetic Backup or Consolidation).
  Origin: Marc Cousin and Eric Bollengier
  Date:   15 November 2005
  Status: Waiting for implementation. Depends on first implementing
          Item 2 (Migration).

  What:   A merged backup is a backup made without connecting to the
          Client. It would be a merge of existing backups into a single
          backup. In effect, it is like a restore but to the backup
          medium.

          For instance, say that last Sunday we made a full backup. Then
          all week long, we created incremental backups, in order to do
          them fast. Now comes Sunday again, and we need another full.
          The merged backup makes it possible to do instead an
          incremental backup (during the night, for instance), and then
          create a merged backup during the day by using the full and the
          incrementals from the week. The merged backup will be exactly
          like a full made Sunday night on the tape, but the production
          interruption on the Client will be minimal, as the Client will
          only have to send incrementals.

          In fact, if it's done correctly, you could merge all the
          Incrementals into a single Incremental, or all the Incrementals
          and the last Differential into a new Differential, or the Full,
          the last Differential, and all the Incrementals into a new Full
          backup. And there is no need to involve the Client.

  Why:    The benefit is that:
          - the Client just does an incremental;
          - the merged backup on tape is just like a single full backup,
            and can be restored very fast.

          This is also a way of reducing the backup data, since the old
          data can then be pruned (or not) from the catalog, possibly
          allowing older volumes to be recycled.
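  Notes:  A sketch of how a consolidation Job could be configured (the
          Merge type and Merged Pool directive are hypothetical; the real
          syntax would presumably follow the Migration implementation it
          depends on):

            Job {
              Name = "weekly-consolidate"
              Type = Merge                # hypothetical: no Client connection
              Client = client1-fd         # whose Full + Incrementals to merge
              Pool = DiskPool             # read the existing backups from here
              Merged Pool = FullPool      # hypothetical: write the new Full here
              Schedule = "SundayDay"      # run during the day; the Client only
            }                             # sent an incremental overnight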
Item 11:  Deletion of Disk-Based Bacula Volumes
  Date:   Nov 25, 2005
  Origin: Ross Boylan (edited by Kern)
  Status:

  What:   Provide a way for Bacula to automatically remove Volumes from
          the filesystem, or optionally to truncate them. Obviously, the
          Volume must be pruned prior to removal.

  Why:    This would allow users more control over their Volumes and
          prevent disk-based Volumes from consuming too much space.

  Notes:  The following two directives might do the trick:
            Volume Data Retention =