diff --git a/bacula/projects b/bacula/projects
index 8489863a0f..76c0e48a76 100644
--- a/bacula/projects
+++ b/bacula/projects
@@ -1,139 +1,1076 @@
 Projects:
-                Bacula Projects Roadmap
-                   18 February 2004
+                Bacula Projects Roadmap
+     Prioritized by user vote 07 December 2005
+          Status updated 30 July 2006
-Completed items from last year's list:
-Item 1: Multiple simultaneous Jobs. (done)
-Item 3: Write the bscan program -- also write a bcopy program (done).
-Item 5: Implement Label templates (done).
-Item 6: Write a regression script (done)
-Item 9: Add SSL to daemon communications (For now, implement with stunnel)
-Item 10: Define definitive tape format (done)
+Summary:
+Item 1: Implement data encryption (as opposed to comm encryption)
+Item 2: Implement Migration that moves Jobs from one Pool to another.
+Item 3: Accurate restoration of renamed/deleted files from
+        Incremental/Differential backups
+Item 4: Implement a Bacula GUI/management tool using Python.
+Item 5: Implement Base jobs.
+Item 6: Allow FD to initiate a backup
+Item 7: Improve Bacula's tape and drive usage and cleaning management.
+Item 8: Implement creation and maintenance of copy pools
+Item 9: Implement new {Client}Run{Before|After}Job feature.
+Item 10: Merge multiple backups (Synthetic Backup or Consolidation).
+Item 11: Deletion of Disk-Based Bacula Volumes
+Item 12: Directive/mode to backup only file changes, not entire file
+Item 13: Multiple threads in file daemon for the same job
+Item 14: Implement red/black binary tree routines.
+Item 15: Add support for FileSets in user directories CACHEDIR.TAG
+Item 16: Implement extraction of Win32 BackupWrite data.
+Item 17: Implement a Python interface to the Bacula catalog.
+Item 18: Archival (removal) of User Files to Tape
+Item 19: Add Plug-ins to the FileSet Include statements.
+Item 20: Implement more Python events in Bacula.
+Item 21: Quick release of FD-SD connection after backup.
+Item 22: Permit multiple Media Types in an Autochanger
+Item 23: Allow different autochanger definitions for one autochanger.
+Item 24: Automatic disabling of devices
+Item 25: Implement huge exclude list support using hashing.
-Item 1: Implement Base jobs.
- What: A base job is sort of like a Full save except that you
- will want the FileSet to contain only files that are unlikely
- to change in the future (i.e. a snapshot of most of your
- system after installing it). After the base job has been run,
- when you are doing a Full save, you can specify to exclude
- all files saved by the base job that have not been modified.
-
- Why: This is something none of the competition does, as far as we know
- (except BackupPC, which is a Perl program that saves to disk
- only). It is big win for the user, it makes Bacula stand out
- as offering a unique optimization that immediately saves time
- and money.
-
- Notes: Big savings in tape usage. Will require more resources because
- the DIR must send FD a list of files/attribs, and the FD must
- search the list and compare it for each file to be saved.
-
-Item 2: Make the Storage daemon use intermediate file storage to buffer data
- or Data Spooling.
-
- What: If data is coming into the SD too fast, buffer it to
- disk if the user has configured this option, so that tape
- shuttling or shoe-shine can be reduced.
-
- Why: This would be a nice project and is the most requested feature.
- Even though you may finish a client job quicker by spilling to
- disk, you still have to eventually get it onto tape. If
- intermediate disk buffering allows us to improve write
- bandwidth to tape, it may make sense.
- In addition, you can
- run multiple simultaneous jobs all spool to disk, then the
- data can be written one job at a time to the tape at full
- tape speed. This keeps the tape running smoothly and prevents
- blocks from different simultaneous jobs from being intermixed
- on the tape, which is very ineffficient for restores.
-
- Notes:
-
-Item 3: GUI for interactive restore
-Item 4: GUI for interactive backup
-
- What: The current interactive restore is implemented with a tty
- interface. It would be much nicer to be able to "see" the
- list of files backed up in typical GUI tree format.
- The same mechanism could also be used for creating
- ad-hoc backup FileSets (item 8).
-
- Why: Ease of use -- especially for the end user.
-
- Notes: Rather than implementing in Gtk, we probably should go directly
- for a Browser implementation, even if doing so meant the
- capability wouldn't be available until much later. Not only
- is there the question of Windows sites, most
- Solaris/HP/IRIX, etc, shops can't currently run Gtk programs
- without installing lots of stuff admins are very wary about.
- Real sysadmins will always use the command line anyway, and
- the user who's doing an interactive restore or backup of his
- own files will in most cases be on a Windows machine running
- Exploder.
-
-Item 5: Implement data encryption (as opposed to communications
- encryption)
+Below, you will find more information on future projects:
+Item 1: Implement data encryption (as opposed to comm encryption)
+ Date: 28 October 2005
+ Origin: Sponsored by Landon and 13 contributors to EFF.
+ Status: Done: Landon Fuller has implemented this in 1.39.x.
+
+ What: Currently the data that is stored on the Volume is
+ not encrypted. For confidentiality, encryption of data at
- the File daemon level is essential. Note, communications
- encryption encrypts the data when leaving the File daemon,
- then decrypts the data on entry to the Storage daemon.
+ the File daemon level is essential.
+ Data encryption encrypts the data in the File daemon and
+ decrypts the data in the File daemon during a restore.
  Why: Large sites require this.
- Notes: The only algorithm that is needed is AES.
- http://csrc.nist.gov/CryptoToolkit/aes/
-
-Item 6: Implement a Migration job type that will move the job
- data from one device to another.
+Item 2: Implement Migration that moves Jobs from one Pool to another.
+ Origin: Sponsored by Riege Software International GmbH. Contact:
+ Daniel Holtkamp
+ Date: 28 October 2005
+ Status: 90% complete: Working in 1.39, more to do. Assigned to
+ Kern.
  What: The ability to copy, move, or archive data that is on
  a device to another device is very important.
- Why: An ISP might want to backup to disk, but after 30 days
- migrate the data to tape backup and delete it from disk.
- Bacula should be able to handle this automatically. It needs to
- know what was put where, and when, and what to migrate -- it
- is a bit like retention periods. Doing so would allow space to
- be freed up for current backups while maintaining older data on
- tape drives.
+ Why: An ISP might want to back up to disk, but after 30 days
+ migrate the data to tape backup and delete it from
+ disk. Bacula should be able to handle this
+ automatically. It needs to know what was put where,
+ and when, and what to migrate -- it is a bit like
+ retention periods. Doing so would allow space to be
+ freed up for current backups while maintaining older
+ data on tape drives.
- Notes: Migration could be triggered by:
+ Notes: Riege Software have asked for the following migration
+ triggers:
+ Age of Job
+ Highwater mark (stopped by Lowwater mark?)
+
+ Notes: Migration could be additionally triggered by:
  Number of Jobs
  Number of Volumes
- Age of Jobs
- Highwater size (keep total size)
- Lowwater mark
-
-
-Item 7: New daemon communication protocol.
-
- What: The current daemon to daemon protocol is basically an ASCII
- printf() and sending the buffer.
- On the receiving end, the
- buffer is sscanf()ed to unpack it. The new scheme would
- retain the current ASCII sending, but would add an
- argc, argv like table driven scanner to replace sscanf.
-
- Why: Named fields will permit error checking to ensure that what is
- sent is what the receiver really wants. The fields can be in
- any order and additional fields can be ignored allowing better
- upward compatibility. Much better checking of the types and
- values passed can be done.
-
- Notes: These are internal improvements in the interest of the
- long-term stability and evolution of the program. On the one
- hand, the sooner they're done, the less code we have to rip
- up when the time comes to install them. On the other hand, they
- don't bring an immediately perceptible benefit to potential
- users.
-
-To be documented:
-Embedded Perl Scripting
-Implement events
-Multiple Storage devices for a single job
-Write to more than one device simultaneously
-Break the one-to-one relation between Storage and device
+
+Item 3: Accurate restoration of renamed/deleted files from
+ Incremental/Differential backups
+ Date: 28 November 2005
+ Origin: Martin Simmons (martin at lispworks dot com)
+ Status:
+
+ What: When restoring a fileset for a specified date (including "most
+ recent"), Bacula should give you exactly the files and directories
+ that existed at the time of the last backup prior to that date.
+
+ Currently this only works if the last backup was a Full backup.
+ When the last backup was Incremental/Differential, files and
+ directories that have been renamed or deleted since the last Full
+ backup are not currently restored correctly. Ditto for files with
+ extra/fewer hard links than at the time of the last Full backup.
+
+ Why: Incremental/Differential would be much more useful if this worked.
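To make the requirement concrete, the point-in-time selection described in Item 3 can be sketched as follows. This is hypothetical Python, not Bacula code; the `present`/`saved` record names are invented for illustration and assume each backup in the chain also records the complete list of paths that existed at backup time (the extra per-backup record discussed in the notes).

```python
# Hypothetical sketch of "accurate" restore selection (not Bacula's code).
# Each backup in the chain records 'present' (all paths that existed at
# backup time) and 'saved' (paths whose data is stored in that backup).

def files_to_restore(chain):
    """chain: backups ordered oldest Full first, newest last."""
    present = set(chain[-1]["present"])   # what existed at the last backup
    restore_from = {}
    for backup in chain:                  # newest saved copy of each path wins
        for path in backup["saved"]:
            if path in present:           # skip files renamed/deleted since
                restore_from[path] = backup
    return restore_from

full = {"name": "Full", "present": ["a", "b"], "saved": ["a", "b"]}
inc1 = {"name": "Inc1", "present": ["a", "c"], "saved": ["c"]}  # b deleted
print({p: b["name"] for p, b in files_to_restore([full, inc1]).items()})
# -> {'a': 'Full', 'c': 'Inc1'}; 'b' is correctly not restored
```

With only the `saved` lists (roughly what is recorded today), `b` would wrongly reappear in the restore; the extra per-backup file list is what makes deletions and renames restorable.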
+
+ Notes: Item 10 (Merging of multiple backups into a single one) seems to
+ rely on this working, otherwise the merged backups will not be
+ truly equivalent to a Full backup.
+
+ Kern: notes shortened. This can be done without the need for
+ inodes. It is essentially the same as the current Verify job,
+ but one additional database record must be written, which does
+ not need any database change.
+
+ Kern: see if we can correct restoration of directories if
+ replace=ifnewer is set. Currently, if the directory does not
+ exist, a "dummy" directory is created, then when all the files
+ are updated, the dummy directory is newer so the real values
+ are not updated.
+
+Item 4: Implement a Bacula GUI/management tool using Python.
+ Origin: Kern
+ Date: 28 October 2005
+ Status: Lucas is working on this for Python GTK+.
+
+ What: Implement a Bacula console, and management tools
+ using Python and Qt or GTK.
+
+ Why: Don't we already have a wxWidgets GUI? Yes, but
+ it is written in C++ and changes to the user interface
+ must be hand tailored using C++ code. By developing
+ the user interface using Qt designer, the interface
+ can be very easily updated and most of the new Python
+ code will be automatically created. The user interface
+ changes become very simple, and only the new features
+ must be implemented. In addition, the code will be in
+ Python, which will give many more users easy (or easier)
+ access to making additions or modifications.
+
+ Notes: This is currently being implemented using Python-GTK by
+ Lucas Di Pentima
+
+Item 5: Implement Base jobs.
+ Date: 28 October 2005
+ Origin: Kern
+ Status:
+
+ What: A base job is sort of like a Full save except that you
+ will want the FileSet to contain only files that are
+ unlikely to change in the future (i.e. a snapshot of
+ most of your system after installing it). After the
+ base job has been run, when you are doing a Full save,
+ you specify one or more Base jobs to be used.
+ All files that have been backed up in the Base job/jobs but
+ not modified will then be excluded from the backup.
+ During a restore, the Base jobs will be automatically
+ pulled in where necessary.
+
+ Why: This is something none of the competition does, as far as
+ we know (except perhaps BackupPC, which is a Perl program that
+ saves to disk only). It is a big win for the user, it
+ makes Bacula stand out as offering a unique
+ optimization that immediately saves time and money.
+ Basically, imagine that you have 100 nearly identical
+ Windows or Linux machines containing the OS and user
+ files. Now for the OS part, a Base job will be backed
+ up once, and rather than making 100 copies of the OS,
+ there will be only one. If one or more of the systems
+ have some files updated, no problem, they will be
+ automatically restored.
+
+ Notes: Huge savings in tape usage even for a single machine.
+ Will require more resources because the DIR must send
+ FD a list of files/attribs, and the FD must search the
+ list and compare it for each file to be saved.
+
+Item 6: Allow FD to initiate a backup
+ Origin: Frank Volf (frank at deze dot org)
+ Date: 17 November 2005
+ Status:
+
+ What: Provide some means, possibly by a restricted console, that
+ allows an FD to initiate a backup, and that uses the connection
+ established by the FD to the Director for the backup so that
+ a Director that is firewalled can do the backup.
+
+ Why: Makes backup of laptops much easier.
+
+Item 7: Improve Bacula's tape and drive usage and cleaning management.
+ Date: 8 November 2005, November 11, 2005
+ Origin: Adam Thornton ,
+ Arno Lehmann
+ Status:
+
+ What: Make Bacula manage tape life cycle information, tape reuse
+ times and drive cleaning cycles.
+
+ Why: All three parts of this project are important when operating
+ backups.
+ We need to know which tapes need replacement, and we need to
+ make sure the drives are cleaned when necessary.
+ While many tape libraries and even autoloaders can handle all this
+ automatically, support by Bacula can be helpful for smaller
+ (older) libraries and single drives. Limiting the number of
+ times a tape is used might prevent tape errors when using
+ tapes until the drives can't read them any more. Also, checking
+ drive status during operation can prevent some failures (as I
+ [Arno] had to learn the hard way...)
+
+ Notes: First, Bacula could (and even does, to some limited extent)
+ record tape and drive usage. For tapes, the number of mounts,
+ the amount of data, and the time the tape has actually been
+ running could be recorded. Data fields for Read and Write
+ time and Number of mounts already exist in the catalog (I'm
+ not sure if VolBytes is the sum of all bytes ever written to
+ that volume by Bacula). This information can be important
+ when determining which media to replace. The ability to mark
+ Volumes as "used up" after a given number of write cycles
+ should also be implemented so that a tape is never actually
+ worn out. For the tape drives known to Bacula, similar
+ information is interesting to determine the device status and
+ expected lifetime: Time it's been Reading and Writing, number
+ of tape Loads / Unloads / Errors. This information is not yet
+ recorded as far as I [Arno] know. A new volume status would
+ be necessary for the new state, like "Used up" or "Worn out".
+ Volumes with this state could be used for restores, but not
+ for writing. These volumes should be migrated first (assuming
+ migration is implemented) and, once they are no longer needed,
+ could be moved to a Trash pool.
+
+ The next step would be to implement a drive cleaning setup.
+ Bacula already has knowledge about cleaning tapes.
+ Once it has some information about cleaning cycles (measured in drive
+ run time, number of tapes used, or calendar days, for example)
+ it can automatically execute tape cleaning (with an
+ autochanger, obviously) or ask for operator assistance loading
+ a cleaning tape.
+
+ The final step would be to implement TAPEALERT checks not only
+ when changing tapes and only sending the information to the
+ administrator, but rather checking after each tape error,
+ checking on a regular basis (for example after each tape
+ file), and also before unloading and after loading a new tape.
+ Then, depending on the drive's TAPEALERT state and the known
+ drive cleaning state, Bacula could automatically schedule later
+ cleaning, clean immediately, or inform the operator.
+
+ Implementing this would perhaps require another catalog change
+ and perhaps major changes in SD code and the DIR-SD protocol,
+ so I'd only consider this worth implementing if it would
+ actually be used or even needed by many people.
+
+ Implementation of these projects could happen in three distinct
+ sub-projects: Measuring Tape and Drive usage, retiring
+ volumes, and handling drive cleaning and TAPEALERTs.
+
+Item 8: Implement creation and maintenance of copy pools
+ Date: 27 November 2005
+ Origin: David Boyes (dboyes at sinenomine dot net)
+ Status:
+
+ What: I would like Bacula to have the capability to write copies
+ of backed-up data on multiple physical volumes selected
+ from different pools without transferring the data
+ multiple times, and to accept any of the copy volumes
+ as valid for restore.
+
+ Why: In many cases, businesses are required to keep offsite
+ copies of backup volumes, or just wish for simple
+ protection against a human operator dropping a storage
+ volume and damaging it.
+ The ability to generate multiple
+ volumes in the course of a single backup job allows
+ customers to simply check out one copy and send it
+ offsite, marking it as out of changer or otherwise
+ unavailable. Currently, the library and magazine
+ management capability in Bacula does not make this process
+ simple.
+
+ Restores would use the copy of the data on the first
+ available volume, in order of copy pool chain definition.
+
+ This is also a major scalability issue -- as the number of
+ clients increases beyond several thousand, and the volume
+ of data increases, transferring the data multiple times to
+ produce additional copies of the backups will become
+ physically impossible due to transfer speed
+ issues. Generating multiple copies at server side will
+ become the only practical option.
+
+ How: I suspect that this will require adding a multiplexing
+ SD that appears to be an SD to a specific FD, but 1-n FDs
+ to the specific back end SDs managing the primary and copy
+ pools. Storage pools will also need to acquire parameters
+ to define the pools to be used for copies.
+
+ Notes: I would commit some of my developers' time if we can agree
+ on the design and behavior.
+
+Item 9: Implement new {Client}Run{Before|After}Job feature.
+ Date: 26 September 2005
+ Origin: Phil Stracchino
+ Status: Done. This has been implemented by Eric Bollengier
+
+ What: Some time ago, there was a discussion of RunAfterJob and
+ ClientRunAfterJob, and the fact that they do not run after failed
+ jobs. At the time, there was a suggestion to add a
+ RunAfterFailedJob directive (and, presumably, a matching
+ ClientRunAfterFailedJob directive), but to my knowledge these
+ were never implemented.
+
+ The current implementation doesn't permit adding new features easily.
+
+ An alternate way of approaching the problem has just occurred to
+ me.
+ Suppose the RunBeforeJob and RunAfterJob directives were
+ expanded in a manner like this example:
+
+ RunScript {
+   Command = "/opt/bacula/etc/checkhost %c"
+   RunsOnClient = No          # default
+   AbortJobOnError = Yes      # default
+   RunsWhen = Before
+ }
+ RunScript {
+   Command = c:/bacula/systemstate.bat
+   RunsOnClient = yes
+   AbortJobOnError = No
+   RunsWhen = After
+   RunsOnFailure = yes
+ }
+
+ RunScript {
+   Command = c:/bacula/deletestatefile.bat
+   Target = rico-fd
+   RunsWhen = Always
+ }
+
+ It's now possible to specify more than one command per Job.
+ (You can stop your database and your webserver without a script.)
+
+ ex:
+ Job {
+   Name = "Client1"
+   JobDefs = "DefaultJob"
+   Write Bootstrap = "/tmp/bacula/var/bacula/working/Client1.bsr"
+   FileSet = "Minimal"
+
+   RunBeforeJob = "echo test before ; echo test before2"
+   RunBeforeJob = "echo test before (2nd time)"
+   RunBeforeJob = "echo test before (3rd time)"
+   RunAfterJob = "echo test after"
+   ClientRunAfterJob = "echo test after client"
+
+   RunScript {
+     Command = "echo test RunScript in error"
+     Runsonclient = yes
+     RunsOnSuccess = no
+     RunsOnFailure = yes
+     RunsWhen = After            # never by default
+   }
+   RunScript {
+     Command = "echo test RunScript on success"
+     Runsonclient = yes
+     RunsOnSuccess = yes         # default
+     RunsOnFailure = no          # default
+     RunsWhen = After
+   }
+ }
+
+ Why: It would be a significant change to the structure of the
+ directives, but allows for a lot more flexibility, including
+ RunAfter commands that will run regardless of whether the job
+ succeeds, or RunBefore tasks that still allow the job to run even
+ if that specific RunBefore fails.
+
+ Notes: (More notes from Phil, Kern, David and Eric)
+ I would prefer to have a single new Resource called
+ RunScript.
+
+ RunsWhen = After|Before|Always
+ RunsAtJobLevels = All|Full|Diff|Inc   # not yet implemented
+
+ The AbortJobOnError, RunsOnSuccess and RunsOnFailure directives
+ could be optional, and possibly RunWhen as well.
+
+ AbortJobOnError would be ignored unless RunsWhen was set to Before
+ and would default to Yes if omitted.
+ If AbortJobOnError was set to No, failure of the script
+ would still generate a warning.
+
+ RunsOnSuccess would be ignored unless RunsWhen was set to After
+ (or RunsBeforeJob set to No), and default to Yes.
+
+ RunsOnFailure would be ignored unless RunsWhen was set to After,
+ and default to No.
+
+ Allow having the before/after status on the script command
+ line so that the same script can be used both before/after.
+
+Item 10: Merge multiple backups (Synthetic Backup or Consolidation).
+ Origin: Marc Cousin and Eric Bollengier
+ Date: 15 November 2005
+ Status: Waiting for implementation. Depends on first implementing
+ project Item 2 (Migration).
+
+ What: A merged backup is a backup made without connecting to the Client.
+ It would be a Merge of existing backups into a single backup.
+ In effect, it is like a restore but to the backup medium.
+
+ For instance, say that last Sunday we made a full backup. Then
+ all week long, we created incremental backups, in order to do
+ them fast. Now comes Sunday again, and we need another full.
+ The merged backup makes it possible to do instead an incremental
+ backup (during the night for instance), and then create a merged
+ backup during the day, by using the full and incrementals from
+ the week. The merged backup will be exactly like a full made
+ Sunday night on the tape, but the production interruption on the
+ Client will be minimal, as the Client will only have to send
+ incrementals.
+
+ In fact, if it's done correctly, you could merge all the
+ Incrementals into a single Incremental, or all the Incrementals
+ and the last Differential into a new Differential, or the Full,
+ last Differential and all the Incrementals into a new Full
+ backup. And there is no need to involve the Client.
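The consolidation described above can be illustrated with a toy model (hypothetical Python, not Bacula's implementation; real Volumes hold data streams and catalog records, not dicts): each backup is a map from path to file data, a None entry stands for a deletion recorded by an incremental, and merging replays the chain with later versions overriding earlier ones.

```python
# Toy model of a synthetic Full: merge a Full with later Incrementals
# without contacting the Client. Assumes incrementals record deletions
# (here as None values), which is what Item 3 above is about.

def merge_backups(full, incrementals):
    merged = dict(full)
    for inc in incrementals:              # oldest incremental first
        for path, data in inc.items():
            if data is None:              # deleted since the previous backup
                merged.pop(path, None)
            else:                         # new or modified file wins
                merged[path] = data
    return merged

full = {"/etc/passwd": "v1", "/home/a.txt": "v1"}
incs = [{"/home/a.txt": "v2"},                       # Monday: a.txt modified
        {"/etc/passwd": None, "/home/b.txt": "v1"}]  # Tuesday: delete + add
print(merge_backups(full, incs))
# -> {'/home/a.txt': 'v2', '/home/b.txt': 'v1'}
```

The same loop applied to the incrementals alone would yield the merged single Incremental mentioned above.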
+
+ Why: The benefit is that:
+ - the Client just does an incremental;
+ - the merged backup on tape is just like a single full backup,
+ and can be restored very fast.
+
+ This is also a way of reducing the backup data since the old
+ data can then be pruned (or not) from the catalog, possibly
+ allowing older volumes to be recycled.
+
+Item 11: Deletion of Disk-Based Bacula Volumes
+ Date: Nov 25, 2005
+ Origin: Ross Boylan (edited by Kern)
+ Status:
+
+ What: Provide a way for Bacula to automatically remove Volumes
+ from the filesystem, or optionally to truncate them.
+ Obviously, the Volume must be pruned prior to removal.
+
+ Why: This would allow users more control over their Volumes and
+ prevent disk-based volumes from consuming too much space.
+
+ Notes: The following two directives might do the trick:
+
+ Volume Data Retention =