X-Git-Url: https://git.sur5r.net/?a=blobdiff_plain;f=bacula%2Fprojects;h=37cc6dcd883fde7684c8a3f0edbf5dfcdc6f84b9;hb=9fc6ed6276756410e36f931afd791798231b51bd;hp=32d851f7d6c2bc6be7c177e8d397cfa417440d4e;hpb=d4932979f453796f2f51f26d42c364bd54ab6360;p=bacula%2Fbacula diff --git a/bacula/projects b/bacula/projects index 32d851f7d6..37cc6dcd88 100644 --- a/bacula/projects +++ b/bacula/projects @@ -1,617 +1,827 @@ Projects: Bacula Projects Roadmap - 29 November 2005 + Status updated 8 August 2010 Summary: -Item 1: Implement Migration that moves Jobs from one Pool to another. -Item 2: Implement extraction of Win32 BackupWrite data. -Item 3: Implement a Bacula GUI/management tool using Python. -Item 4: Implement a Python interface to the Bacula catalog. -Item 5: Implement more Python events in Bacula. -Item 6: Implement Base jobs. -Item 7: Add Plug-ins to the FileSet Include statements. -Item 8: Implement huge exclude list support using hashing. -Item 9: Implement data encryption (as opposed to comm encryption) -Item 10: Permit multiple Media Types in an Autochanger -Item 11: Allow different autochanger definitions for one autochanger. -Item 12: Implement red/black binary tree routines. -Item 13: Improve Baculas tape and drive usage and cleaning management. -Item 14: Merge multiple backups (Synthetic Backup or Consolidation). -Item 15: Automatic disabling of devices -Item 16: Directive/mode to backup only file changes, not entire file -Item 17: Quick release of FD-SD connection after backup. -Item 18: Add support for FileSets in user directories CACHEDIR.TAG -Item 19: Implement new {Client}Run{Before|After}Job feature. 
-Item 20: Allow FD to initiate a backup -Item 21: Multiple threads in file daemon for the same job -Item 22: Archival (removal) of User Files to Tape -Item 23: Deletion of Disk-Based BaculaVolumes -Item 24: Accurate restoration of renamed/deleted files from -Item 25: Implement creation and maintenance of copy pools - - -Below, you will find more information on future projects: - -Item 1: Implement Migration that moves Jobs from one Pool to another. - Origin: Sponsored by Riege Software International GmbH. Contact: - Daniel Holtkamp - Date: 28 October 2005 - Status: Partially coded in 1.37 -- much more to do. Assigned to - Kern. - - What: The ability to copy, move, or archive data that is on a - device to another device is very important. - - Why: An ISP might want to backup to disk, but after 30 days - migrate the data to tape backup and delete it from - disk. Bacula should be able to handle this - automatically. It needs to know what was put where, - and when, and what to migrate -- it is a bit like - retention periods. Doing so would allow space to be - freed up for current backups while maintaining older - data on tape drives. - - Notes: Riege Software have asked for the following migration - triggers: - Age of Job - Highwater mark (stopped by Lowwater mark?) - - Notes: Migration could be additionally triggered by: - Number of Jobs - Number of Volumes - - -Item 2: Implement extraction of Win32 BackupWrite data. - Origin: Thorsten Engel - Date: 28 October 2005 - Status: Assigned to Thorsten. Implemented in current CVS - - What: This provides the Bacula File daemon with code that - can pick apart the stream output that Microsoft writes - for BackupWrite data, and thus the data can be read - and restored on non-Win32 machines. - - Why: BackupWrite data is the portable=no option in Win32 - FileSets, and in previous Baculas, this data could - only be extracted using a Win32 FD. With this new code, - the Windows data can be extracted and restored on - any OS. 
- - -Item 3: Implement a Bacula GUI/management tool using Python. - Origin: Kern - Date: 28 October 2005 - Status: +* => item complete + +Item 1: Ability to restart failed jobs +Item 2: SD redesign +Item 3: NDMP backup/restore +Item 4: SAP backup/restore +Item 5: Oracle backup/restore +Item 6: Zimbra and Zarafa backup/restore +Item* 7: Include timestamp of job launch in "stat clients" output +Item 8: Include all conf files in specified directory +Item 9: Reduction of communications bandwidth for a backup +Item 10: Concurrent spooling and despooling within a single job. +Item 11: Start spooling even when waiting on tape +Item 12: Add ability to Verify any specified Job. +Item 13: Data encryption on storage daemon +Item 14: Possibility to schedule Jobs on last Friday of the month +Item 15: Scheduling syntax that permits more flexibility and options +Item 16: Ability to defer Batch Insert to a later time +Item 17: Add MaxVolumeSize/MaxVolumeBytes to Storage resource +Item 18: Message mailing based on backup types +Item 19: Handle Windows Encrypted Files using Win raw encryption +Item 20: Job migration between different SDs +Item 19. Allow FD to initiate a backup +Item 21: Implement Storage daemon compression +Item 22: Ability to import/export Bacula database entities +Item 23: Implementation of running Job speed limit. 
+Item 24: Add an override in Schedule for Pools based on backup types +Item 25: Automatic promotion of backup levels based on backup size +Item 26: Allow FileSet inclusion/exclusion by creation/mod times +Item 27: Archival (removal) of User Files to Tape +Item 28: Ability to reconnect a disconnected comm line +Item 29: Multiple threads in file daemon for the same job +Item 30: Automatic disabling of devices +Item 31: Enable persistent naming/number of SQL queries +Item 32: Bacula Dir, FD and SD to support proxies +Item 33: Add Minumum Spool Size directive +Item 34: Command that releases all drives in an autochanger +Item 35: Run bscan on a remote storage daemon from within bconsole. +Item 36: Implement a Migration job type that will create a reverse +Item 37: Extend the verify code to make it possible to verify +Item 38: Separate "Storage" and "Device" in the bacula-dir.conf +Item 39: Least recently used device selection for tape drives in autochanger. +Item 40: Implement a Storage device like Amazon's S3. +Item 41: Convert tray monitor on Windows to a stand alone program +Item 42: Improve Bacula's tape and drive usage and cleaning management +Item 43: Relabel disk volume after recycling + + +Item 1: Ability to restart failed jobs + Date: 26 April 2009 + Origin: Kern/Eric + Status: + + What: Often jobs fail because of a communications line drop, max run time, + cancel, or some other non-critical problem. Currently any data + saved is lost. This implementation should modify the Storage daemon + so that it saves all the files that it knows are completely backed + up to the Volume. + + The jobs should then be marked as incomplete and a subsequent + Incremental Accurate backup will then take into account all the + previously saved job data. + + Why: Avoids backing up data that has already been saved. + + Notes: Requires Accurate to restart correctly. Must have stored a minimum + volume of data or files on the Volume before enabling. 
+ +Item 2: SD redesign + Date: 8 August 2010 + Origin: Kern + Status: + + What: Various ideas for redesigns planned for the SD: + 1. One thread per drive + 2. Design a class structure for all objects in the SD. + 3. Make Device into C++ classes for each device type + 4. Make Device have a proxy (front end intercept class) that will permit control over locking and changing the real device pointer. It can also permit delaying opening, so that we can adapt to having another program that tells us the Archive device name. + 5. Allow plugins to create new on the fly devices + 6. Separate SD volume manager + 7. Volume manager tells Bacula what drive or device to use for a given volume + + Why: It will simplify the SD, make it more modular, reduce locking + conflicts, and allow multiple buffer backups. - What: Implement a Bacula console, and management tools - using Python and Qt or GTK. - Why: Don't we already have a wxWidgets GUI? Yes, but - it is written in C++ and changes to the user interface - must be hand tailored using C++ code. By developing - the user interface using Qt designer, the interface - can be very easily updated and most of the new Python - code will be automatically created. The user interface - changes become very simple, and only the new features - must be implement. In addition, the code will be in - Python, which will give many more users easy (or easier) - access to making additions or modifications. +Item 3: NDMP backup/restore + Date: 8 August 2010 + Origin: Bacula Systems + Status: Enterprise only if implemented by Bacula Systems - Notes: This is currently being implemented using Python-GTK by - Lucas Di Pentima + What: Backup/restore via NDMP -- most important NetApp compatibility -Item 4: Implement a Python interface to the Bacula catalog. - Date: 28 October 2005 - Origin: Kern - Status: - What: Implement an interface for Python scripts to access - the catalog through Bacula. 
+Item 4: SAP backup/restore + Date: 8 August 2010 + Origin: Bacula Systems + Status: Enterprise only if implemented by Bacula Systems - Why: This will permit users to customize Bacula through - Python scripts. + What: Backup/restore SAP databases (MaxDB, Oracle, possibly DB2) -Item 5: Implement more Python events in Bacula. - Date: 28 October 2005 - Origin: - Status: - What: Allow Python scripts to be called at more places - within Bacula and provide additional access to Bacula - internal variables. - Why: This will permit users to customize Bacula through - Python scripts. +Item 5: Oracle backup/restore + Date: 8 August 2010 + Origin: Bacula Systems + Status: Enterprise only if implemented by Bacula Systems - Notes: Recycle event - Scratch pool event - NeedVolume event - MediaFull event - - Also add a way to get a listing of currently running - jobs (possibly also scheduled jobs). + What: Backup/restore Oracle databases -Item 6: Implement Base jobs. - Date: 28 October 2005 - Origin: Kern - Status: - - What: A base job is sort of like a Full save except that you - will want the FileSet to contain only files that are - unlikely to change in the future (i.e. a snapshot of - most of your system after installing it). After the - base job has been run, when you are doing a Full save, - you specify one or more Base jobs to be used. All - files that have been backed up in the Base job/jobs but - not modified will then be excluded from the backup. - During a restore, the Base jobs will be automatically - pulled in where necessary. - - Why: This is something none of the competition does, as far as - we know (except perhaps BackupPC, which is a Perl program that - saves to disk only). It is big win for the user, it - makes Bacula stand out as offering a unique - optimization that immediately saves time and money. - Basically, imagine that you have 100 nearly identical - Windows or Linux machine containing the OS and user - files. 
Now for the OS part, a Base job will be backed - up once, and rather than making 100 copies of the OS, - there will be only one. If one or more of the systems - have some files updated, no problem, they will be - automatically restored. - - Notes: Huge savings in tape usage even for a single machine. - Will require more resources because the DIR must send - FD a list of files/attribs, and the FD must search the - list and compare it for each file to be saved. - -Item 7: Add Plug-ins to the FileSet Include statements. - Date: 28 October 2005 - Origin: - Status: Partially coded in 1.37 -- much more to do. - - What: Allow users to specify wild-card and/or regular - expressions to be matched in both the Include and - Exclude directives in a FileSet. At the same time, - allow users to define plug-ins to be called (based on - regular expression/wild-card matching). - - Why: This would give the users the ultimate ability to control - how files are backed up/restored. A user could write a - plug-in knows how to backup his Oracle database without - stopping/starting it, for example. - -Item 8: Implement huge exclude list support using hashing. - Date: 28 October 2005 - Origin: Kern - Status: +Item 6: Zimbra and Zarafa backup/restore + Date: 8 August 2010 + Origin: Bacula Systems + Status: Enterprise only if implemented by Bacula Systems - What: Allow users to specify very large exclude list (currently - more than about 1000 files is too many). - - Why: This would give the users the ability to exclude all - files that are loaded with the OS (e.g. using rpms - or debs). If the user can restore the base OS from - CDs, there is no need to backup all those files. A - complete restore would be to restore the base OS, then - do a Bacula restore. By excluding the base OS files, the - backup set will be *much* smaller. - - -Item 9: Implement data encryption (as opposed to comm encryption) - Date: 28 October 2005 - Origin: Sponsored by Landon and 13 contributors to EFF. 
- Status: Landon Fuller is currently implementing this. - - What: Currently the data that is stored on the Volume is not - encrypted. For confidentiality, encryption of data at - the File daemon level is essential. - Data encryption encrypts the data in the File daemon and - decrypts the data in the File daemon during a restore. - - Why: Large sites require this. - -Item 10: Permit multiple Media Types in an Autochanger - Origin: Kern - Status: Now implemented - - What: Modify the Storage daemon so that multiple Media Types - can be specified in an autochanger. This would be somewhat - of a simplistic implementation in that each drive would - still be allowed to have only one Media Type. However, - the Storage daemon will ensure that only a drive with - the Media Type that matches what the Director specifies - is chosen. - - Why: This will permit user with several different drive types - to make full use of their autochangers. - -Item 11: Allow different autochanger definitions for one autochanger. - Date: 28 October 2005 - Origin: Kern - Status: + What: Backup/restore for Zimbra and Zarafa - What: Currently, the autochanger script is locked based on - the autochanger. That is, if multiple drives are being - simultaneously used, the Storage daemon ensures that only - one drive at a time can access the mtx-changer script. - This change would base the locking on the control device, - rather than the autochanger. It would then permit two autochanger - definitions for the same autochanger, but with different - drives. Logically, the autochanger could then be "partitioned" - for different jobs, clients, or class of jobs, and if the locking - is based on the control device (e.g. /dev/sg0) the mtx-changer - script will be locked appropriately. - - Why: This will permit users to partition autochangers for specific - use. It would also permit implementation of multiple Media - Types with no changes to the Storage daemon. - -Item 12: Implement red/black binary tree routines. 
- Date: 28 October 2005 - Origin: Kern - Status: - What: Implement a red/black binary tree class. This could - then replace the current binary insert/search routines - used in the restore in memory tree. This could significantly - speed up the creation of the in memory restore tree. - Why: Performance enhancement. +Item 7: Include timestamp of job launch in "stat clients" output + Origin: Mark Bergman + Date: Tue Aug 22 17:13:39 EDT 2006 + Status: Done -Item 13: Improve Baculas tape and drive usage and cleaning management. - Date: 8 November 2005, November 11, 2005 - Origin: Adam Thornton , - Arno Lehmann - Status: + What: The "stat clients" command doesn't include any detail on when + the active backup jobs were launched. - What: Make Bacula manage tape life cycle information, tape reuse - times and drive cleaning cycles. - - Why: All three parts of this project are important when operating - backups. - We need to know which tapes need replacement, and we need to - make sure the drives are cleaned when necessary. While many - tape libraries and even autoloaders can handle all this - automatically, support by Bacula can be helpful for smaller - (older) libraries and single drives. Limiting the number of - times a tape is used might prevent tape errors when using - tapes until the drives can't read it any more. Also, checking - drive status during operation can prevent some failures (as I - [Arno] had to learn the hard way...) - - Notes: First, Bacula could (and even does, to some limited extent) - record tape and drive usage. For tapes, the number of mounts, - the amount of data, and the time the tape has actually been - running could be recorded. Data fields for Read and Write - time and Number of mounts already exist in the catalog (I'm - not sure if VolBytes is the sum of all bytes ever written to - that volume by Bacula). This information can be important - when determining which media to replace. 
The ability to mark - Volumes as "used up" after a given number of write cycles - should also be implemented so that a tape is never actually - worn out. For the tape drives known to Bacula, similar - information is interesting to determine the device status and - expected life time: Time it's been Reading and Writing, number - of tape Loads / Unloads / Errors. This information is not yet - recorded as far as I [Arno] know. A new volume status would - be necessary for the new state, like "Used up" or "Worn out". - Volumes with this state could be used for restores, but not - for writing. These volumes should be migrated first (assuming - migration is implemented) and, once they are no longer needed, - could be moved to a Trash pool. - - The next step would be to implement a drive cleaning setup. - Bacula already has knowledge about cleaning tapes. Once it - has some information about cleaning cycles (measured in drive - run time, number of tapes used, or calender days, for example) - it can automatically execute tape cleaning (with an - autochanger, obviously) or ask for operator assistance loading - a cleaning tape. - - The final step would be to implement TAPEALERT checks not only - when changing tapes and only sending the information to the - administrator, but rather checking after each tape error, - checking on a regular basis (for example after each tape - file), and also before unloading and after loading a new tape. - Then, depending on the drives TAPEALERT state and the known - drive cleaning state Bacula could automatically schedule later - cleaning, clean immediately, or inform the operator. - - Implementing this would perhaps require another catalog change - and perhaps major changes in SD code and the DIR-SD protocol, - so I'd only consider this worth implementing if it would - actually be used or even needed by many people. 
- - Implementation of these projects could happen in three distinct - sub-projects: Measuring Tape and Drive usage, retiring - volumes, and handling drive cleaning and TAPEALERTs. - - -Item 14: Merge multiple backups (Synthetic Backup or Consolidation). - Origin: Marc Cousin and Eric Bollengier - Date: 15 November 2005 - Status: Depends on first implementing project Item 1 (Migration). - - What: A merged backup is a backup made without connecting to the Client. - It would be a Merge of existing backups into a single backup. - In effect, it is like a restore but to the backup medium. - - For instance, say that last Sunday we made a full backup. Then - all week long, we created incremental backups, in order to do - them fast. Now comes Sunday again, and we need another full. - The merged backup makes it possible to do instead an incremental - backup (during the night for instance), and then create a merged - backup during the day, by using the full and incrementals from - the week. The merged backup will be exactly like a full made - Sunday night on the tape, but the production interruption on the - Client will be minimal, as the Client will only have to send - incrementals. - - In fact, if it's done correctly, you could merge all the - Incrementals into single Incremental, or all the Incrementals - and the last Differential into a new Differential, or the Full, - last differential and all the Incrementals into a new Full - backup. And there is no need to involve the Client. - - Why: The benefit is that : - - the Client just does an incremental ; - - the merged backup on tape is just as a single full backup, - and can be restored very fast. 
- - This is also a way of reducing the backup data since the old - data can then be pruned (or not) from the catalog, possibly - allowing older volumes to be recycled - -Item 15: Automatic disabling of devices - Date: 2005-11-11 - Origin: Peter Eriksson - Status: - - What: After a configurable amount of fatal errors with a tape drive - Bacula should automatically disable further use of a certain - tape drive. There should also be "disable"/"enable" commands in - the "bconsole" tool. + Why: Including the timestamp would make it much easier to decide whether + a job is running properly. - Why: On a multi-drive jukebox there is a possibility of tape drives - going bad during large backups (needing a cleaning tape run, - tapes getting stuck). It would be advantageous if Bacula would - automatically disable further use of a problematic tape drive - after a configurable amount of errors has occurred. + Notes: It may be helpful to have the output from "stat clients" formatted + more like that from "stat dir" (and other commands), in a column + format. The per-client information that's currently shown (level, + client name, JobId, Volume, pool, device, Files, etc.) is good, but + somewhat hard to parse (both programmatically and visually), + particularly when there are many active clients. - An example: I have a multi-drive jukebox (6 drives, 380+ slots) - where tapes occasionally get stuck inside the drive. Bacula will - notice that the "mtx-changer" command will fail and then fail - any backup jobs trying to use that drive. However, it will still - keep on trying to run new jobs using that drive and fail - - forever, and thus failing lots and lots of jobs... Since we have - many drives Bacula could have just automatically disabled - further use of that drive and used one of the other ones - instead. +Item 8: Include all conf files in specified directory +Date: 18 October 2008 +Origin: Database, Lda. 
Maputo, Mozambique +Contact:Cameron Smith / cameron.ord@database.co.mz +Status: New request -Item 16: Directive/mode to backup only file changes, not entire file - Date: 11 November 2005 - Origin: Joshua Kugler - Marek Bajon - Status: RFC +What: A directive something like "IncludeConf = /etc/bacula/subconfs" Every + time Bacula Director restarts or reloads, it will walk the given + directory (non-recursively) and include the contents of any files + therein, as though they were appended to bacula-dir.conf - What: Currently when a file changes, the entire file will be backed up in - the next incremental or full backup. To save space on the tapes - it would be nice to have a mode whereby only the changes to the - file would be backed up when it is changed. +Why: Permits simplified and safer configuration for larger installations with + many client PCs. Currently, through judicious use of JobDefs and + similar directives, it is possible to reduce the client-specific part of + a configuration to a minimum. The client-specific directives can be + prepared according to a standard template and dropped into a known + directory. However it is still necessary to add a line to the "master" + (bacula-dir.conf) referencing each new file. This exposes the master to + unnecessary risk of accidental mistakes and makes automation of adding + new client-confs, more difficult (it is easier to automate dropping a + file into a dir, than rewriting an existing file). Ken has previously + made a convincing argument for NOT including Bacula's core configuration + in an RDBMS, but I believe that the present request is a reasonable + extension to the current "flat-file-based" configuration philosophy. + +Notes: There is NO need for any special syntax to these files. They should + contain standard directives which are simply "inlined" to the parent + file as already happens when you explicitly reference an external file. 
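A minimal sketch of the requested behaviour, written in Python purely for illustration (it is not Bacula code). It walks one directory non-recursively and produces one "@file" include line per configuration file, using the "@" include syntax that Bacula configuration files already accept. The function name include_lines, the ".conf" suffix filter, and the sorted ordering are assumptions made for this sketch and are not part of the request.

```python
import os


def include_lines(conf_dir, suffix=".conf"):
    """Return one '@<path>' include line per regular file in conf_dir.

    Hypothetical helper: the suffix filter and the sorted order are
    assumptions of this sketch. Subdirectories are never descended into.
    """
    lines = []
    for name in sorted(os.listdir(conf_dir)):
        path = os.path.join(conf_dir, name)
        # Only regular files directly inside conf_dir are included.
        if name.endswith(suffix) and os.path.isfile(path):
            lines.append("@" + path)
    return lines
```

Sorting keeps the include order stable between reloads, which makes the resulting configuration easier to compare from one run to the next.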
- Why: This would save lots of space when backing up large files such as - logs, mbox files, Outlook PST files and the like. +Notes: (kes) this can already be done with scripting + From: John Jorgensen + The bacula-dir.conf at our site contains these lines: + + # + # Include subfiles associated with configuration of clients. + # They define the bulk of the Clients, Jobs, and FileSets. + # + @|"sh -c 'for f in /etc/bacula/clientdefs/*.conf ; do echo @${f} ; done'" + + and when we get a new client, we just put its configuration into + a new file called something like: + + /etc/bacula/clientdefs/clientname.conf - Notes: This would require the usage of disk-based volumes as comparing - files would not be feasible using a tape drive. -Item 17: Quick release of FD-SD connection after backup. - Origin: Frank Volf (frank at deze dot org) - Date: 17 November 2005 - Status: - What: In the Bacula implementation a backup is finished after all data - and attributes are successfully written to storage. When using a - tape backup it is very annoying that a backup can take a day, - simply because the current tape (or whatever) is full and the - administrator has not put a new one in. During that time the - system cannot be taken off-line, because there is still an open - session between the storage daemon and the file daemon on the - client. - - Although this is a very good strategy for making "safe backups" - This can be annoying for e.g. laptops, that must remain - connected until the backup is completed. - - Using a new feature called "migration" it will be possible to - spool first to harddisk (using a special 'spool' migration - scheme) and then migrate the backup to tape. - - There is still the problem of getting the attributes committed. - If it takes a very long time to do, with the current code, the - job has not terminated, and the File daemon is not freed up. 
The - Storage daemon should release the File daemon as soon as all the - file data and all the attributes have been sent to it (the SD). - Currently the SD waits until everything is on tape and all the - attributes are transmitted to the Director before signaling - completion to the FD. I don't think I would have any problem - changing this. The reason is that even if the FD reports back to - the Dir that all is OK, the job will not terminate until the SD - has done the same thing -- so in a way keeping the SD-FD link - open to the very end is not really very productive ... - - Why: Makes backup of laptops much easier. - - -Item 18: Add support for FileSets in user directories CACHEDIR.TAG - Origin: Norbert Kiesel - Date: 21 November 2005 + +Item 9: Reduction of communications bandwidth for a backup + Date: 14 October 2008 + Origin: Robin O'Leary (Equiinet) + Status: + + What: Using rdiff techniques, Bacula could significantly reduce + the network data transfer volume to do a backup. + + Why: Faster backup across the Internet + + Notes: This requires retaining certain data on the client during a Full + backup that will speed up subsequent backups. + + +Item 10: Concurrent spooling and despooling within a single job. +Date: 17 Nov 2009 +Origin: Jesper Krogh +Status: NEW +What: When a job has spooling enabled and the spool area size is + less than the total volume size the storage daemon will: + 1) Spool to spool-area + 2) Despool to tape + 3) Go to 1 if more data to be backed up. + + Typical disks will serve data at a speed of 100MB/s when + dealing with large files, and the network is typically capable of 115MB/s + (GbitE). Tape drives will despool at 50-90MB/s (LTO3) or 70-120MB/s + (LTO4) depending on compression and data. + + As bacula currently works it'll hold back data from the client until + de-spooling is done, no matter whether the spool area can handle another + block of data. 
Given a FileSet of 4TB and a spool-area of 100GB and + a Maximum Job Spool Size set to 50GB, the above sequence could be + changed to allow spooling to the other 50GB while despooling the first + 50GB and not holding back the client while doing it. As the above numbers + show, depending on tape-drive and disk-arrays this potentially leads to + a cut of the backup-time of 50% for the individual jobs. + + Real-world example, backing up 112.6GB (large files) to LTO4 tapes + (despools with ~75MB/s, data is gzipped on the remote filesystem). + Maximum Job Spool Size = 8GB + + Current: + Size: 112.6GB + Elapsed time (total time): 46m 15s => 2775s + Despooling time: 25m 41s => 1541s (55%) + Spooling time: 20m 34s => 1234s (45%) + Reported speed: 40.58MB/s + Spooling speed: 112.6GB/1234s => 91.25MB/s + Despooling speed: 112.6GB/1541s => 73.07MB/s + + So disk + net can "keep up" with the LTO4 drive (in this test) + + Proposed change would effectively make the backup run in the "despooling + time" 1541s giving a reduction to 55% of the total run time. + + In the situation where the individual job cannot keep up with the LTO drive, + spooling enables efficient multiplexing of multiple concurrent jobs onto + the same drive. + +Why: When dealing with larger volumes the general utilization of the + network/disk is important to maximize in order to be able to run a full + backup over a weekend. Current work-around is to split the FileSet into + smaller FileSets and Jobs but that leads to more configuration management + and is harder to review for completeness. Subsequently it makes restores + more complex. + + + +Item 11: Start spooling even when waiting on tape + Origin: Tobias Barth + Date: 25 April 2008 Status: - What: CACHDIR.TAG is a proposal for identifying directories which - should be ignored for archiving/backup. It works by ignoring - directory trees which have a file named CACHEDIR.TAG with a - specific content. See - http://www.brynosaurus.com/cachedir/spec.html - for details. 
- - From Peter Eriksson: - I suggest that if this is implemented (I've also asked for this - feature some year ago) that it is made compatible with Legato - Networkers ".nsr" files where you can specify a lot of options on - how to handle files/directories (including denying further - parsing of .nsr files lower down into the directory trees). A - PDF version of the .nsr man page can be viewed at: - - http://www.ifm.liu.se/~peter/nsr.pdf - - Why: It's a nice alternative to "exclude" patterns for directories - which don't have regular pathnames. Also, it allows users to - control backup for themselves. Implementation should be pretty - simple. GNU tar >= 1.14 or so supports it, too. - - Notes: I envision this as an optional feature to a fileset - specification. - -Item 19: Implement new {Client}Run{Before|After}Job feature. - Date: 26 September 2005 - Origin: Phil Stracchino - Status: + What: If a job can be spooled to disk before writing it to tape, it should + be spooled immediately. Currently, bacula waits until the correct + tape is inserted into the drive. + + Why: It could save hours. When bacula waits on the operator who must insert + the correct tape (e.g. a new tape or a tape from another media + pool), bacula could already prepare the spooled data in the spooling + directory and immediately start despooling when the tape was + inserted by the operator. + + 2nd step: Use 2 or more spooling directories. When one directory is + currently despooling, the next (on different disk drives) could + already be spooling the next data. + + Notes: I am using bacula 2.2.8, which has none of those features + implemented. + + +Item 12: Add ability to Verify any specified Job. +Date: 17 January 2008 +Origin: portrix.net Hamburg, Germany. +Contact: Christian Sabelmann +Status: 70% of the required Code is part of the Verify function since v. 2.x + + What: + The ability to tell Bacula which Job should verify instead of + automatically verify just the last one. 
+ + Why: + It is sad that such a powerful feature as Verify Jobs + (VolumeToCatalog) is restricted to be used only with the last backup Job + of a client. Current users who have to do daily Backups are forced to + also do daily Verify Jobs in order to take advantage of this useful + feature. This Daily Verify after Backup practice is not always desired + and Verify Jobs sometimes have to be scheduled. (Not necessarily + scheduled in Bacula). With this feature Admins can verify Jobs once a + week or less often, selecting the Jobs they want to verify. This + feature is also not too difficult to implement, taking into account older bug + reports about this feature and the selection of the Job to be verified. + + Notes: For the verify Job, the user could select the Job to be verified + from a List of the latest Jobs of a client. It would also be possible to + verify a certain volume. All of these would naturally apply only to + Jobs whose file information is still in the catalog. + + +Item 13: Data encryption on storage daemon + Origin: Tobias Barth + Date: 04 February 2009 + Status: new + + What: The storage daemon should be able to do the data encryption that can + currently be done by the file daemon. + + Why: This would have 2 advantages: + 1) one could encrypt the data of unencrypted tapes by doing a + migration job + 2) the storage daemon would be the only machine that would have + to keep the encryption keys. 
+ + Notes from Landon: + As an addendum to the feature request, here are some crypto + implementation details I wrote up regarding SD-encryption back in Jan + 2008: + http://www.mail-archive.com/bacula-users@lists.sourceforge.net/msg28860.html + + + +Item 14: Possibility to schedule Jobs on last Friday of the month +Origin: Carsten Menke +Date: 02 March 2008 +Status: + + What: Currently if you want to run your monthly Backups on the last + Friday of each month this is only possible with workarounds (e.g. + scripting) (As some months have 4 Fridays and some have 5 Fridays) + The same is true if you plan to run your yearly Backups on the + last Friday of the year. It would be nice to have the ability to + use the builtin scheduler for this. + + Why: In many companies the last working day of the week is Friday (or + Saturday), so to get the most data of the month onto the monthly + tape, the employees are advised to insert the tape for the + monthly backups on the last Friday of the month. + + Notes: To give this a complete functionality it would be nice if the + "first" and "last" Keywords could be implemented in the + scheduler, so it is also possible to run monthly backups at the + first Friday of the month and many things more. So if the syntax + would expand to this {first|last} {Month|Week|Day|Mo-Fri} of the + {Year|Month|Week} you would be able to run really flexible jobs.
+ + To get a certain Job to run on the last Friday of the Month for example + one could then write + + Run = pool=Monthly last Fri of the Month at 23:50 + + ## Yearly Backup + + Run = pool=Yearly last Fri of the Year at 23:50 + + ## Certain Jobs the last Week of a Month + + Run = pool=LastWeek last Week of the Month at 23:50 + + ## Monthly Backup on the last day of the month + + Run = pool=Monthly last Day of the Month at 23:50 + +Item 15: Scheduling syntax that permits more flexibility and options + Date: 15 December 2006 + Origin: Gregory Brauer (greg at wildbrain dot com) and + Florian Schnabel + Status: - What: Some time ago, there was a discussion of RunAfterJob and - ClientRunAfterJob, and the fact that they do not run after failed - jobs. At the time, there was a suggestion to add a - RunAfterFailedJob directive (and, presumably, a matching - ClientRunAfterFailedJob directive), but to my knowledge these - were never implemented. - - An alternate way of approaching the problem has just occurred to - me. Suppose the RunBeforeJob and RunAfterJob directives were - expanded in a manner something like this example: - - RunBeforeJob { - Command = "/opt/bacula/etc/checkhost %c" - RunsOnClient = No - RunsAtJobLevels = All # All, Full, Diff, Inc - AbortJobOnError = Yes - } - RunBeforeJob { - Command = c:/bacula/systemstate.bat - RunsOnClient = yes - RunsAtJobLevels = All # All, Full, Diff, Inc - AbortJobOnError = No + What: Currently, Bacula only understands how to deal with weeks of the + month or weeks of the year in schedules. This makes it impossible + to do a true weekly rotation of tapes. There will always be a + discontinuity that will require disruptive manual intervention at + least monthly or yearly because week boundaries never align with + month or year boundaries. + + A solution would be to add a new syntax that defines (at least) + a start timestamp, and repetition period. + + An easy option to skip a certain job on a certain date.
+ + + Why: Rotated backups done at weekly intervals are useful, and Bacula + cannot currently do them without extensive hacking. + + You could then easily skip tape backups on holidays. Especially + if you got no autochanger and can only fit one backup on a tape + that would be really handy, other jobs could proceed normally + and you won't get errors that way. + + + Notes: Here is an example syntax showing a 3-week rotation where full + Backups would be performed every week on Saturday, and an + incremental would be performed every week on Tuesday. Each + set of tapes could be removed from the loader for the following + two cycles before coming back and being reused on the third + week. Since the execution times are determined by intervals + from a given point in time, there will never be any issues with + having to adjust to any sort of arbitrary time boundary. In + the example provided, I even define the starting schedule + as crossing both a year and a month boundary, but the run times + would be based on the "Repeat" value and would therefore happen + weekly as desired. + + + Schedule { + Name = "Week 1 Rotation" + #Saturday. Would run Dec 30, Jan 20, Feb 10, etc. + Run { + Options { + Type = Full + Start = 2006-12-30 01:00 + Repeat = 3w + } + } + #Tuesday. Would run Jan 2, Jan 23, Feb 13, etc. + Run { + Options { + Type = Incremental + Start = 2007-01-02 01:00 + Repeat = 3w + } + } } - RunAfterJob { - Command = c:/bacula/deletestatefile.bat - RunsOnClient = Yes - RunsAtJobLevels = All # All, Full, Diff, Inc - RunsOnSuccess = Yes - RunsOnFailure = Yes + Schedule { + Name = "Week 2 Rotation" + #Saturday. Would run Jan 6, Jan 27, Feb 17, etc. + Run { + Options { + Type = Full + Start = 2007-01-06 01:00 + Repeat = 3w + } + } + #Tuesday. Would run Jan 9, Jan 30, Feb 20, etc. 
+ Run { + Options { + Type = Incremental + Start = 2007-01-09 01:00 + Repeat = 3w + } + } } - RunAfterJob { - Command = c:/bacula/somethingelse.bat - RunsOnClient = Yes - RunsAtJobLevels = All - RunsOnSuccess = No - RunsOnFailure = Yes - } - RunAfterJob { - Command = "/opt/bacula/etc/checkhost -v %c" - RunsOnClient = No - RunsAtJobLevels = All - RunsOnSuccess = No - RunsOnFailure = Yes + + Schedule { + Name = "Week 3 Rotation" + #Saturday. Would run Jan 13, Feb 3, Feb 24, etc. + Run { + Options { + Type = Full + Start = 2007-01-13 01:00 + Repeat = 3w + } + } + #Tuesday. Would run Jan 16, Feb 6, Feb 27, etc. + Run { + Options { + Type = Incremental + Start = 2007-01-16 01:00 + Repeat = 3w + } + } } + Notes: Kern: I have merged the previously separate project of skipping + jobs (via Schedule syntax) into this. + + +Item 16: Ability to defer Batch Insert to a later time + Date: 26 April 2009 + Origin: Eric + Status: + + What: Instead of doing a Job Batch Insert at the end of the Job + which might create resource contention with lots of Job, + defer the insert to a later time. + + Why: Permits to focus on getting the data on the Volume and + putting the metadata into the Catalog outside the backup + window. + + Notes: Will use the proposed Bacula ASCII database import/export + format (i.e. dependent on the import/export entities project). + - Why: It would be a significant change to the structure of the - directives, but allows for a lot more flexibility, including - RunAfter commands that will run regardless of whether the job - succeeds, or RunBefore tasks that still allow the job to run even - if that specific RunBefore fails. +Item 17: Add MaxVolumeSize/MaxVolumeBytes to Storage resource + Origin: Bastian Friedrich + Date: 2008-07-09 + Status: - - Notes: By Kern: I would prefer to have a single new Resource called - RunScript. 
More notes from Phil: + What: SD has a "Maximum Volume Size" statement, which is deprecated and + superseded by the Pool resource statement "Maximum Volume Bytes". + It would be good if either statement could be used in Storage + resources. - RunBeforeJob = yes|no - RunAfterJob = yes|no - RunsAtJobLevels = All|Full|Diff|Inc + Why: Pools do not have to be restricted to a single storage type/device; + thus, it may be impossible to define Maximum Volume Bytes in the + Pool resource. The old MaxVolSize statement is deprecated, as it + is SD side only. I am using the same pool for different devices. - The AbortJobOnError, RunsOnSuccess and RunsOnFailure directives - could be optional, and possibly RunsWhen as well. + Notes: State of idea currently unknown. Storage resources in the dir + config currently translate to very slim catalog entries; these + entries would require extensions to implement what is described + here. Quite possibly, numerous other statements that are currently + available in Pool resources could be used in Storage resources too + quite well. - AbortJobOnError would be ignored unless RunsWhen was set to Before - (or RunsBefore Job set to Yes), and would default to Yes if - omitted. If AbortJobOnError was set to No, failure of the script - would still generate a warning. - RunsOnSuccess would be ignored unless RunsWhen was set to After - (or RunsBeforeJob set to No), and default to Yes. +Item 18: Message mailing based on backup types + Origin: Evan Kaufman + Date: January 6, 2006 + Status: - RunsOnFailure would be ignored unless RunsWhen was set to After, - and default to No. + What: In the "Messages" resource definitions, allowing messages + to be mailed based on the type (backup, restore, etc.) and level + (full, differential, etc) of job that created the originating + message(s). - Allow having the before/after status on the script command - line so that the same script can be used both before/after. - David Boyes. 
+ Why: It would, for example, allow someone's boss to be emailed + automatically only when a Full Backup job runs, so he can + retrieve the tapes for offsite storage, even if the IT dept. + doesn't (or can't) explicitly notify him. At the same time, his + mailbox wouldn't be filled by notifications of Verifies, Restores, + or Incremental/Differential Backups (which would likely be kept + onsite). -Item 20: Allow FD to initiate a backup - Origin: Frank Volf (frank at deze dot org) - Date: 17 November 2005 + Notes: One way this could be done is through additional message types, for + example: + + Messages { + # email the boss only on full system backups + Mail = boss@mycompany.com = full, !incremental, !differential, !restore, + !verify, !admin + # email us only when something breaks + MailOnError = itdept@mycompany.com = all + } + + Notes: Kern: This should be rather trivial to implement. + + +Item 19: Handle Windows Encrypted Files using Win raw encryption + Origin: Michael Mohr, SAG Mohr.External@infineon.com + Date: 22 February 2008 + Origin: Alex Ehrlich (Alex.Ehrlich-at-mail.ee) + Date: 05 August 2008 Status: - What: Provide some means, possibly by a restricted console that - allows a FD to initiate a backup, and that uses the connection - established by the FD to the Director for the backup so that - a Director that is firewalled can do the backup. + What: Make it possible to backup and restore Encrypted Files from and to + Windows systems without the need to decrypt them by using the raw + encryption functions API (see: + http://msdn2.microsoft.com/en-us/library/aa363783.aspx) + that is provided for that reason by Microsoft. + Whether a file is encrypted could be determined by evaluating the + FILE_ATTRIBUTE_ENCRYPTED flag of the GetFileAttributes + function.
+ For each file backed up or restored by FD on Windows, check if + the file is encrypted; if so then use OpenEncryptedFileRaw, + ReadEncryptedFileRaw, WriteEncryptedFileRaw, + CloseEncryptedFileRaw instead of BackupRead and BackupWrite + API calls. + + Why: Without the usage of this interface the fd-daemon running + under the system account can't read encrypted Files because + it lacks the key needed for the decryption. As a result + encrypted files are currently not backed up + by bacula and also no error is shown while missing these files. + + Notes: Using xxxEncryptedFileRaw API would allow to backup and + restore EFS-encrypted files without decrypting their data. + Note that such files cannot be restored "portably" (at least, + easily) but they would be restorable to a different (or + reinstalled) Win32 machine; the restore would require setup + of an EFS recovery agent in advance, of course, and this shall + be clearly reflected in the documentation, but this is the + normal Windows SysAdmin's business. + When "portable" backup is requested the EFS-encrypted files + shall be clearly reported as errors. + See MSDN on the "Backup and Restore of Encrypted Files" topic: + http://msdn.microsoft.com/en-us/library/aa363783.aspx + Maybe the EFS support requires a new flag in the database for + each file, too? + Unfortunately, the implementation is not as straightforward as + 1-to-1 replacement of BackupRead with ReadEncryptedFileRaw, + requiring some FD code rewrite to work with + encrypted-file-related callback functions. + +Item 20: Job migration between different SDs +Origin: Mariusz Czulada +Date: 07 May 2007 +Status: NEW + +What: Allow to specify in migration job devices on Storage Daemon other than + the one used for migrated jobs (possibly on different/distant host) + +Why: Sometimes we have more than one system which requires backup + implementation. Often, these systems are functionally unrelated and + placed in different locations.
Having a big backup device (a tape + library) in each location is not cost-effective. It would be much + better to have one powerful enough tape library which could handle + backups from all systems, assuming relatively fast and reliable WAN + connections. In such architecture backups are done in service windows + on local bacula servers, then migrated to central storage off the peak + hours. + +Notes: If migration to different SD is working, migration to the same SD, as + now, could be done the same way (i mean 'localhost') to unify the + whole process + +Item 19. Allow FD to initiate a backup +Origin: Frank Volf (frank at deze dot org) +Date: 17 November 2005 +Status: + +What: Provide some means, possibly by a restricted console that + allows a FD to initiate a backup, and that uses the connection + established by the FD to the Director for the backup so that + a Director that is firewalled can do the backup. +Why: Makes backup of laptops much easier. +Notes: - The FD already has code for the monitor interface + - It could be nice to have a .job command that lists authorized + jobs. + - Commands need to be restricted on the Director side + (for example by re-using the runscript flag) + - The Client resource can be used to authorize the connection + - In a first time, the client can't modify job parameters + - We need a way to run a status command to follow job progression + + This project consists of the following points + 1. Modify the FD to have a "mini-console" interface that + permits it to connect to the Director and start a + backup job of itself. + 2. The list of jobs that can be started by the FD are + defined in the Director (possibly via a restricted + console). + 3. Modify the existing tray monitor code in the Win32 FD + so that it is a separate program from the FD. + 4. The tray monitor program should be extended to permit + initiating a backup. + 5. No new Director directives should be added without + prior consultation with the Bacula developers. 
+ 6. The comm line used by the FD to connect to the Director + should be re-used by the Director to do the backup. + This feature is partially implemented in the Director. + 7. The FD may have a new directive that allows it to start + a backup when the FD starts. + 8. The console interface to the FD should be extended to + permit a properly authorized console to initiate a + backup via the FD. + + +Item 21: Implement Storage daemon compression + Date: 18 December 2006 + Origin: Vadim A. Umanski , e-mail umanski@ext.ru + Status: + What: The ability to compress backup data on the SD receiving data + instead of doing that on client sending data. + Why: The need is practical. I've got some machines that can send + data to the network 4 or 5 times faster than compressing + them (I've measured that). They're using fast enough SCSI/FC + disk subsystems but rather slow CPUs (ex. UltraSPARC II). + And the backup server has got quite fast CPUs (ex. Dual P4 + Xeons) and quite a low load. When you have 20, 50 or 100 GB + of raw data - running a job 4 to 5 times faster - that + really matters. On the other hand, the data can be + compressed 50% or better - so losing twice more space for + disk backup is not good at all. And the network is all mine + (I have a dedicated management/provisioning network) and I + can get as high bandwidth as I need - 100Mbps, 1000Mbps... + That's why the server-side compression feature is needed! + Notes: + +Item 22: Ability to import/export Bacula database entities + Date: 26 April 2009 + Origin: Eric + Status: + + What: Create a Bacula ASCII SQL database independent format that permits + importing and exporting database catalog Job entities. + + Why: For archival, database clustering, transfer to other databases + of any SQL engine. + + Notes: Job selection should be by Job, time, Volume, Client, Pool and possibly + other criteria. + + +Item 23: Implementation of running Job speed limit.
+Origin: Alex F, alexxzell at yahoo dot com +Date: 29 January 2009 + +What: I noticed the need for an integrated bandwidth limiter for + running jobs. It would be very useful just to specify another + field in bacula-dir.conf, like speed = how much speed you wish + for that specific job to run at + +Why: Because of a couple of reasons. First, it's very hard to implement a + traffic shaping utility and also make it reliable. Second, it is very + uncomfortable to have to implement these apps to, let's say 50 clients + (including desktops, servers). This would also be unreliable because you + have to make sure that the apps are properly working when needed; users + could also disable them (accidentally or not). It would be very useful + to provide Bacula this ability. All information would be centralized, + you would not have to go to 50 different clients in 10 different + locations for configuration; eliminating 3rd party additions helps in + establishing efficiency. Would also avoid bandwidth congestion, + especially where there is little available. + + +Item 24: Add an override in Schedule for Pools based on backup types +Date: 19 Jan 2005 +Origin: Chad Slater +Status: + + What: Adding a FullStorage=BigTapeLibrary in the Schedule resource + would help those of us who use different storage devices for different + backup levels cope with the "auto-upgrade" of a backup. + + Why: Assume I add several new devices to be backed up, i.e. several + hosts with 1TB RAID. To avoid tape switching hassles, incrementals are + stored in a disk set on a 2TB RAID. If you add these devices in the + middle of the month, the incrementals are upgraded to "full" backups, + but they try to use the same storage device as requested in the + incremental job, filling up the RAID holding the differentials. If we + could override the Storage parameter for full and/or differential + backups, then the Full job would use the proper Storage device, which + has more capacity (i.e. an 8TB tape library).
+ + +Item 25: Automatic promotion of backup levels based on backup size + Date: 19 January 2006 + Origin: Adam Thornton + Status: - Why: Makes backup of laptops much easier. + What: Other backup programs have a feature whereby it estimates the space + that a differential, incremental, and full backup would take. If + the difference in space required between the scheduled level and the + next level up is beneath some user-defined critical threshold, the + backup level is bumped to the next type. Doing this minimizes the + number of volumes necessary during a restore, with a fairly minimal + cost in backup media space. -Item 21: Multiple threads in file daemon for the same job - Date: 27 November 2005 - Origin: Ove Risberg (Ove.Risberg at octocode dot com) + Why: I know at least one (quite sophisticated and smart) user for whom the + absence of this feature is a deal-breaker in terms of using Bacula; + if we had it it would eliminate the one cool thing other backup + programs can do and we can't (at least, the one cool thing I know + of). + + +Item 26: Allow FileSet inclusion/exclusion by creation/mod times + Origin: Evan Kaufman + Date: January 11, 2006 Status: - What: I want the file daemon to start multiple threads for a backup - job so the fastest possible backup can be made. + What: In the vein of the Wild and Regex directives in a Fileset's + Options, it would be helpful to allow a user to include or exclude + files and directories by creation or modification times. - The file daemon could parse the FileSet information and start - one thread for each File entry located on a separate - filesystem. + You could factor the Exclude=yes|no option in much the same way it + affects the Wild and Regex directives. For example, you could exclude + all files modified before a certain date: - A configuration option in the job section should be used to - enable or disable this feature. The configuration option could - specify the maximum number of threads in the file daemon. 
+ Options { + Exclude = yes + Modified Before = #### + } - If the threads could spool the data to separate spool files - the restore process will not be much slower. + Or you could exclude all files created/modified since a certain date: - Why: Multiple concurrent backups of a large fileserver with many - disks and controllers will be much faster. + Options { + Exclude = yes + Created Modified Since = #### + } + + The format of the time/date could be done several ways, say the number + of seconds since the epoch: + 1137008553 = Jan 11 2006, 1:42:33PM # result of `date +%s` + + Or a human readable date in a cryptic form: + 20060111134233 = Jan 11 2006, 1:42:33PM # YYYYMMDDhhmmss - Notes: I am willing to try to implement this but I will probably - need some help and advice. (No problem -- Kern) + Why: I imagine a feature like this could have many uses. It would + allow a user to do a full backup while excluding the base operating + system files, so if I installed a Linux snapshot from a CD yesterday, + I'll *exclude* all files modified *before* today. If I need to + recover the system, I use the CD I already have, plus the tape backup. + Or if, say, a Windows client is hit by a particularly corrosive + virus, and I need to *exclude* any files created/modified *since* the + time of infection. -Item 22: Archival (removal) of User Files to Tape + Notes: Of course, this feature would work in concert with other + in/exclude rules, and wouldn't override them (or each other). - Date: Nov. 24/2005 + Notes: The directives I'd imagine would be along the lines of + "[Created] [Modified] [Before|Since] = ". + So one could compare against 'ctime' and/or 'mtime', but ONLY 'before' + or 'since'. + +Item 27: Archival (removal) of User Files to Tape + Date: Nov.
24/2005 Origin: Ray Pengelly [ray at biomed dot queensu dot ca] Status: - What: The ability to archive data to storage based on certain parameters + What: The ability to archive data to storage based on certain parameters such as age, size, or location. Once the data has been written to storage and logged it is then pruned from the originating filesystem. Note! We are talking about user's files and not Bacula Volumes. - Why: This would allow fully automatic storage management which becomes + Why: This would allow fully automatic storage management which becomes useful for large datastores. It would also allow for auto-staging from one media type to another. @@ -627,108 +837,515 @@ Item 22: Archival (removal) of User Files to Tape access time. Then after another 6 months (or possibly as one storage pool gets full) data is migrated to Tape. +Item 28: Ability to reconnect a disconnected comm line + Date: 26 April 2009 + Origin: Kern/Eric + Status: + + What: Often jobs fail because of a communications line drop. In that + case, Bacula should be able to reconnect to the other daemon and + resume the job. - -Item 23: Deletion of Disk-Based Bacula Volumes - Date: Nov 25, 2005 - Origin: Ross Boylan (edited - by Kern) - Status: + Why: Avoids backing up data that has already been saved. - What: Provide a way for Bacula to automatically remove Volumes - from the filesystem, or optionally to truncate them. - Obviously, the Volume must be pruned prior to removal. + Notes: *Very* complicated from a design point of view because of authentication. - Why: This would allow users more control over their Volumes and - prevent disk based volumes from consuming too much space. +Item 29: Multiple threads in file daemon for the same job + Date: 27 November 2005 + Origin: Ove Risberg (Ove.Risberg at octocode dot com) + Status: - Notes: The following two directives might do the trick: + What: I want the file daemon to start multiple threads for a backup + job so the fastest possible backup can be made.
- Volume Data Retention =