X-Git-Url: https://git.sur5r.net/?a=blobdiff_plain;f=bacula%2Fkernstodo;h=9d6fd077930f843728092511a20bc4a48167a449;hb=d0b240f353d003f66bd38c7a2e829f72ba64c590;hp=9fd9f426914bd787558e06e66184b28490dbbe5e;hpb=98ca3b3b82d37ff8d48ab92c8210324ae0397f47;p=bacula%2Fbacula

diff --git a/bacula/kernstodo b/bacula/kernstodo
index 9fd9f42691..9d6fd07793 100644
--- a/bacula/kernstodo
+++ b/bacula/kernstodo
@@ -1,5 +1,5 @@
                     Kern's ToDo List
-                    16 July 2007
+                    02 January 2008
 
 
 Document:
@@ -71,9 +71,94 @@ Professional Needs:
       and http://www.openeyet.nl/scc/ for managing customer changes
 
 Priority:
+=== Duplicate jobs ===
+   These apply only to backup jobs.
+
+   1. Allow Duplicate Jobs = Yes | No | Higher   (Yes)
+
+   2. Duplicate Job Interval = time-interval   (0)
+
+   The defaults are in parentheses and would produce the same behavior as today.
+
+   If Allow Duplicate Jobs is set to No, then any job starting while a job of the
+   same name is running will be canceled.
+
+   If Allow Duplicate Jobs is set to Higher, then any job starting with the same
+   or a lower level will be canceled, but any job with a higher level will start.
+   The levels are, from high to low: Full, Differential, Incremental.
+
+   Finally, if you have Duplicate Job Interval set to a non-zero value, any job
+   of the same name which starts more than that interval after a previous job of
+   the same name will run; any one that starts within the interval will be
+   subject to the above rules.  Another way of looking at it is that the Allow
+   Duplicate Jobs directive only applies within that interval of when the
+   previous job finished (i.e. it is the minimum interval between jobs).
+
+   So in summary:
+
+   Allow Duplicate Jobs = Yes | No | HigherLevel | CancelLowerLevel   (Yes)
+
+   Where HigherLevel cancels any waiting job but not any running job.
+   Where CancelLowerLevel is the same as HigherLevel but cancels any running
+   or waiting job.
+
+   Duplicate Job Proximity = time-interval   (0)
+
+   Skip = Do not allow two or more jobs with the same name to run
+   simultaneously within the proximity interval.  The second and subsequent
+   jobs are skipped without further processing (other than to note the job
+   and exit immediately), and are not considered errors.
+
+   Fail = The second and subsequent jobs that attempt to run during the
+   proximity interval are canceled and treated as error-terminated jobs.
+
+   Promote = If a job is running, and a second/subsequent job of higher
+   level attempts to start, the running job is promoted to the higher level
+   of processing using the resources already allocated, and the subsequent
+   job is treated as in Skip above.
+===
+- The cd command should allow complete paths,
+  i.e. cd /foo/bar/foo/bar
+  -> if a customer mails me the path to a certain file,
+  it's faster to enter the specified directory directly.
+- Fix bpipe.c so that it does not modify the results pointer.
+  ***FIXME*** calling sequence should be changed.
+- Make tree walk routines like cd, ls, ... more user friendly
+  by handling spaces better.
+=== rate design   (a standalone C sketch of this is appended at the end of this file)
+  jcr->last_rate
+  jcr->last_runtime
+  MA = (last_MA * 3 + rate) / 4
+  rate = (bytes - last_bytes) / (runtime - last_runtime)
+- Add a recursive mark command (rmark) to restore.
+- "Minimum Job Interval = nnn" sets the minimum interval between Jobs
+  of the same level and does not permit multiple simultaneous
+  running of that Job (i.e. lets any previous invocation finish
+  before doing Interval testing).
+- Look at simplifying File exclusions.
+- New directive "Delete purged Volumes" +- It appears to me that you have run into some sort of race + condition where two threads want to use the same Volume and they + were both given access. Normally that is no problem. However, + one thread wanted the particular Volume in drive 0, but it was + loaded into drive 1 so it decided to unload it from drive 1 and + then loaded it into drive 0, while the second thread went on + thinking that the Volume could be used in drive 1 not realizing + that in between time, it was loaded in drive 0. + I'll look at the code to see if there is some way we can avoid + this kind of problem. Probably the best solution is to make the + first thread simply start using the Volume in drive 1 rather than + transferring it to drive 0. +- Complete Catalog in Pool +- Implement Bacula plugins -- design API +- Scripts +- Prune by Job +- Prune by Job Level +- True automatic pruning - Duplicate Jobs Run, Fail, Skip, Higher, Promote, CancelLowerLevel Proximity + New directive. - Auto update of slot: rufus-dir: ua_run.c:456-10 JobId=10 NewJobId=10 using pool Full priority=10 02-Nov 12:58 rufus-dir JobId 10: Start Backup JobId 10, Job=kernsave.2007-11-02_12.58.03 @@ -84,14 +169,6 @@ Priority: 02-Nov 12:58 rufus-sd JobId 10: Wrote label to prelabeled Volume "Vol001" on device "DDS-4" (/dev/nst0) 02-Nov 12:58 rufus-sd JobId 10: Alert: TapeAlert[7]: Media Life: The tape has reached the end of its useful life. 02-Nov 12:58 rufus-dir JobId 10: Bacula rufus-dir 2.3.6 (26Oct07): 02-Nov-2007 12:58:51 -- Encrypt sd_auth_key = s with director's key = d - k[i] = s[i] + (d[i] & 0xF)) & 0xFF + 'A' skip - - Decrypt key = k with director's key - x = k[i] - (d[i] & 0xF)) - if (x < 0) { - x = k[i] - (d[i] & 0xF) + 16 - } - s[i] = x + 'A'; - Eliminate: /var is a different filesystem. Will not descend from / into /var - Separate Files and Directories in catalog - Create FileVersions table @@ -104,12 +181,7 @@ Priority: generation of the error message doesn't differentiate result==NULL and a bad status from that result. Not only that, the result is cleared on a bail_out without having generated the error message. -- Erabt if min_block_size > max_block_size - KIWI -- Implement wait on multiple objects - - Multiple max times - - pthread signal - - socket input ready - Implement SDErrors (must return from SD) - Implement USB keyboard support in rescue CD. - Implement continue spooling while despooling. @@ -153,7 +225,6 @@ Priority: > configuration string value to a CRYPTO_CIPHER_* value, if anyone is > interested in implementing this functionality. -- Why doesn't @"xxx abc" work in a conf file? - Figure out some way to "automatically" backup conf changes. - Add the OS version back to the Win32 client info. - Restarted jobs have a NULL in the from field. @@ -225,9 +296,6 @@ Projects: For next release: - Try to fix bscan not working with multiple DVD volumes bug #912. - Look at mondo/mindi -- Don't restore Solaris Door files: - #define S_IFDOOR in st_mode. - see: http://docs.sun.com/app/docs/doc/816-5173/6mbb8ae23?a=view#indexterm-360 - Make Bacula by default not backup tmpfs, procfs, sysfs, ... - Fix hardlinked immutable files when linking a second file, the immutable flag must be removed prior to trying to link it. @@ -284,18 +352,6 @@ Low priority: http://linuxwiki.de/Bacula (in German) - Possibly allow SD to spool even if a tape is not mounted. 
-- It appears to me that you have run into some sort of race - condition where two threads want to use the same Volume and they - were both given access. Normally that is no problem. However, - one thread wanted the particular Volume in drive 0, but it was - loaded into drive 1 so it decided to unload it from drive 1 and - then loaded it into drive 0, while the second thread went on - thinking that the Volume could be used in drive 1 not realizing - that in between time, it was loaded in drive 0. - I'll look at the code to see if there is some way we can avoid - this kind of problem. Probably the best solution is to make the - first thread simply start using the Volume in drive 1 rather than - transferring it to drive 0. - Fix re-read of last block to check if job has actually written a block, and check if block was written by a different job (i.e. multiple simultaneous jobs writing). @@ -366,70 +422,6 @@ select Path.Path from Path,File where File.JobId=nnn and - Look into replacing autotools with cmake http://www.cmake.org/HTML/Index.html -=== Migration from David === -What I'd like to see: - -Job { - Name = "-migrate" - Type = Migrate - Messages = Standard - Pool = Default - Migration Selection Type = LowestUtil | OldestVol | PoolOccupancy | -Client | PoolResidence | Volume | JobName | SQLquery - Migration Selection Pattern = "regexp" - Next Pool = -} - -There should be no need for a Level (migration is always Full, since you -don't calculate differential/incremental differences for migration), -Storage should be determined by the volume types in the pool, and Client -is really a selection issue. Migration should always occur to the -NextPool defined in the pool definition. If no nextpool is defined, the -job should end with a reason of "no place to go". If Next Pool statement -is present, we override the check in the pool definition and use the -pool specified. - -Here's how I'd define Migration Selection Types: - -With Regexes: -Client -- Migrate data from selected client only. Migration Selection -Pattern regexp provides pattern to select client names, eg ^FS00* makes -all client names starting with FS00 eligible for migration. - -Jobname -- Migration all jobs matching name. Migration Selection Pattern -regexp provides pattern to select jobnames existing in pool. - -Volume -- Migrate all data on specified volumes. Migration Selection -Pattern regexp provides selection criteria for volumes to be migrated. -Volumes must exist in pool to be eligible for migration. - - -With Regex optional: -LowestUtil -- Identify the volume in the pool with the least data on it -and empty it. No Migration Selection Pattern required. - -OldestVol -- Identify the LRU volume with data written, and empty it. No -Migration Selection Pattern required. - -PoolOccupancy -- if pool occupancy exceeds , migrate volumes -(starting with most full volumes) until pool occupancy drops below -. Pool highmig and lowmig values are in pool definition, no -Migration Selection Pattern required. - - -No regex: -SQLQuery -- Migrate all jobuids returned by the supplied SQL query. -Migration Selection Pattern contains SQL query to execute; should return -a list of 1 or more jobuids to migrate. - -PoolResidence -- Migrate data sitting in pool for longer than -PoolResidence value in pool definition. Migration Selection Pattern -optional; if specified, override value in pool definition (value in -minutes). 
- - -[ possibly a Python event -- kes ] -=== - Mount on an Autochanger with no tape in the drive causes: Automatically selected Storage: LTO-changer Enter autochanger drive[0]: 0 @@ -578,20 +570,12 @@ minutes). ("D","Diff"), ("I","Inc"); - Show files/second in client status output. -- Add a recursive mark command (rmark) to restore. -- "Minimum Job Interval = nnn" sets minimum interval between Jobs - of the same level and does not permit multiple simultaneous - running of that Job (i.e. lets any previous invocation finish - before doing Interval testing). -- Look at simplifying File exclusions. -- New directive "Delete purged Volumes" - new pool XXX with ScratchPoolId = MyScratchPool's PoolId and let it fill itself, and RecyclePoolId = XXX's PoolId so I can see if it become stable and I just have to supervise MyScratchPool - If I want to remove this pool, I set RecyclePoolId = MyScratchPool's PoolId, and when it is empty remove it. -- Figure out how to recycle Scratch volumes back to the Scratch Pool. - Add Volume=SCRTCH - Allow Check Labels to be used with Bacula labels. - "Resuming" a failed backup (lost line for example) by using the @@ -623,8 +607,6 @@ minutes). backups of the same client and if we again try to start a full backup of client backup abc bacula won't complain. That should be fixed. -- Fix bpipe.c so that it does not modify results pointer. - ***FIXME*** calling sequence should be changed. - For Windows disaster recovery see http://unattended.sf.net/ - regardless of the retention period, Bacula will not prune the last Full, Diff, or Inc File data until a month after the @@ -654,11 +636,8 @@ minutes). - In restore don't compare byte count on a raw device -- directory entry does not contain bytes. -=== rate design - jcr->last_rate - jcr->last_runtime - MA = (last_MA * 3 + rate) / 4 - rate = (bytes - last_bytes) / (runtime - last_runtime) + + - Max Vols limit in Pool off by one? - Implement Files/Bytes,... stats for restore job. - Implement Total Bytes Written, ... for restore job. @@ -773,6 +752,77 @@ minutes). - It remains to be seen how the backup performance of the DIR's will be affected when comparing the catalog for a large filesystem. + 1. Use the current Director in-memory tree code (very fast), but currently in + memory. It probably could be paged. + + 2. Use some DB such as Berkeley DB or SQLite. SQLite is already compiled and + built for Win32, and it is something we could compile into the program. + + 3. Implement our own custom DB code. + + Note, by appropriate use of Directives in the Director, we can dynamically + decide if the work is done in the Director or in the FD, and we can even + allow the user to choose. + +=== most recent accurate file backup/restore === + Here is a sketch (i.e. more details must be filled in later) that I recently + made of an algorithm for doing Accurate Backup. + + 1. Dir informs FD that it is doing an Accurate backup and lookup done by + Director. + + 2. FD passes through the file system doing a normal backup based on normal + conditions, recording the names of all files and their attributes, and + indicating which files were backed up. This is very similar to what Verify + does. + + 3. The Director receives the two lists of files at the end of the FD backup. + One, files backed up, and one files not backed up. It then looks up all the + files not backed up (using Verify style code). + + 4. The Dir sends the FD a list of: + a. Additional files to backup (based on user specified criteria, name, size + inode date, hash, ...). + b. 
Files to delete. + + 5. Dir deletes list of file not backed up. + + 6. FD backs up additional files generates a list of those backed up and sends + it to the Director, which adds it to the list of files backed up. The list + is now complete and current. + + 7. The FD generates delete records for all the files that were deleted and + sends to the SD. + + 8. The Dir deletes the previous CurrentBackup list, and then does a + transaction insert of the new list that it has. + + 9. The rest works as before ... + + That is it. + + Two new tables needed. + 1. CurrentBackupId table that contains Client, JobName, FileSet, and a unique + BackupId. This is created during a Full save, and the BackupId can be set to + the JobId of the Full save. It will remain the same until another Full + backup is done. That is when new records are added during a Differential or + Incremental, they must use the same BackupId. + + 2. CurrentBackup table that contains essentially a File record (less a number + of fields, but with a few extra fields) -- e.g. a flag that the File was + backed up by a Full save (this permits doing a Differential). The unique + BackupId allows us to look up the CurrentBackup for a particular Client, + Jobname, FileSet using that unique BackupId as the key, so this table must be + indexed by the BackupId. + + Note any time a file is saved by the FD other than during a Full save, the + Full save flag is cleared. When doing a Differential backup, if a file has + the Full save flag set, it is skipped, otherwise it is backed up. For an + Incremental backup, we check to see if the file has changed since the last + time we backed it up. + + Deleted files should have FileIndex == 0 + ==== From David: How about introducing a Type = MgmtPolicy job type? That job type would @@ -931,18 +981,6 @@ Why: format string. Then I have the tape labeled automatically with weekday name in the correct language. ========== -- Yes, that is surely the case. I probably should turn those into Warning - errors. In addition, you just made me think that it might not be bad to - add an option to check the file size after backing up the file and - report if it changes. This would be done as an option because it would - add extra overhead. - - Kern, good idea. If you do do that, mention in the output: file - shrunk, or file expanded, just to make it obvious to the user - (without having to the refer to file size), just how the file size - changed. - - Would this option be for all file, or just one file? Or a fileset? - Make output from status use html table tags for nicely presenting in a browser. - Can one write tapes faster with 8192 byte block sizes? @@ -1062,10 +1100,6 @@ Documentation to do: (any release a little bit at a time) -> maybe its more easy to maintain this, if the descriptions of that commands are outsourced to a ceratin-file -- the cd-command should allow complete paths - i.e. cd /foo/bar/foo/bar - -> if a customer mails me the path to a certain file, - its faster to enter the specified directory - if the password is not configured in bconsole.conf you should be asked for it. -> sometimes you like to do restore on a customer-machine @@ -1268,8 +1302,6 @@ Documentation to do: (any release a little bit at a time) to start a job or pass its DHCP obtained IP number. - Implement a query tape prompt/replace feature for a console - Copy console @ code to gnome2-console -- Make tree walk routines like cd, ls, ... more user friendly - by handling spaces better. 
- Make sure that Bacula rechecks the tape after the 20 min wait.
- Set IO_NOWAIT on Bacula TCP/IP packets.
- Try doing a raw partition backup and restore by mounting a
@@ -1348,7 +1380,6 @@ Documentation to do: (any release a little bit at a time)
- Have SD compute MD5 or SHA1 and compare to what FD computes.
- Make VolumeToCatalog calculate an MD5 or SHA1 from the
  actual data on the Volume and compare it.
-- Implement Bacula plugins -- design API
- Make bcopy read through bad tape records.
- Program files (i.e. execute a program to read/write files).
  Pass read date of last backup, size of file last time.
@@ -1574,6 +1605,10 @@ Jerry Schieffer
==============================
Longer term to do:
+- Implement wait on multiple objects
+   - Multiple max times
+   - pthread signal
+   - socket input ready
- Design at hierarchial storage for Bacula. Migration and Clone.
- Implement FSM (File System Modules).
- Audit M_ error codes to ensure they are correct and consistent.
@@ -1588,9 +1623,6 @@ Longer term to do:
- Enhance time/duration input to allow multiple qualifiers e.g. 3d2h
- Add ability to backup to two Storage devices (two SD sessions) at
  the same time -- e.g. onsite, offsite.
-- Add the ability to consolidate old backup sets (basically do a restore
-  to tape and appropriately update the catalog).  Compress Volume sets.
-  Might need to spool via file is only one drive is available.
- Compress or consolidate Volumes of old possibly deleted files. Perhaps
  someway to do so with every volume that has less than x% valid
  files.
@@ -1830,3 +1862,12 @@ Block Position: 0
  does the right thing.
- FD-SD quick disconnect
- Building the in memory restore tree is slow.
+- Abort if min_block_size > max_block_size
+- Add the ability to consolidate old backup sets (basically do a restore
+  to tape and appropriately update the catalog).  Compress Volume sets.
+  Might need to spool via file if only one drive is available.
+- Why doesn't @"xxx abc" work in a conf file?
+- Don't restore Solaris Door files:
+   #define S_IFDOOR in st_mode.
+  see: http://docs.sun.com/app/docs/doc/816-5173/6mbb8ae23?a=view#indexterm-360
+- Figure out how to recycle Scratch volumes back to the Scratch Pool.
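
=== rate design: standalone sketch (editor's illustration) ===
   The "rate design" note earlier in this diff gives only the two update
   formulas.  Below is a minimal sketch of that smoothed-rate calculation,
   assuming illustrative names (rate_state, update_rate); it is not Bacula's
   actual jcr code, just the formulas made runnable.

   #include <stdio.h>
   #include <stdint.h>
   #include <time.h>

   /* Hypothetical per-job rate state standing in for jcr->last_rate,
    * jcr->last_runtime and the byte counter mentioned in the note. */
   struct rate_state {
      uint64_t last_bytes;     /* bytes written at the previous sample         */
      time_t   last_runtime;   /* job runtime (seconds) at the previous sample */
      double   last_MA;        /* previous moving average, bytes/second        */
   };

   /* rate = (bytes - last_bytes) / (runtime - last_runtime)
    * MA   = (last_MA * 3 + rate) / 4      -- weighted toward history */
   static double update_rate(struct rate_state *rs, uint64_t bytes, time_t runtime)
   {
      time_t dt = runtime - rs->last_runtime;
      if (dt <= 0) {
         return rs->last_MA;               /* no new interval to measure yet */
      }
      double rate = (double)(bytes - rs->last_bytes) / (double)dt;
      rs->last_MA = (rs->last_MA * 3 + rate) / 4;
      rs->last_bytes = bytes;
      rs->last_runtime = runtime;
      return rs->last_MA;
   }

   int main(void)
   {
      struct rate_state rs = {0, 0, 0.0};
      /* pretend samples: 10 MB after 1 s, 30 MB after 2 s, 60 MB after 3 s */
      printf("%.0f bytes/s\n", update_rate(&rs, 10000000, 1));
      printf("%.0f bytes/s\n", update_rate(&rs, 30000000, 2));
      printf("%.0f bytes/s\n", update_rate(&rs, 60000000, 3));
      return 0;
   }

   The 3/4 weighting toward the previous average damps short spikes, so a
   displayed job rate based on this value changes smoothly between samples.
===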