X-Git-Url: https://git.sur5r.net/?a=blobdiff_plain;f=bacula%2Fkernstodo;h=530d143294a850ace68f623eaf0744fd3e235fbd;hb=746f5583428e4d2a572cef32c02e259b3e5eb286;hp=db0218fc739527ab5fa5e7f0afa38251b5974ba9;hpb=5ec2ac1ba02d3e6822487ad2248d8be07dac26ae;p=bacula%2Fbacula

diff --git a/bacula/kernstodo b/bacula/kernstodo
index db0218fc73..530d143294 100644
--- a/bacula/kernstodo
+++ b/bacula/kernstodo
@@ -1,8 +1,51 @@
 Kern's ToDo List
- 16 July 2007
+ 02 May 2008

 Document:
+- This patch will give Bacula the option to specify files in
+  FileSets which can be dropped in directories which are Included
+  which will cause that directory not the be backed up.
+
+  For example, my FileSet contains:
+  # List of files to be backed up
+  FileSet {
+    Name = "Remote Specified1"
+    Include {
+      Options {
+        signature = MD5
+      }
+      File = "\\reserved_volume
+- Softlinks that point to non-existent file are not restored in restore all,
+  but are restored if the file is individually selected. BUG!
+- Doc Duplicate Jobs.
+- New directive "Delete purged Volumes"
+- Prune by Job
+- Prune by Job Level (Full, Differential, Incremental)
+- Strict automatic pruning
+- Implement unmount of USB volumes.
+- Use "./config no-idea no-mdc2 no-rc5" on building OpenSSL for
+  Win32 to avoid patent problems.
+- Implement multiple jobid specification for the cancel command,
+  similar to what is permitted on the update slots command.
+- Implement Bacula plugins -- design API
+- modify pruning to keep a fixed number of versions of a file,
+  if requested.
+- the cd-command should allow complete paths
+  i.e. cd /foo/bar/foo/bar
+  -> if a customer mails me the path to a certain file,
+     its faster to enter the specified directory
+- Make tree walk routines like cd, ls, ... more user friendly
+  by handling spaces better.
+=== rate design
+  jcr->last_rate
+  jcr->last_runtime
+  MA = (last_MA * 3 + rate) / 4
+  rate = (bytes - last_bytes) / (runtime - last_runtime)
+- Add a recursive mark command (rmark) to restore.
+- "Minimum Job Interval = nnn" sets minimum interval between Jobs
+  of the same level and does not permit multiple simultaneous
+  running of that Job (i.e. lets any previous invocation finish
+  before doing Interval testing).
+- Look at simplifying File exclusions.
+- Scripts
+- Auto update of slot:
+  rufus-dir: ua_run.c:456-10 JobId=10 NewJobId=10 using pool Full priority=10
+  02-Nov 12:58 rufus-dir JobId 10: Start Backup JobId 10, Job=kernsave.2007-11-02_12.58.03
+  02-Nov 12:58 rufus-dir JobId 10: Using Device "DDS-4"
+  02-Nov 12:58 rufus-sd JobId 10: Invalid slot=0 defined in catalog for Volume "Vol001" on "DDS-4" (/dev/nst0). Manual load my be required.
+  02-Nov 12:58 rufus-sd JobId 10: 3301 Issuing autochanger "loaded? drive 0" command.
+  02-Nov 12:58 rufus-sd JobId 10: 3302 Autochanger "loaded? drive 0", result is Slot 2.
+  02-Nov 12:58 rufus-sd JobId 10: Wrote label to prelabeled Volume "Vol001" on device "DDS-4" (/dev/nst0)
+  02-Nov 12:58 rufus-sd JobId 10: Alert: TapeAlert[7]: Media Life: The tape has reached the end of its useful life.
+  02-Nov 12:58 rufus-dir JobId 10: Bacula rufus-dir 2.3.6 (26Oct07): 02-Nov-2007 12:58:51
+- Eliminate: /var is a different filesystem. Will not descend from / into /var
+- Separate Files and Directories in catalog
+- Create FileVersions table
+- Look at rsysnc for incremental updates and dedupping
+- Add MD5 or SHA1 check in SD for data validation
 - finish implementation of fdcalled -- see ua_run.c:105
 - Fix problem in postgresql.c in my_postgresql_query, where the
   generation of the error message doesn't differentiate result==NULL
   and a bad status from that result. Not only that, the result is
   cleared on a bail_out without having generated the error message.
-- Erabt if min_block_size > max_block_size
 - KIWI
-- Implement wait on multiple objects
-  - Multiple max times
-  - pthread signal
-  - socket input ready
 - Implement SDErrors (must return from SD)
 - Implement USB keyboard support in rescue CD.
 - Implement continue spooling while despooling.
 - Remove all install temp files in Win32 PLUGINSDIR.
 - Audit retention periods to make sure everything is 64 bit.
-- Use E'xxx' to escape PostgreSQL strings.
 - No where in restore causes kaboom.
 - Performance: multiple spool files for a single job.
 - Performance: despool attributes when despooling data (problem
   multiplexing Dir connection).
 - Make restore use the in-use volume reservation algorithm.
-- Look at mincore: http://insights.oetiker.ch/linux/fadvise.html
-- Unicode input http://en.wikipedia.org/wiki/Byte_Order_Mark
-- Add TLS to bat (should be done).
 - When Pool specifies Storage command override does not work.
 - Implement wait_for_sysop() message display in wait_for_device(), which
   now prints warnings too often.
 - Ensure that each device in an Autochanger has a different
   Device Index.
-- Add Catalog = to Pool resource so that pools will exist
-  in only one catalog -- currently Pools are "global".
 - Look at sg_logs -a /dev/sg0 for getting soft errors.
 - btape "test" command with Offline on Unmount = yes
@@ -131,7 +215,6 @@ Priority:
 > configuration string value to a CRYPTO_CIPHER_* value, if anyone is
 > interested in implementing this functionality.

-- Why doesn't @"xxx abc" work in a conf file?
 - Figure out some way to "automatically" backup conf changes.
 - Add the OS version back to the Win32 client info.
 - Restarted jobs have a NULL in the from field.
@@ -141,8 +224,6 @@ Priority:
   and possibly changing the blobs into varchar.
 - Ensure that the SD re-reads the Media record if the JobFiles
   does not match -- it may have been updated by another job.
-- Look at moving the Storage directive from the Job to the
-  Pool in the default conf files.
 - Doc items
 - Test Volume compatibility between machine architectures
 - Encryption documentation
@@ -203,9 +284,6 @@ Projects:
 For next release:
 - Try to fix bscan not working with multiple DVD volumes bug #912.
 - Look at mondo/mindi
-- Don't restore Solaris Door files:
-  #define S_IFDOOR in st_mode.
-  see: http://docs.sun.com/app/docs/doc/816-5173/6mbb8ae23?a=view#indexterm-360
 - Make Bacula by default not backup tmpfs, procfs, sysfs, ...
 - Fix hardlinked immutable files when linking a second file, the
   immutable flag must be removed prior to trying to link it.
@@ -262,18 +340,6 @@ Low priority:
   http://linuxwiki.de/Bacula (in German)

 - Possibly allow SD to spool even if a tape is not mounted.
-- It appears to me that you have run into some sort of race
-  condition where two threads want to use the same Volume and they
-  were both given access. Normally that is no problem. However,
-  one thread wanted the particular Volume in drive 0, but it was
-  loaded into drive 1 so it decided to unload it from drive 1 and
-  then loaded it into drive 0, while the second thread went on
-  thinking that the Volume could be used in drive 1 not realizing
-  that in between time, it was loaded in drive 0.
-  I'll look at the code to see if there is some way we can avoid
-  this kind of problem. Probably the best solution is to make the
-  first thread simply start using the Volume in drive 1 rather than
-  transferring it to drive 0.
 - Fix re-read of last block to check if job has actually written
   a block, and check if block was written by a different job
   (i.e. multiple simultaneous jobs writing).
@@ -344,70 +410,6 @@ select Path.Path from Path,File where File.JobId=nnn and
 - Look into replacing autotools with cmake
   http://www.cmake.org/HTML/Index.html

-=== Migration from David ===
-What I'd like to see:
-
-Job {
-  Name = "-migrate"
-  Type = Migrate
-  Messages = Standard
-  Pool = Default
-  Migration Selection Type = LowestUtil | OldestVol | PoolOccupancy |
-Client | PoolResidence | Volume | JobName | SQLquery
-  Migration Selection Pattern = "regexp"
-  Next Pool =
-}
-
-There should be no need for a Level (migration is always Full, since you
-don't calculate differential/incremental differences for migration),
-Storage should be determined by the volume types in the pool, and Client
-is really a selection issue. Migration should always occur to the
-NextPool defined in the pool definition. If no nextpool is defined, the
-job should end with a reason of "no place to go". If Next Pool statement
-is present, we override the check in the pool definition and use the
-pool specified.
-
-Here's how I'd define Migration Selection Types:
-
-With Regexes:
-Client -- Migrate data from selected client only. Migration Selection
-Pattern regexp provides pattern to select client names, eg ^FS00* makes
-all client names starting with FS00 eligible for migration.
-
-Jobname -- Migration all jobs matching name. Migration Selection Pattern
-regexp provides pattern to select jobnames existing in pool.
-
-Volume -- Migrate all data on specified volumes. Migration Selection
-Pattern regexp provides selection criteria for volumes to be migrated.
-Volumes must exist in pool to be eligible for migration.
-
-
-With Regex optional:
-LowestUtil -- Identify the volume in the pool with the least data on it
-and empty it. No Migration Selection Pattern required.
-
-OldestVol -- Identify the LRU volume with data written, and empty it. No
-Migration Selection Pattern required.
-
-PoolOccupancy -- if pool occupancy exceeds , migrate volumes
-(starting with most full volumes) until pool occupancy drops below
-. Pool highmig and lowmig values are in pool definition, no
-Migration Selection Pattern required.
-
-
-No regex:
-SQLQuery -- Migrate all jobuids returned by the supplied SQL query.
-Migration Selection Pattern contains SQL query to execute; should return
-a list of 1 or more jobuids to migrate.
-
-PoolResidence -- Migrate data sitting in pool for longer than
-PoolResidence value in pool definition. Migration Selection Pattern
-optional; if specified, override value in pool definition (value in
-minutes).
-
-
-[ possibly a Python event -- kes ]
-===
 - Mount on an Autochanger with no tape in the drive causes:
   Automatically selected Storage: LTO-changer
   Enter autochanger drive[0]: 0
@@ -556,20 +558,12 @@ minutes).
   ("D","Diff"),
   ("I","Inc");
 - Show files/second in client status output.
-- Add a recursive mark command (rmark) to restore.
-- "Minimum Job Interval = nnn" sets minimum interval between Jobs
-  of the same level and does not permit multiple simultaneous
-  running of that Job (i.e. lets any previous invocation finish
-  before doing Interval testing).
-- Look at simplifying File exclusions.
-- New directive "Delete purged Volumes"
 - new pool XXX with ScratchPoolId = MyScratchPool's PoolId and
   let it fill itself, and RecyclePoolId = XXX's PoolId so I can
   see if it become stable and I just have to supervise
   MyScratchPool
 - If I want to remove this pool, I set RecyclePoolId = MyScratchPool's
   PoolId, and when it is empty remove it.
-- Figure out how to recycle Scratch volumes back to the Scratch Pool.
 - Add Volume=SCRTCH
 - Allow Check Labels to be used with Bacula labels.
 - "Resuming" a failed backup (lost line for example) by using the
@@ -601,8 +595,6 @@ minutes).
   backups of the same client and if we again try to start a full backup of
   client backup abc bacula won't complain. That should be fixed.

-- Fix bpipe.c so that it does not modify results pointer.
-  ***FIXME*** calling sequence should be changed.
 - For Windows disaster recovery see http://unattended.sf.net/
 - regardless of the retention period, Bacula will not prune the
   last Full, Diff, or Inc File data until a month after the
@@ -632,11 +624,8 @@ minutes).
 - In restore don't compare byte count on a raw device -- directory
   entry does not contain bytes.

-=== rate design
-  jcr->last_rate
-  jcr->last_runtime
-  MA = (last_MA * 3 + rate) / 4
-  rate = (bytes - last_bytes) / (runtime - last_runtime)
+
+
 - Max Vols limit in Pool off by one?
 - Implement Files/Bytes,... stats for restore job.
 - Implement Total Bytes Written, ... for restore job.
@@ -751,6 +740,77 @@ minutes).
 - It remains to be seen how the backup performance of the DIR's will be
   affected when comparing the catalog for a large filesystem.

+  1. Use the current Director in-memory tree code (very fast), but currently in
+     memory. It probably could be paged.
+
+  2. Use some DB such as Berkeley DB or SQLite. SQLite is already compiled and
+     built for Win32, and it is something we could compile into the program.
+
+  3. Implement our own custom DB code.
+
+  Note, by appropriate use of Directives in the Director, we can dynamically
+  decide if the work is done in the Director or in the FD, and we can even
+  allow the user to choose.
+
+=== most recent accurate file backup/restore ===
+  Here is a sketch (i.e. more details must be filled in later) that I recently
+  made of an algorithm for doing Accurate Backup.
+
+  1. Dir informs FD that it is doing an Accurate backup and lookup done by
+     Director.
+
+  2. FD passes through the file system doing a normal backup based on normal
+     conditions, recording the names of all files and their attributes, and
+     indicating which files were backed up. This is very similar to what Verify
+     does.
+
+  3. The Director receives the two lists of files at the end of the FD backup.
+     One, files backed up, and one files not backed up. It then looks up all the
+     files not backed up (using Verify style code).
+
+  4. The Dir sends the FD a list of:
+     a. Additional files to backup (based on user specified criteria, name, size
+        inode date, hash, ...).
+     b. Files to delete.
+
+  5. Dir deletes list of file not backed up.
+
+  6. FD backs up additional files generates a list of those backed up and sends
+     it to the Director, which adds it to the list of files backed up. The list
+     is now complete and current.
+
+  7. The FD generates delete records for all the files that were deleted and
+     sends to the SD.
+
+  8. The Dir deletes the previous CurrentBackup list, and then does a
+     transaction insert of the new list that it has.
+
+  9. The rest works as before ...
+
+  That is it.
+
+  Two new tables needed.
+  1. CurrentBackupId table that contains Client, JobName, FileSet, and a unique
+     BackupId. This is created during a Full save, and the BackupId can be set to
+     the JobId of the Full save. It will remain the same until another Full
+     backup is done. That is when new records are added during a Differential or
+     Incremental, they must use the same BackupId.
+
+  2. CurrentBackup table that contains essentially a File record (less a number
+     of fields, but with a few extra fields) -- e.g. a flag that the File was
+     backed up by a Full save (this permits doing a Differential). The unique
+     BackupId allows us to look up the CurrentBackup for a particular Client,
+     Jobname, FileSet using that unique BackupId as the key, so this table must be
+     indexed by the BackupId.
+
+  Note any time a file is saved by the FD other than during a Full save, the
+  Full save flag is cleared. When doing a Differential backup, if a file has
+  the Full save flag set, it is skipped, otherwise it is backed up. For an
+  Incremental backup, we check to see if the file has changed since the last
+  time we backed it up.
+
+  Deleted files should have FileIndex == 0
+
 ====
 From David:
 How about introducing a Type = MgmtPolicy job type? That job type would
@@ -909,18 +969,6 @@ Why:
   format string. Then I have the tape labeled automatically with weekday
   name in the correct language.
 ==========
-- Yes, that is surely the case. I probably should turn those into Warning
-  errors. In addition, you just made me think that it might not be bad to
-  add an option to check the file size after backing up the file and
-  report if it changes. This would be done as an option because it would
-  add extra overhead.
-
-  Kern, good idea. If you do do that, mention in the output: file
-  shrunk, or file expanded, just to make it obvious to the user
-  (without having to the refer to file size), just how the file size
-  changed.
-
-  Would this option be for all file, or just one file? Or a fileset?
 - Make output from status use html table tags for nicely
   presenting in a browser.
 - Can one write tapes faster with 8192 byte block sizes?
@@ -1040,10 +1088,6 @@ Documentation to do: (any release a little bit at a time)
   -> maybe its more easy to maintain this, if the
      descriptions of that commands are outsourced to
      a ceratin-file
-- the cd-command should allow complete paths
-  i.e. cd /foo/bar/foo/bar
-  -> if a customer mails me the path to a certain file,
-     its faster to enter the specified directory
 - if the password is not configured in bconsole.conf
   you should be asked for it.
   -> sometimes you like to do restore on a customer-machine
@@ -1246,8 +1290,6 @@ Documentation to do: (any release a little bit at a time)
   to start a job or pass its DHCP obtained IP number.
 - Implement a query tape prompt/replace feature for a console
 - Copy console @ code to gnome2-console
-- Make tree walk routines like cd, ls, ... more user friendly
-  by handling spaces better.
 - Make sure that Bacula rechecks the tape after the 20 min wait.
 - Set IO_NOWAIT on Bacula TCP/IP packets.
 - Try doing a raw partition backup and restore by mounting a
@@ -1326,7 +1368,6 @@ Documentation to do: (any release a little bit at a time)
 - Have SD compute MD5 or SHA1 and compare to what FD computes.
 - Make VolumeToCatalog calculate an MD5 or SHA1 from the
   actual data on the Volume and compare it.
-- Implement Bacula plugins -- design API
 - Make bcopy read through bad tape records.
 - Program files (i.e. execute a program to read/write files).
   Pass read date of last backup, size of file last time.
@@ -1566,9 +1607,6 @@ Longer term to do:
 - Enhance time/duration input to allow multiple qualifiers e.g. 3d2h
 - Add ability to backup to two Storage devices (two SD sessions) at
   the same time -- e.g. onsite, offsite.
-- Add the ability to consolidate old backup sets (basically do a restore
-  to tape and appropriately update the catalog). Compress Volume sets.
-  Might need to spool via file is only one drive is available.
 - Compress or consolidate Volumes of old possibly deleted files. Perhaps
   someway to do so with every volume that has less than x% valid
   files.
@@ -1631,22 +1669,6 @@ Need:

 =========================================================

-==========================================================
-  Unsaved File design
-For each Incremental job that is run, there may be files that
-were found but not saved because they were locked (this applies
-only to Windows). Such a system could send back to the Director
-a list of Unsaved files.
-Need:
-- New UnSavedFiles table that contains:
-  JobId
-  PathId
-  FilenameId
-- Then in the next Incremental job, the list of Unsaved Files will be
-  feed to the FD, who will ensure that they are explicitly chosen even
-  if standard date/time check would not have selected them.
-=============================================================
-
 =====
 Multiple drive autochanger data: see Alan Brown
@@ -1808,3 +1830,104 @@ Block Position: 0
   does the right thing.
 - FD-SD quick disconnect
 - Building the in memory restore tree is slow.
+- Erabt if min_block_size > max_block_size
+- Add the ability to consolidate old backup sets (basically do a restore
+  to tape and appropriately update the catalog). Compress Volume sets.
+  Might need to spool via file is only one drive is available.
+- Why doesn't @"xxx abc" work in a conf file?
+- Don't restore Solaris Door files:
+  #define S_IFDOOR in st_mode.
+  see: http://docs.sun.com/app/docs/doc/816-5173/6mbb8ae23?a=view#indexterm-360
+- Figure out how to recycle Scratch volumes back to the Scratch Pool.
+- Implement Despooling data status.
+- Use E'xxx' to escape PostgreSQL strings.
+- Look at mincore: http://insights.oetiker.ch/linux/fadvise.html
+- Unicode input http://en.wikipedia.org/wiki/Byte_Order_Mark
+- Look at moving the Storage directive from the Job to the
+  Pool in the default conf files.
+- Look at in src/filed/backup.c
+> pm_strcpy(ff_pkt->fname, ff_pkt->fname_save);
+> pm_strcpy(ff_pkt->link, ff_pkt->link_save);
+- Add Catalog = to Pool resource so that pools will exist
+  in only one catalog -- currently Pools are "global".
+- Add TLS to bat (should be done).
+=== Duplicate jobs ===
+- Done, but implemented somewhat differently than described below!!!
+
+  hese apply only to backup jobs.
+
+  1. Allow Duplicate Jobs = Yes | No | Higher (Yes)
+
+  2. Duplicate Job Interval = (0)
+
+  The defaults are in parenthesis and would produce the same behavior as today.
+
+  If Allow Duplicate Jobs is set to No, then any job starting while a job of the
+  same name is running will be canceled.
+
+  If Allow Duplicate Jobs is set to Higher, then any job starting with the same
+  or lower level will be canceled, but any job with a Higher level will start.
+  The Levels are from High to Low: Full, Differential, Incremental
+
+  Finally, if you have Duplicate Job Interval set to a non-zero value, any job
+  of the same name which starts after a previous job of the
+  same name would run, any one that starts within would be
+  subject to the above rules. Another way of looking at it is that the Allow
+  Duplicate Jobs directive will only apply after of when the
+  previous job finished (i.e. it is the minimum interval between jobs).
+
+  So in summary:
+
+  Allow Duplicate Jobs = Yes | No | HigherLevel | CancelLowerLevel (Yes)
+
+  Where HigherLevel cancels any waiting job but not any running job.
+  Where CancelLowerLevel is same as HigherLevel but cancels any running job or
+  waiting job.
+
+  Duplicate Job Proximity = (0)
+
+  My suggestion was to define it as the minimum guard time between
+  executions of a specific job -- ie, if a job was scheduled within Job
+  Proximity number of seconds, it would be considered a duplicate and
+  consolidated.
+
+  Skip = Do not allow two or more jobs with the same name to run
+  simultaneously within the proximity interval. The second and subsequent
+  jobs are skipped without further processing (other than to note the job
+  and exit immediately), and are not considered errors.
+
+  Fail = The second and subsequent jobs that attempt to run during the
+  proximity interval are cancelled and treated as error-terminated jobs.
+
+  Promote = If a job is running, and a second/subsequent job of higher
+  level attempts to start, the running job is promoted to the higher level
+  of processing using the resources already allocated, and the subsequent
+  job is treated as in Skip above.
+
+
+DuplicateJobs {
+  Name = "xxx"
+  Description = "xxx"
+  Allow = yes|no (no = default)
+
+  AllowHigherLevel = yes|no (no)
+
+  AllowLowerLevel = yes|no (no)
+
+  AllowSameLevel = yes|no
+
+  Cancel = Running | New (no)
+
+  CancelledStatus = Fail | Skip (fail)
+
+  Job Proximity = (0)
+  My suggestion was to define it as the minimum guard time between
+  executions of a specific job -- ie, if a job was scheduled within Job
+  Proximity number of seconds, it would be considered a duplicate and
+  consolidated.
+
+}
+
+===
+- Fix bpipe.c so that it does not modify results pointer.
+  ***FIXME*** calling sequence should be changed.
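
The "=== rate design" note that this patch moves within the list smooths a job's transfer rate as MA = (last_MA * 3 + rate) / 4, with rate = (bytes - last_bytes) / (runtime - last_runtime). The following is a minimal sketch of that formula only, not Bacula's code. The names struct rate_state and rate_update() are hypothetical, as are the last_bytes field and the guard against a zero time delta; the note itself names only jcr->last_rate and jcr->last_runtime. The sketch also assumes the byte count never decreases between samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the per-job values the note mentions
 * (jcr->last_rate, jcr->last_runtime), plus the byte count that the
 * rate formula implies must also be remembered. */
struct rate_state {
    double   last_rate;     /* previous moving average (last_MA), bytes/sec */
    int64_t  last_runtime;  /* job runtime in seconds at the previous sample */
    uint64_t last_bytes;    /* job byte count at the previous sample */
};

/* Fold one new sample into the moving average and return the new average.
 * If no time has elapsed since the previous sample, the previous average
 * is returned unchanged; this guard is an assumption added here to avoid
 * dividing by zero and is not part of the original note. */
double rate_update(struct rate_state *st, uint64_t bytes, int64_t runtime)
{
    assert(st != NULL);

    int64_t dt = runtime - st->last_runtime;
    if (dt <= 0) {
        return st->last_rate;
    }

    /* rate = (bytes - last_bytes) / (runtime - last_runtime) */
    double rate = (double)(bytes - st->last_bytes) / (double)dt;

    /* MA = (last_MA * 3 + rate) / 4 */
    double ma = (st->last_rate * 3.0 + rate) / 4.0;

    st->last_rate = ma;
    st->last_bytes = bytes;
    st->last_runtime = runtime;
    return ma;
}
```

Starting from a zeroed state, a sample of 4000 bytes at 10 seconds gives a rate of 400 and an average of 100; a second sample of 8000 bytes at 20 seconds again gives a rate of 400 and moves the average to 175. Each new sample carries a weight of one quarter, so the average approaches a steady rate gradually rather than jumping to it.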