X-Git-Url: https://git.sur5r.net/?a=blobdiff_plain;f=bacula%2Fkernstodo;h=bf8f31e598a1ff6c9154b216f7ec48b765c7c6a8;hb=200821d1ecedf9bc34de69a1ab33f937cebb4972;hp=0493eebda24059d1556557d2330864bf83a03697;hpb=5386ad776dd0af368a89e135079a93897ef655bf;p=bacula%2Fbacula
diff --git a/bacula/kernstodo b/bacula/kernstodo
index 0493eebda2..bf8f31e598 100644
--- a/bacula/kernstodo
+++ b/bacula/kernstodo
@@ -1,27 +1,322 @@
                     Kern's ToDo List
-                    22 February 2006
+                    19 August 2006
 
 Major development:
 Project                     Developer
 =======                     =========
 
 Document:
-- Does ClientRunAfterJob fail the job on a bad return code?
 - Document cleaning up the spool files:
   db, pid, state, bsr, mail, conmsg, spool
 - Document the multiple-drive-changer.txt script.
 - Pruning with Admin job.
 - Does WildFile match against full name?  Doc.
 - %d and %v only valid on Director, not for ClientRunBefore/After.
+- During tests with the 260 char fix code, I found one problem:
+  if the system "sees" a long path once, it seems to forget its
+  working drive (e.g. c:\), which will lead to a problem during
+  the next job (create bootstrap file will fail). Here is the
+  workaround: specify absolute working and pid directory in
+  bacula-fd.conf (e.g. c:\bacula\working instead of
+  \bacula\working).
+- Document techniques for restoring large numbers of files.
+- Document setting my.cnf to big file usage.
+- Add example of proper index output to doc.
+    show index from File;
+- Correct the Include syntax in the m4.xxx files in examples/conf
+- Document JobStatus and Termination codes.
+- Fix the error with the "DVI file can't be opened" while
+  building the French PDF.
+- Document more DVD stuff -- particularly that recycling doesn't work,
+  and all the other things too.
 
 Priority:
 
 For 1.39:
+- Fix wx-console scanning problem with commas in names.
+- Change dbcheck to tell users to use native tools for fixing
+  broken databases, and to ensure they have the proper indexes.
+- Add udev rules for Bacula devices.
+- Add manpages to the list of directories for make install.
+- If a job terminates, the DIR connection can close before the
+  Volume info is updated, leaving the File count wrong.
+- Look at why SIGPIPE during connection can cause a seg fault in
+  writing the daemon message, when Dir dropped to bacula:bacula
+- Look at zlib 32 => 64 problems.
+- Ensure that a failure to connect to a daemon always indicates which
+  daemon it was trying to connect to.
+- Try turning on disk seek code.
+- Possibly turn on St. Bernard code.
+- Fix bextract to restore ACLs, or better yet, use common
+  routines.
+- Do we migrate appendable Volumes?
+- Remove queue.c code.
+- Add bconsole option to use stdin/out instead of conio.
+- Fix ClientRunBefore/AfterJob compatibility.
+- Fix re-read of last block to check if job has actually written
+  a block, and check if block was written by a different job
+  (i.e. multiple simultaneous jobs writing).
+- Some users claim that they must do two prune commands to get a
+  Volume marked as purged.
+- Print warning message if LANG environment variable does not specify
+  UTF-8.
+- New dot commands from Arno.
+  .update volume [enabled|disabled|*see below]
+     > However, I could easily imagine an option to "update slots" that says
+     > "enable=yes|no" that would automatically enable or disable all the Volumes
+     > found in the autochanger. This will permit the user to optionally mark all
+     > the Volumes in the magazine disabled prior to taking them offsite, and mark
+     > them all enabled when bringing them back on site. Coupled with the options
+     > to the slots keyword, you can apply the enable/disable to any or all volumes.
+  .show device=xxx lists information from one storage device, including
+     devices (I'm not even sure that information exists in the DIR...)
+  .move eject device=xxx   mostly the same as 'unmount xxx' but perhaps with
+     better machine-readable output like "Ok" or "Error busy"
+  .move eject device=xxx toslot=yyy the same as above, but with a new
+     target slot. The catalog should be updated accordingly.
+  .move transfer device=xxx fromslot=yyy toslot=zzz
+
+Low priority:
+- Get Perl replacement for bregex.c
+- Given all the problems with FIFOs, I think the solution is to do something a
+  little different, though I will look at the code and see if there is not some
+  simple solution (i.e. some bug that was introduced). What might be a better
+  solution would be to use a FIFO as a sort of "key" to tell Bacula to read and
+  write data to a program rather than the FIFO.  For example, suppose you
+  create a FIFO named:
+
+     /home/kern/my-fifo
+
+  Then, instead of backing up and restoring this file by direct reference,
+  as is currently done for FIFOs, during backup Bacula will execute:
+
+     /home/kern/my-fifo.backup
+
+  and read the data that my-fifo.backup writes to stdout. For restore, Bacula
+  will execute:
+
+     /home/kern/my-fifo.restore
+
+  and send the backed-up data to its stdin. These programs can either be an
+  executable or a shell script and they need only read/write to stdin/stdout.
+
+  I think this would give a lot of flexibility to the user without making any
+  significant changes to Bacula.
+
+
+==== SQL
+# get null file
+select FilenameId from Filename where Name='';
+# Get list of all directories referenced in a Backup.
+select Path.Path from Path,File where File.JobId=nnn and
+   File.FilenameId=(FilenameId-from-above) and File.PathId=Path.PathId
+   order by Path.Path ASC;
+
+- Look into using Dart for testing
+  http://public.kitware.com/Dart/HTML/Index.shtml
+
+- Look into replacing autotools with cmake
+  http://www.cmake.org/HTML/Index.html
+
+=== Migration from David ===
+What I'd like to see:
+
+Job {
+  Name = "-migrate"
+  Type = Migrate
+  Messages = Standard
+  Pool = Default
+  Migration Selection Type = LowestUtil | OldestVol | PoolOccupancy |
+Client | PoolResidence | Volume | JobName | SQLquery
+  Migration Selection Pattern = "regexp"
+  Next Pool =
+}
+
+There should be no need for a Level (migration is always Full, since you
+don't calculate differential/incremental differences for migration),
+Storage should be determined by the volume types in the pool, and Client
+is really a selection issue.  Migration should always occur to the
+NextPool defined in the pool definition. If no nextpool is defined, the
+job should end with a reason of "no place to go". If a Next Pool statement
+is present, we override the check in the pool definition and use the
+pool specified.
+
+Here's how I'd define Migration Selection Types:
+
+With Regexes:
+Client -- Migrate data from selected client only. Migration Selection
+Pattern regexp provides pattern to select client names, e.g. ^FS00 makes
+all client names starting with FS00 eligible for migration.
+
+Jobname -- Migrate all jobs matching name. Migration Selection Pattern
+regexp provides pattern to select jobnames existing in pool.
+
+Volume -- Migrate all data on specified volumes. Migration Selection
+Pattern regexp provides selection criteria for volumes to be migrated.
+Volumes must exist in pool to be eligible for migration.
+
+
+With Regex optional:
+LowestUtil -- Identify the volume in the pool with the least data on it
+and empty it. No Migration Selection Pattern required.
+
+OldestVol -- Identify the LRU volume with data written, and empty it. No
+Migration Selection Pattern required.
+
+PoolOccupancy -- if pool occupancy exceeds , migrate volumes
+(starting with most full volumes) until pool occupancy drops below
+. Pool highmig and lowmig values are in pool definition, no
+Migration Selection Pattern required.
+
+
+No regex:
+SQLQuery -- Migrate all JobIds returned by the supplied SQL query.
+Migration Selection Pattern contains SQL query to execute; should return
+a list of 1 or more JobIds to migrate.
+
+PoolResidence -- Migrate data sitting in pool for longer than
+PoolResidence value in pool definition. Migration Selection Pattern
+optional; if specified, override value in pool definition (value in
+minutes).
+
+
+[ possibly a Python event -- kes ]
+===
+- Mount on an Autochanger with no tape in the drive causes:
+   Automatically selected Storage: LTO-changer
+   Enter autochanger drive[0]: 0
+   3301 Issuing autochanger "loaded drive 0" command.
+   3302 Autochanger "loaded drive 0", result: nothing loaded.
+   3301 Issuing autochanger "loaded drive 0" command.
+   3302 Autochanger "loaded drive 0", result: nothing loaded.
+   3902 Cannot mount Volume on Storage Device "LTO-Drive1" (/dev/nst0) because:
+   Couldn't rewind device "LTO-Drive1" (/dev/nst0): ERR=dev.c:678 Rewind error on "LTO-Drive1" (/dev/nst0). ERR=No medium found.
+   3905 Device "LTO-Drive1" (/dev/nst0) open but no Bacula volume is mounted.
+   If this is not a blank tape, try unmounting and remounting the Volume.
+- If Drive 0 is blocked, and drive 1 is set "Autoselect=no", drive 1 will
+  be used.
+- Autochanger did not change volumes.
+   select * from Storage;
+   +-----------+-------------+-------------+
+   | StorageId | Name        | AutoChanger |
+   +-----------+-------------+-------------+
+   |         1 | LTO-changer |           0 |
+   +-----------+-------------+-------------+
+   05-May 03:50 roxie-sd: 3302 Autochanger "loaded drive 0", result is Slot 11.
+   05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Warning: Director wanted Volume "LT
+    Current Volume "LT0-002" not acceptable because:
+    1997 Volume "LT0-002" not in catalog.
+   05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Error: Autochanger Volume "LT0-002"
+    Setting InChanger to zero in catalog.
+   05-May 03:50 roxie-dir: Tibs.2006-05-05_03.05.02 Error: Unable to get Media record
+
+   05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: Error getting Volume i
+   05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: Job 530 canceled.
+   05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: spool.c:249 Fatal appe
+   05-May 03:49 Tibs: Tibs.2006-05-05_03.05.02 Fatal error: c:\cygwin\home\kern\bacula
+    , got
+   (missing)
+   llist volume=LTO-002
+      MediaId: 6
+      VolumeName: LTO-002
+      Slot: 0
+      PoolId: 1
+      MediaType: LTO-2
+      FirstWritten: 2006-05-05 03:11:54
+      LastWritten: 2006-05-05 03:50:23
+      LabelDate: 2005-12-26 16:52:40
+      VolJobs: 1
+      VolFiles: 0
+      VolBlocks: 1
+      VolMounts: 0
+      VolBytes: 206
+      VolErrors: 0
+      VolWrites: 0
+      VolCapacityBytes: 0
+      VolStatus:
+      Recycle: 1
+      VolRetention: 31,536,000
+      VolUseDuration: 0
+      MaxVolJobs: 0
+      MaxVolFiles: 0
+      MaxVolBytes: 0
+      InChanger: 0
+      EndFile: 0
+      EndBlock: 0
+      VolParts: 0
+      LabelType: 0
+      StorageId: 1
+
+   Note VolStatus is blank!!!!!
+   llist volume=LTO-003
+      MediaId: 7
+      VolumeName: LTO-003
+      Slot: 12
+      PoolId: 1
+      MediaType: LTO-2
+      FirstWritten: 0000-00-00 00:00:00
+      LastWritten: 0000-00-00 00:00:00
+      LabelDate: 2005-12-26 16:52:40
+      VolJobs: 0
+      VolFiles: 0
+      VolBlocks: 0
+      VolMounts: 0
+      VolBytes: 1
+      VolErrors: 0
+      VolWrites: 0
+      VolCapacityBytes: 0
+      VolStatus: Append
+      Recycle: 1
+      VolRetention: 31,536,000
+      VolUseDuration: 0
+      MaxVolJobs: 0
+      MaxVolFiles: 0
+      MaxVolBytes: 0
+      InChanger: 0
+      EndFile: 0
+      EndBlock: 0
+      VolParts: 0
+      LabelType: 0
+      StorageId: 1
+===
+   mount
+   Automatically selected Storage: LTO-changer
+   Enter autochanger drive[0]: 0
+   3301 Issuing autochanger "loaded drive 0" command.
+   3302 Autochanger "loaded drive 0", result: nothing loaded.
+   3301 Issuing autochanger "loaded drive 0" command.
+   3302 Autochanger "loaded drive 0", result: nothing loaded.
+   3902 Cannot mount Volume on Storage Device "LTO-Drive1" (/dev/nst0) because:
+   Couldn't rewind device "LTO-Drive1" (/dev/nst0): ERR=dev.c:678 Rewind error on "LTO-Drive1" (/dev/nst0). ERR=No medium found.
+
+   3905 Device "LTO-Drive1" (/dev/nst0) open but no Bacula volume is mounted.
+   If this is not a blank tape, try unmounting and remounting the Volume.
+
+- Add VolumeState (enable, disable, archive)
+- Add VolumeLock to prevent all but lock holder (SD) from updating
+  the Volume data (with the exception of VolumeState).
+- The btape fill command does not seem to use the Autochanger
+- Make Windows installer default to system disk drive.
+- Look at using ioctl(FIBMAP, ...) on Linux, and
+  DeviceIoControl(..., FSCTL_QUERY_ALLOCATED_RANGES, ...) on
+  Win32 for sparse files.
+  http://www.flexhex.com/docs/articles/sparse-files.phtml
+  http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/fibmap.html
+- Directive: at "command"
+- Command: pycmd "command" generates "command" event.  How to
+  attach to a specific job?
+- Integrate Christopher's St. Bernard code.
+- run_cmd() returns int; it should return JobId_t.
+- get_next_jobid_from_list() returns int; it should return JobId_t.
+- Document export LDFLAGS=-L/usr/lib64
+- Don't attempt to restore from "Disabled" Volumes.
+- Network error on Win32 should set Win32 error code.
+- What happens when you rename a Disk Volume?
 - Job retention period in a Pool (and hence Volume).  The job would
   then be migrated.
 - Detect resource deadlock in Migrate when same job wants to read
   and write the same device.
-- Make hardlink code at line 240 of find_one.c use binary search.
 - Queue warning/error messages during restore so that they
   are reported at the end of the report rather than being
   hidden in the file listing ...
@@ -30,6 +325,8 @@ For 1.39:
 - Look at fixing restore status stats in SD.
 - Make selection of Database used in restore
   correspond to client.
+- Look at using ioctl(FIBMAP) and FIGETBSZ for sparse files.
+  http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/fibmap.html
 - Implement a mode that says when a hard read error is
   encountered, read many times (as it currently does), and if the
   block cannot be read, skip to the next block, and try again.  If
@@ -41,6 +338,7 @@ For 1.39:
   ("D","Diff"), ("I","Inc");
 
 - Add ACL to restore only to original location.
+- Show files/second in client status output.
 - Add a recursive mark command (rmark) to restore.
 - "Minimum Job Interval = nnn" sets minimum interval between Jobs
   of the same level and does not permit multiple simultaneous
@@ -222,6 +520,50 @@ For 1.39:
 - It remains to be seen how the backup performance of the DIR's
   will be affected when comparing the catalog for a large
   filesystem.
+====
+From David:
+How about introducing a Type = MgmtPolicy job type?  That job type would
+be responsible for scanning the Bacula environment looking for specific
+conditions, and submitting the appropriate jobs for implementing said
+policy, e.g.:
+
+Job {
+  Name = "Migration-Policy"
+  Type = MgmtPolicy
+  Policy Selection Job Type = Migrate
+  Scope = " "
+  Threshold = " "
+  Job Template =
+}
+
+Where is any legal job keyword, is a comparison
+operator (=,<,>,!=, logical operators AND/OR/NOT) and is an
+appropriate regexp.  I could see an argument for Scope and Threshold
+being SQL queries if we want to support full flexibility.  The
+Migration-Policy job would then get scheduled as frequently as a site
+felt necessary (suggested default: every 15 minutes).
+
+Example:
+
+Job {
+  Name = "Migration-Policy"
+  Type = MgmtPolicy
+  Policy Selection Job Type = Migration
+  Scope = "Pool=*"
+  Threshold = "Migration Selection Type = LowestUtil"
+  Job Template = "MigrationTemplate"
+}
+
+would select all pools for examination and generate a job based on
+MigrationTemplate to automatically select the volume with the lowest
+usage and migrate its contents to the nextpool defined for that pool.
+
+This policy abstraction would be really handy for adjusting the behavior
+of Bacula according to site-selectable criteria (one thing that pops
+into mind is Amanda's ability to automatically adjust backup levels
+depending on various criteria).
+
+
 =====
 
 Regression tests:
@@ -1282,4 +1624,28 @@ Block Position: 0
 - Reserve blocks other restore jobs when first cannot connect to SD.
 - Fix Maximum Changer Wait, Maximum Open Wait, Maximum Rewind Wait
   to accept time qualifiers.
-
+- Does ClientRunAfterJob fail the job on a bad return code?
+- Make hardlink code at line 240 of find_one.c use binary search.
+- Add ACL error messages in src/filed/acl.c.
+- Make authentication failures single threaded.
+- Make Dir and SD authentication errors single threaded.
+- Install man pages
+- Fix catreq.c digestbuf at line 411 in src/dird/catreq.c
+- Make base64.c (bin_to_base64) take a buffer length
+  argument to avoid overruns,
+  and verify that other buffers cannot overrun.
+- Implement VolumeState as discussed with Arno.
+- Add LocationId to update volume
+- Add LocationLog
+   LogId
+   Date
+   User text
+   MediaId
+   LocationId
+   NewState???
+- Add Comment to Media record
+- Fix auth compatibility with 1.38
+- Update dbcheck to include Log table
+- Update llist to include new fields.
+- Make unmount unload autochanger.  Make mount load slot.
+- Fix bscan to report the JobType when restoring a job.
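This is a minimal sketch of the "use binary search" idea from the hardlink item above. It assumes hard-linked files are tracked by their (st_dev, st_ino) pair. The names link_entry, link_table and link_lookup_or_add() are hypothetical and were chosen for illustration; this is not the actual find_one.c code.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical record for a file with st_nlink > 1 (not Bacula's own struct). */
typedef struct {
    dev_t dev;
    ino_t ino;
    const char *name;   /* first path seen for this inode; not copied */
} link_entry;

/* Array kept sorted by (dev, ino) so lookups can use binary search. */
typedef struct {
    link_entry *v;
    size_t len, cap;
} link_table;

static int cmp_key(dev_t da, ino_t ia, dev_t db, ino_t ib)
{
    if (da != db) return da < db ? -1 : 1;
    if (ia != ib) return ia < ib ? -1 : 1;
    return 0;
}

/* Index of the first entry >= (dev, ino), i.e. a lower bound. */
static size_t lower_bound(const link_table *t, dev_t dev, ino_t ino)
{
    size_t lo = 0, hi = t->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cmp_key(t->v[mid].dev, t->v[mid].ino, dev, ino) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* If (dev, ino) was seen before, return the name recorded then (this path is
 * a hard link to it).  Otherwise record name and return NULL.  On allocation
 * failure it also returns NULL, i.e. the file is simply treated as new. */
const char *link_lookup_or_add(link_table *t, dev_t dev, ino_t ino, const char *name)
{
    size_t i = lower_bound(t, dev, ino);
    if (i < t->len && cmp_key(t->v[i].dev, t->v[i].ino, dev, ino) == 0)
        return t->v[i].name;
    if (t->len == t->cap) {
        size_t ncap = t->cap ? t->cap * 2 : 16;
        link_entry *nv = realloc(t->v, ncap * sizeof *nv);
        if (nv == NULL)
            return NULL;
        t->v = nv;
        t->cap = ncap;
    }
    /* Shift the tail up one slot to keep the array sorted. */
    memmove(&t->v[i + 1], &t->v[i], (t->len - i) * sizeof *t->v);
    t->v[i].dev = dev;
    t->v[i].ino = ino;
    t->v[i].name = name;
    t->len++;
    return NULL;
}
```

Lookups become O(log n), but each insert still shifts part of the array. If insert cost turns out to matter, a hash table keyed on (dev, ino) would be the obvious alternative.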
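For the item about making bin_to_base64 take a buffer length, here is a sketch of what a bounded encoder could look like. bin_to_base64_n() and its signature are hypothetical. The sketch also uses standard RFC 4648 base64 with '=' padding, which is not necessarily the variant base64.c produces. What matters here is the up-front size check against the caller's buffer length.

```c
#include <stddef.h>
#include <stdint.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical bounded encoder: writes the NUL-terminated base64 form of
 * bin[0..binlen) into buf and never touches more than buflen bytes.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * buf is NULL or too small. */
long bin_to_base64_n(char *buf, size_t buflen, const unsigned char *bin, size_t binlen)
{
    size_t groups = binlen / 3 + (binlen % 3 != 0);  /* 4 output chars each */
    if (groups > (SIZE_MAX - 1) / 4)
        return -1;                                    /* size would overflow */
    size_t need = 4 * groups + 1;                     /* + terminating NUL */
    if (buf == NULL || buflen < need)
        return -1;

    size_t o = 0;
    for (size_t i = 0; i < binlen; i += 3) {
        unsigned long v = (unsigned long)bin[i] << 16;
        if (i + 1 < binlen) v |= (unsigned long)bin[i + 1] << 8;
        if (i + 2 < binlen) v |= bin[i + 2];
        buf[o++] = b64_alphabet[(v >> 18) & 0x3F];
        buf[o++] = b64_alphabet[(v >> 12) & 0x3F];
        buf[o++] = i + 1 < binlen ? b64_alphabet[(v >> 6) & 0x3F] : '=';
        buf[o++] = i + 2 < binlen ? b64_alphabet[v & 0x3F] : '=';
    }
    buf[o] = '\0';
    return (long)o;
}
```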
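David's PoolOccupancy selection type in the migration notes above is essentially a high/low-water-mark loop. The sketch below rests on a few assumptions. Occupancy is taken to be the sum of the pool's VolBytes, and both marks are byte counts. The names vol_info and select_for_migration() are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-volume summary; not Bacula's catalog record. */
typedef struct {
    int media_id;
    unsigned long long vol_bytes;
} vol_info;

/* qsort comparator: fullest volume first, MediaId as tie-break. */
static int by_bytes_desc(const void *a, const void *b)
{
    const vol_info *x = a, *y = b;
    if (x->vol_bytes != y->vol_bytes)
        return x->vol_bytes < y->vol_bytes ? 1 : -1;
    return (x->media_id > y->media_id) - (x->media_id < y->media_id);
}

/* Returns 0 if total occupancy does not exceed high_mig.  Otherwise sorts
 * vols in place (fullest first) and returns k such that vols[0..k) are the
 * volumes to migrate: fullest first, until occupancy drops below low_mig. */
size_t select_for_migration(vol_info *vols, size_t n,
                            unsigned long long high_mig,
                            unsigned long long low_mig)
{
    unsigned long long occupancy = 0;
    for (size_t i = 0; i < n; i++)
        occupancy += vols[i].vol_bytes;
    if (occupancy <= high_mig)
        return 0;

    qsort(vols, n, sizeof *vols, by_bytes_desc);
    size_t k = 0;
    while (k < n && occupancy >= low_mig) {
        occupancy -= vols[k].vol_bytes;
        k++;
    }
    return k;
}
```

Starting with the fullest volumes means the fewest volumes are moved to get under the low mark. The gap between the two marks keeps the job from triggering again right after each run.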
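The low-priority FIFO proposal is to run my-fifo.backup and my-fifo.restore instead of reading or writing the FIFO itself. It could be sketched with POSIX popen()/pclose(). The helpers make_helper_cmd(), copy_from_helper() and write_to_helper() are hypothetical. Passing the path through the shell is acceptable only for a sketch; real code would quote it or use fork()/exec().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical: build "<fifo_path><suffix>" (".backup" or ".restore") into cmd.
 * Returns 0 on success, -1 if the result does not fit in len bytes. */
int make_helper_cmd(char *cmd, size_t len, const char *fifo_path, const char *suffix)
{
    int n = snprintf(cmd, len, "%s%s", fifo_path, suffix);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Backup side: run the helper and copy everything it writes on stdout to
 * sink (standing in for the backup data stream).  Returns the byte count,
 * or -1 on a read/write error or a nonzero exit status.
 * NOTE: cmd is run via /bin/sh -c, so it must not contain untrusted text. */
long copy_from_helper(const char *cmd, FILE *sink)
{
    char chunk[4096];
    long total = 0;
    size_t n;
    FILE *p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    while ((n = fread(chunk, 1, sizeof chunk, p)) > 0) {
        if (fwrite(chunk, 1, n, sink) != n) {
            pclose(p);
            return -1;
        }
        total += (long)n;
    }
    int read_err = ferror(p);
    int status = pclose(p);
    return (read_err || status != 0) ? -1 : total;
}

/* Restore side: run the helper and feed it the saved data on its stdin.
 * If the helper exits before reading everything, the writer gets SIGPIPE,
 * so a caller would normally ignore that signal. */
int write_to_helper(const char *cmd, const char *data, size_t len)
{
    FILE *p = popen(cmd, "w");
    if (p == NULL)
        return -1;
    size_t put = fwrite(data, 1, len, p);
    int status = pclose(p);
    return (put == len && status == 0) ? 0 : -1;
}
```

For the example in the item, the backup side would call make_helper_cmd() with "/home/kern/my-fifo" and ".backup", then pass the result to copy_from_helper().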