Kern's ToDo List
- 17 February 2006
+ 20 June 2006
Major development:
Project Developer
======= =========
Document:
-- Does ClientRunAfterJob fail the job on a bad return code?
- Document cleaning up the spool files:
db, pid, state, bsr, mail, conmsg, spool
- Document the multiple-drive-changer.txt script.
- Pruning with Admin job.
- Does WildFile match against full name? Doc.
- %d and %v only valid on Director, not for ClientRunBefore/After.
+- During tests with the 260 char fix code, I found one problem:
+ if the system "sees" a long path once, it seems to forget its
+ working drive (e.g. c:\), which will lead to a problem during
+ the next job (create bootstrap file will fail). Here is the
+ workaround: specify absolute working and pid directory in
+ bacula-fd.conf (e.g. c:\bacula\working instead of
+ \bacula\working).
Priority:
-- Implement code that makes the Dir aware that a drive is an
- autochanger (so the user doesn't need to use the Autochanger = yes
- directive).
For 1.39:
-- Keep same dcr when switching device ...
+- Make authentication failures single threaded.
+- Make base64.c (bin_to_base64) take a buffer length
+ argument to avoid overruns.
+- Fix catreq.c digestbuf at line 411 in src/dird/catreq.c
+ and verify that other buffers cannot overrun.
+- Implement VolumeState as discussed with Arno.
+- Install man pages
+- Document techniques for restoring large numbers of files.
+- Document setting my.cnf for big file usage.
+- Add example of proper index output to doc.
+ show index from File;
+- Fix re-read of last block to check if job has actually written
+ a block, and check if block was written by a different job
+ (i.e. multiple simultaneous jobs writing).
+- JobStatus and Termination codes.
+- Some users claim that they must do two prune commands to get a
+ Volume marked as purged.
+- Print warning message if LANG environment variable does not specify
+ UTF-8.
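A minimal sketch of the LANG check described in the last item above. It is only an illustration in Python: the function name `lang_warning` and the message text are hypothetical, and it inspects LANG alone (as the item says), not LC_ALL or LC_CTYPE.

```python
def lang_warning(environ):
    """Return a warning string if LANG does not name a UTF-8 locale, else None."""
    lang = environ.get("LANG", "")
    # Locale names look like "en_US.UTF-8" or "de_DE.utf8@euro":
    # the codeset sits between the dot and an optional "@modifier".
    codeset = lang.partition(".")[2].partition("@")[0]
    if codeset.lower().replace("-", "") == "utf8":
        return None
    return "Warning: LANG=%r does not specify UTF-8" % lang
```

A daemon could call this once at startup with `os.environ` and route any returned string to its normal message handler.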
+=== Migration from David ===
+What I'd like to see:
+
+Job {
+ Name = "<poolname>-migrate"
+ Type = Migrate
+ Messages = Standard
+ Pool = Default
+ Migration Selection Type = LowestUtil | OldestVol | PoolOccupancy |
+Client | PoolResidence | Volume | JobName | SQLquery
+ Migration Selection Pattern = "regexp"
+ Next Pool = <override>
+}
+
+There should be no need for a Level (migration is always Full, since you
+don't calculate differential/incremental differences for migration),
+Storage should be determined by the volume types in the pool, and Client
+is really a selection issue. Migration should always occur to the
+NextPool defined in the pool definition. If no nextpool is defined, the
+job should end with a reason of "no place to go". If Next Pool statement
+is present, we override the check in the pool definition and use the
+pool specified.
+
+Here's how I'd define Migration Selection Types:
+
+With Regexes:
+Client -- Migrate data from selected client only. Migration Selection
+Pattern regexp provides pattern to select client names, e.g. ^FS00 makes
+all client names starting with FS00 eligible for migration.
+
+JobName -- Migrate all jobs matching name. Migration Selection Pattern
+regexp provides pattern to select jobnames existing in pool.
+
+Volume -- Migrate all data on specified volumes. Migration Selection
+Pattern regexp provides selection criteria for volumes to be migrated.
+Volumes must exist in pool to be eligible for migration.
+
+
+With Regex optional:
+LowestUtil -- Identify the volume in the pool with the least data on it
+and empty it. No Migration Selection Pattern required.
+
+OldestVol -- Identify the LRU volume with data written, and empty it. No
+Migration Selection Pattern required.
+
+PoolOccupancy -- if pool occupancy exceeds <highmig>, migrate volumes
+(starting with most full volumes) until pool occupancy drops below
+<lowmig>. Pool highmig and lowmig values are in pool definition, no
+Migration Selection Pattern required.
+
+
+No regex:
+SQLQuery -- Migrate all jobuids returned by the supplied SQL query.
+Migration Selection Pattern contains SQL query to execute; should return
+a list of 1 or more jobuids to migrate.
+
+PoolResidence -- Migrate data sitting in pool for longer than
+PoolResidence value in pool definition. Migration Selection Pattern
+optional; if specified, override value in pool definition (value in
+minutes).
+
+
+[ possibly a Python event -- kes ]
+===
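The selection types proposed above can be sketched roughly as follows. This is not Bacula code: the record layout (dicts with `client`, `name` and `bytes` keys), the function names, and the reading of highmig/lowmig as fractions of pool capacity are all assumptions made for illustration. Skipping empty volumes in LowestUtil is likewise an assumption.

```python
import re

def select_by_pattern(records, field, pattern):
    """Client / JobName / Volume: keep records whose field matches the regexp."""
    rx = re.compile(pattern)
    return [r for r in records if rx.search(r[field])]

def select_lowest_util(volumes):
    """LowestUtil: the volume holding the least data (empty volumes skipped)."""
    used = [v for v in volumes if v["bytes"] > 0]
    return min(used, key=lambda v: v["bytes"]) if used else None

def select_pool_occupancy(volumes, capacity, highmig, lowmig):
    """PoolOccupancy: if occupancy exceeds highmig, pick the fullest volumes
    until the remaining occupancy drops below lowmig."""
    occupied = sum(v["bytes"] for v in volumes)
    if occupied <= highmig * capacity:
        return []  # threshold not reached, nothing to migrate
    chosen = []
    for v in sorted(volumes, key=lambda v: v["bytes"], reverse=True):
        if occupied < lowmig * capacity:
            break
        chosen.append(v)
        occupied -= v["bytes"]
    return chosen
```

The three regexp-driven types share one helper because they differ only in which catalog field the pattern is applied to.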
+- Mount on an Autochanger with no tape in the drive causes:
+ Automatically selected Storage: LTO-changer
+ Enter autochanger drive[0]: 0
+ 3301 Issuing autochanger "loaded drive 0" command.
+ 3302 Autochanger "loaded drive 0", result: nothing loaded.
+ 3301 Issuing autochanger "loaded drive 0" command.
+ 3302 Autochanger "loaded drive 0", result: nothing loaded.
+ 3902 Cannot mount Volume on Storage Device "LTO-Drive1" (/dev/nst0) because:
+ Couldn't rewind device "LTO-Drive1" (/dev/nst0): ERR=dev.c:678 Rewind error on "LTO-Drive1" (/dev/nst0). ERR=No medium found.
+ 3905 Device "LTO-Drive1" (/dev/nst0) open but no Bacula volume is mounted.
+ If this is not a blank tape, try unmounting and remounting the Volume.
+- If Drive 0 is blocked, and drive 1 is set "Autoselect=no", drive 1 will
+ be used.
+- Autochanger did not change volumes.
+ select * from Storage;
+ +-----------+-------------+-------------+
+ | StorageId | Name | AutoChanger |
+ +-----------+-------------+-------------+
+ | 1 | LTO-changer | 0 |
+ +-----------+-------------+-------------+
+ 05-May 03:50 roxie-sd: 3302 Autochanger "loaded drive 0", result is Slot 11.
+ 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Warning: Director wanted Volume "LT
+ Current Volume "LT0-002" not acceptable because:
+ 1997 Volume "LT0-002" not in catalog.
+ 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Error: Autochanger Volume "LT0-002"
+ Setting InChanger to zero in catalog.
+ 05-May 03:50 roxie-dir: Tibs.2006-05-05_03.05.02 Error: Unable to get Media record
+
+ 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: Error getting Volume i
+ 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: Job 530 canceled.
+ 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: spool.c:249 Fatal appe
+ 05-May 03:49 Tibs: Tibs.2006-05-05_03.05.02 Fatal error: c:\cygwin\home\kern\bacula
+ , got
+ (missing)
+ llist volume=LTO-002
+ MediaId: 6
+ VolumeName: LTO-002
+ Slot: 0
+ PoolId: 1
+ MediaType: LTO-2
+ FirstWritten: 2006-05-05 03:11:54
+ LastWritten: 2006-05-05 03:50:23
+ LabelDate: 2005-12-26 16:52:40
+ VolJobs: 1
+ VolFiles: 0
+ VolBlocks: 1
+ VolMounts: 0
+ VolBytes: 206
+ VolErrors: 0
+ VolWrites: 0
+ VolCapacityBytes: 0
+ VolStatus:
+ Recycle: 1
+ VolRetention: 31,536,000
+ VolUseDuration: 0
+ MaxVolJobs: 0
+ MaxVolFiles: 0
+ MaxVolBytes: 0
+ InChanger: 0
+ EndFile: 0
+ EndBlock: 0
+ VolParts: 0
+ LabelType: 0
+ StorageId: 1
+
+ Note VolStatus is blank!!!!!
+ llist volume=LTO-003
+ MediaId: 7
+ VolumeName: LTO-003
+ Slot: 12
+ PoolId: 1
+ MediaType: LTO-2
+ FirstWritten: 0000-00-00 00:00:00
+ LastWritten: 0000-00-00 00:00:00
+ LabelDate: 2005-12-26 16:52:40
+ VolJobs: 0
+ VolFiles: 0
+ VolBlocks: 0
+ VolMounts: 0
+ VolBytes: 1
+ VolErrors: 0
+ VolWrites: 0
+ VolCapacityBytes: 0
+ VolStatus: Append
+ Recycle: 1
+ VolRetention: 31,536,000
+ VolUseDuration: 0
+ MaxVolJobs: 0
+ MaxVolFiles: 0
+ MaxVolBytes: 0
+ InChanger: 0
+ EndFile: 0
+ EndBlock: 0
+ VolParts: 0
+ LabelType: 0
+ StorageId: 1
+===
+ mount
+ Automatically selected Storage: LTO-changer
+ Enter autochanger drive[0]: 0
+ 3301 Issuing autochanger "loaded drive 0" command.
+ 3302 Autochanger "loaded drive 0", result: nothing loaded.
+ 3301 Issuing autochanger "loaded drive 0" command.
+ 3302 Autochanger "loaded drive 0", result: nothing loaded.
+ 3902 Cannot mount Volume on Storage Device "LTO-Drive1" (/dev/nst0) because:
+ Couldn't rewind device "LTO-Drive1" (/dev/nst0): ERR=dev.c:678 Rewind error on "LTO-Drive1" (/dev/nst0). ERR=No medium found.
+
+ 3905 Device "LTO-Drive1" (/dev/nst0) open but no Bacula volume is mounted.
+ If this is not a blank tape, try unmounting and remounting the Volume.
+
+- Add VolumeState (enable, disable, archive)
+- Add VolumeLock to prevent all but lock holder (SD) from updating
+ the Volume data (with the exception of VolumeState).
+- The btape fill command does not seem to use the Autochanger
+- Make Windows installer default to system disk drive.
+- Look at using ioctl(FIBMAP, ...) on Linux, and
+ DeviceIoControl(..., FSCTL_QUERY_ALLOCATED_RANGES, ...) on
+ Win32 for sparse files.
+ http://www.flexhex.com/docs/articles/sparse-files.phtml
+ http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/fibmap.html
+- Directive: at <event> "command"
+- Command: pycmd "command" generates "command" event. How to
+ attach to a specific job?
+- Integrate Christopher's St. Bernard code.
+- run_cmd() returns int; should return JobId_t
+- get_next_jobid_from_list() returns int; should return JobId_t
+- Document export LDFLAGS=-L/usr/lib64
+- Don't attempt to restore from "Disabled" Volumes.
+- Network error on Win32 should set Win32 error code.
+- What happens when you rename a Disk Volume?
- Job retention period in a Pool (and hence Volume). The job would
then be migrated.
- Detect resource deadlock in Migrate when same job wants to read
and write the same device.
-- Make hardlink code at line 240 of find_one.c use binary search.
- Queue warning/error messages during restore so that they
are reported at the end of the report rather than being
hidden in the file listing ...
-- Fix Maximum Changer Wait (and others) to accept qualifiers.
- Look at -D_FORTIFY_SOURCE=2
- Add Win32 FileSet definition somewhere
- Look at fixing restore status stats in SD.
- Make selection of Database used in restore correspond to
client.
+- Look at using ioctl(FIBMAP) and FIGETBSZ for sparse files.
+ http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/fibmap.html
- Implement a mode that says when a hard read error is
encountered, read many times (as it currently does), and if the
block cannot be read, skip to the next block, and try again. If
("D","Diff"),
("I","Inc");
- Add ACL to restore only to original location.
+- Show files/second in client status output.
- Add a recursive mark command (rmark) to restore.
- "Minimum Job Interval = nnn" sets minimum interval between Jobs
of the same level and does not permit multiple simultaneous
running of that Job (i.e. lets any previous invocation finish
before doing Interval testing).
- Look at simplifying File exclusions.
-- Fix store_yesno to be store_bitmask.
- New directive "Delete purged Volumes"
- new pool XXX with ScratchPoolId = MyScratchPool's PoolId and
let it fill itself, and RecyclePoolId = XXX's PoolId so I can
MyScratchPool
- If I want to remove this pool, I set RecyclePoolId = MyScratchPool's
PoolId, and when it is empty remove it.
-- Figure out how to recycle Scratch volumes back to the Scratch
- Pool.
+- Figure out how to recycle Scratch volumes back to the Scratch Pool.
- Add Volume=SCRTCH
- Allow Check Labels to be used with Bacula labels.
- "Resuming" a failed backup (lost line for example) by using the
days before it needs changing.
- Command to show next tape that will be used for a job even
if the job is not scheduled.
---- create_file.c.orig Fri Jul 8 12:13:05 2005
-+++ create_file.c Fri Jul 8 12:13:07 2005
-@@ -195,6 +195,8 @@
- attr->ofname, be.strerror());
- return CF_ERROR;
- }
-+ } else if(S_ISSOCK(attr->statp.st_mode)) {
-+ Dmsg1(200, "Skipping socket: %s\n", attr->ofname);
- } else {
- Dmsg1(200, "Restore node: %s\n", attr->ofname);
- if (mknod(attr->ofname, attr->statp.st_mode, attr->statp.st_rdev) != 0 && errno != EEXIST) {
- From: Arunav Mandal <amandal@trolltech.com>
1. When jobs are running and bacula for some reason crashes or if I do a
restart it remembers and jobs it was running before it crashed or restarted
- Fix bpipe.c so that it does not modify results pointer.
***FIXME*** calling sequence should be changed.
-1.xx Major Projects:
-#3 Migration (Move, Copy, Archive Jobs)
-#7 Single Job Writing to Multiple Storage Devices
-- Reserve blocks other restore jobs when first cannot connect
- to SD.
-- Add true/false to conf same as yes/no
- For Windows disaster recovery see http://unattended.sf.net/
- regardless of the retention period, Bacula will not prune the
last Full, Diff, or Inc File data until a month after the
- It remains to be seen how the backup performance of the DIR's will be
affected when comparing the catalog for a large filesystem.
+====
+From David:
+How about introducing a Type = MgmtPolicy job type? That job type would
+be responsible for scanning the Bacula environment looking for specific
+conditions, and submitting the appropriate jobs for implementing said
+policy, eg:
+
+Job {
+ Name = "Migration-Policy"
+ Type = MgmtPolicy
+ Policy Selection Job Type = Migrate
+ Scope = "<keyword> <operator> <regexp>"
+ Threshold = "<keyword> <operator> <regexp>"
+ Job Template = <template-name>
+}
+
+Where <keyword> is any legal job keyword, <operator> is a comparison
+operator (=,<,>,!=, logical operators AND/OR/NOT) and <regexp> is an
+appropriate regexp. I could see an argument for Scope and Threshold
+being SQL queries if we want to support full flexibility. The
+Migration-Policy job would then get scheduled as frequently as a site
+felt necessary (suggested default: every 15 minutes).
+
+Example:
+
+Job {
+ Name = "Migration-Policy"
+ Type = MgmtPolicy
+ Policy Selection Job Type = Migrate
+ Scope = "Pool=*"
+ Threshold = "Migration Selection Type = LowestUtil"
+ Job Template = "MigrationTemplate"
+}
+
+would select all pools for examination and generate a job based on
+MigrationTemplate to automatically select the volume with the lowest
+usage and migrate its contents to the nextpool defined for that pool.
+
+This policy abstraction would be really handy for adjusting the behavior
+of Bacula according to site-selectable criteria (one thing that pops
+into mind is Amanda's ability to automatically adjust backup levels
+depending on various criteria).
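
One way the "<keyword> <operator> <regexp>" form of Scope and Threshold might be evaluated is sketched below. Everything here is hypothetical: the names `parse_condition` and `matches` are invented, only the `=` and `!=` operators are handled, and resources are modelled as plain dicts. Note that a bare `*` (as in the "Pool=*" example) is not a valid regexp, so the sketch expects `.*` instead.

```python
import re

# keyword, then "=" or "!=", then the regexp; surrounding blanks are ignored.
_CONDITION_RE = re.compile(
    r"^\s*(?P<keyword>.+?)\s*(?P<op>!=|=)\s*(?P<pattern>.+?)\s*$")

def parse_condition(text):
    """Split '<keyword> <operator> <regexp>' into (keyword, op, compiled regexp)."""
    m = _CONDITION_RE.match(text)
    if m is None:
        raise ValueError("expected '<keyword> <operator> <regexp>': %r" % text)
    return m.group("keyword"), m.group("op"), re.compile(m.group("pattern"))

def matches(condition, resource):
    """True if the resource's value for the keyword satisfies the condition."""
    keyword, op, rx = condition
    hit = rx.search(resource.get(keyword, "")) is not None
    return hit if op == "=" else not hit
```

A policy job would apply `matches` to every pool to build its scope, then submit one job from the template for each pool that also satisfies the threshold.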
+
+
=====
Regression tests:
even defined).
- Make sure Maximum Volumes is respected in Pools when adding
Volumes (e.g. when pulling a Scratch volume).
+- Keep same dcr when switching device ...
+- Implement code that makes the Dir aware that a drive is an
+ autochanger (so the user doesn't need to use the Autochanger = yes
+ directive).
+- Make catalog respect ACL.
+- Add recycle count to Media record.
+- Add initial write date to Media record.
+- Fix store_yesno to be store_bitmask.
+--- create_file.c.orig Fri Jul 8 12:13:05 2005
++++ create_file.c Fri Jul 8 12:13:07 2005
+@@ -195,6 +195,8 @@
+ attr->ofname, be.strerror());
+ return CF_ERROR;
+ }
++ } else if(S_ISSOCK(attr->statp.st_mode)) {
++ Dmsg1(200, "Skipping socket: %s\n", attr->ofname);
+ } else {
+ Dmsg1(200, "Restore node: %s\n", attr->ofname);
+ if (mknod(attr->ofname, attr->statp.st_mode, attr->statp.st_rdev) != 0 && errno != EEXIST) {
+- Add true/false to conf same as yes/no
+- Reserve blocks other restore jobs when first cannot connect to SD.
+- Fix Maximum Changer Wait, Maximum Open Wait, Maximum Rewind Wait to
+ accept time qualifiers.
+- Does ClientRunAfterJob fail the job on a bad return code?
+- Make hardlink code at line 240 of find_one.c use binary search.
+- Add ACL error messages in src/filed/acl.c.