X-Git-Url: https://git.sur5r.net/?a=blobdiff_plain;f=bacula%2Fkernstodo;h=cfa76f1b9acf9ef38a113cc09c7ea50ab11b214e;hb=fc92e04201e428fbf206dbd01518a02490ba50f9;hp=6387dc40acfb3d619d7a8a7b1a292258c71acf26;hpb=a22606258480f37a2ba6ce1e20a740cd8b2fc698;p=bacula%2Fbacula

diff --git a/bacula/kernstodo b/bacula/kernstodo
index 6387dc40ac..cfa76f1b9a 100644
--- a/bacula/kernstodo
+++ b/bacula/kernstodo
@@ -1,33 +1,178 @@
 Kern's ToDo List
-                16 June 2005
+                22 February 2006

 Major development:
 Project                  Developer
 =======                  =========
-TLS                      Landon Fuller
-Unicode in Win32         Thorsten Engel (done)
-VSS                      Thorsten Engel (in beta testing)
-Version 1.37             Kern (see below)
-========================================================
-
-1.37 Major Projects:
-#3   Migration (Move, Copy, Archive Jobs)
-     (probably not this version)
-#7   Single Job Writing to Multiple Storage Devices
-     (probably not this version)
-
-## Create a new GUI chapter explaining all the GUI programs.
-
-Autochangers:
-- Make "update slots" when pointing to Autochanger, remove
-  all Volumes from other drives.  "update slots all-drives"?
-
-For 1.37:
-- Finish TLS implementation.
-- Fix PostgreSQL GROUP BY problems in restore.
-- Fix PostgreSQL sql problems in bugs.
+
+Document:
+- Document cleaning up the spool files:
+  db, pid, state, bsr, mail, conmsg, spool
+- Document the multiple-drive-changer.txt script.
+- Pruning with Admin job.
+- Does WildFile match against the full name?  Document it.
+- %d and %v are only valid on the Director, not for ClientRunBefore/After.
+
+Priority:
+
+For 1.39:
+- Fix re-read of the last block to check whether the job has actually
+  written a block, and check whether the block was written by a
+  different job (i.e. multiple simultaneous jobs writing).
+- JobStatus and Termination codes.
+- Some users report that they must run two prune commands to get a
+  Volume marked as purged.
+- Print a warning message if the LANG environment variable does not
+  specify UTF-8.
+=== Migration from David ===
+What I'd like to see:
+
+Job {
+  Name = "<poolname>-migrate"
+  Type = Migrate
+  Messages = Standard
+  Pool = Default
+  Migration Selection Type = LowestUtil | OldestVol | PoolOccupancy |
+        Client | PoolResidence | Volume | JobName | SQLquery
+  Migration Selection Pattern = "regexp"
+  Next Pool = <override>
+}
+
+There should be no need for a Level (migration is always Full, since you
+don't calculate differential/incremental differences for migration),
+Storage should be determined by the volume types in the pool, and Client
+is really a selection issue.  Migration should always occur to the
+NextPool defined in the pool definition.  If no NextPool is defined, the
+job should end with a reason of "no place to go".  If a Next Pool
+statement is present, we override the check in the pool definition and
+use the pool specified.
+
+Here's how I'd define Migration Selection Types:
+
+With Regexes:
+Client -- Migrate data from the selected clients only.  The Migration
+Selection Pattern regexp provides the pattern used to select client
+names, e.g. ^FS00* makes all client names starting with FS00 eligible
+for migration.
+
+Jobname -- Migrate all jobs matching the name.  The Migration Selection
+Pattern regexp provides the pattern used to select job names existing
+in the pool.
+
+Volume -- Migrate all data on the specified volumes.  The Migration
+Selection Pattern regexp provides the selection criteria for the
+volumes to be migrated.  Volumes must exist in the pool to be eligible
+for migration.
+
+With Regex optional:
+LowestUtil -- Identify the volume in the pool with the least data on it
+and empty it.  No Migration Selection Pattern required.
+
+OldestVol -- Identify the LRU volume with data written, and empty it.
+No Migration Selection Pattern required.
+
+PoolOccupancy -- If pool occupancy exceeds <highmig>, migrate volumes
+(starting with the fullest volumes) until pool occupancy drops below
+<lowmig>.  The pool highmig and lowmig values are in the pool
+definition; no Migration Selection Pattern required.
+
+No regex:
+SQLQuery -- Migrate all JobIds returned by the supplied SQL query.  The
+Migration Selection Pattern contains the SQL query to execute; it
+should return a list of one or more JobIds to migrate.
+
+PoolResidence -- Migrate data that has been sitting in the pool for
+longer than the PoolResidence value in the pool definition.  The
+Migration Selection Pattern is optional; if specified, it overrides the
+value in the pool definition (value in minutes).
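+
+For the regex-based selection types above, the DIR-side check could be a
+plain POSIX regex(3) match over the candidate names.  A minimal sketch,
+assuming a hypothetical helper (select_names_by_regex is not existing
+Bacula code):
+
+   #include <sys/types.h>
+   #include <regex.h>
+   #include <stdio.h>
+
+   /* Hypothetical: report and count the names that match the
+    * Migration Selection Pattern, e.g. "^FS00" for clients FS00*. */
+   static int select_names_by_regex(const char *pattern,
+                                    const char **names, int num)
+   {
+      regex_t re;
+      int i, matched = 0;
+
+      if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
+         return -1;             /* bad Migration Selection Pattern */
+      }
+      for (i = 0; i < num; i++) {
+         if (regexec(&re, names[i], 0, NULL, 0) == 0) {
+            printf("%s is eligible for migration\n", names[i]);
+            matched++;
+         }
+      }
+      regfree(&re);
+      return matched;
+   }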
+
+[ possibly a Python event -- kes ]
+===
+- run_cmd() returns int; it should return JobId_t.
+- get_next_jobid_from_list() returns int; it should return JobId_t.
+- Document export LDFLAGS=-L/usr/lib64
+- Don't attempt to restore from "Disabled" Volumes.
+- A network error on Win32 should set the Win32 error code.
+- What happens when you rename a Disk Volume?
+- Job retention period in a Pool (and hence Volume).  The job would
+  then be migrated.
+- Detect resource deadlock in Migrate when the same job wants to read
+  and write the same device.
+- Make the hardlink code at line 240 of find_one.c use a binary search.
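+  A minimal sketch of the idea; the entry struct and names below are
+  illustrative, not the actual find_one.c types.  Keep the known links
+  sorted by (st_dev, st_ino) and look them up with bsearch():
+
+   #include <stdlib.h>
+   #include <sys/types.h>
+   #include <sys/stat.h>
+
+   struct hl_entry {
+      dev_t dev;
+      ino_t ino;
+      char *name;               /* first file seen with this inode */
+   };
+
+   static int hl_compare(const void *a, const void *b)
+   {
+      const struct hl_entry *x = (const struct hl_entry *)a;
+      const struct hl_entry *y = (const struct hl_entry *)b;
+      if (x->dev != y->dev) return x->dev < y->dev ? -1 : 1;
+      if (x->ino != y->ino) return x->ino < y->ino ? -1 : 1;
+      return 0;
+   }
+
+   /* Table kept sorted on insert; each lookup is then O(log n)
+    * instead of the current linear scan of the link list. */
+   static struct hl_entry *hl_lookup(struct hl_entry *tab, size_t n,
+                                     const struct stat *st)
+   {
+      struct hl_entry key = { st->st_dev, st->st_ino, NULL };
+      return (struct hl_entry *)bsearch(&key, tab, n, sizeof(key),
+                                        hl_compare);
+   }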
+- Queue warning/error messages during restore so that they
+  are reported at the end of the report rather than being
+  hidden in the file listing ...
+- Look at -D_FORTIFY_SOURCE=2
+- Add a Win32 FileSet definition somewhere.
+- Look at fixing the restore status stats in the SD.
+- Make the selection of the database used in restore correspond to
+  the client.
+- Implement a mode in which, when a hard read error is encountered,
+  the device is read many times (as it currently does); if the block
+  still cannot be read, skip to the next block and try again; if that
+  fails, skip to the next file and try again, ...
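+  A sketch of the escalation, assuming hypothetical SD primitives
+  (read_block(), next_block() and next_file() below stand in for the
+  real device operations):
+
+   #define MAX_READ_RETRIES 10    /* illustrative value */
+
+   typedef struct device DEVICE;  /* opaque stand-ins */
+   typedef struct block  BLOCK;
+
+   extern int read_block(DEVICE *dev, BLOCK *blk); /* 1 on success */
+   extern int next_block(DEVICE *dev);  /* reposition past bad block */
+   extern int next_file(DEVICE *dev);   /* space to the next file mark */
+
+   /* Retry the block, then sacrifice a block, then a tape file. */
+   static int read_with_recovery(DEVICE *dev, BLOCK *blk)
+   {
+      int tries;
+      for (tries = 0; tries < MAX_READ_RETRIES; tries++) {
+         if (read_block(dev, blk)) {
+            return 1;
+         }
+      }
+      if (next_block(dev) && read_block(dev, blk)) {
+         return 1;                /* lost one block, keep going */
+      }
+      if (next_file(dev) && read_block(dev, blk)) {
+         return 1;                /* lost the rest of this tape file */
+      }
+      return 0;                   /* unrecoverable */
+   }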
+- Add a level table:
+  create table LevelType (LevelType binary(1), LevelTypeLong tinyblob);
+  insert into LevelType (LevelType,LevelTypeLong) values
+  ("F","Full"),
+  ("D","Diff"),
+  ("I","Inc");
+- Add an ACL to restore only to the original location.
+- Add a recursive mark command (rmark) to restore.
+- "Minimum Job Interval = nnn" sets the minimum interval between Jobs
+  of the same level and does not permit multiple simultaneous
+  running of that Job (i.e. lets any previous invocation finish
+  before doing Interval testing).
+- Look at simplifying File exclusions.
+- New directive "Delete purged Volumes".
+- Create a new pool XXX with ScratchPoolId = MyScratchPool's PoolId and
+  let it fill itself, and RecyclePoolId = XXX's PoolId so I can
+  see whether it becomes stable and I just have to supervise
+  MyScratchPool.
+- If I want to remove this pool, I set RecyclePoolId = MyScratchPool's
+  PoolId, and when it is empty I remove it.
+- Figure out how to recycle Scratch volumes back to the Scratch Pool.
+- Add Volume=SCRTCH
+- Allow Check Labels to be used with Bacula labels.
+- "Resuming" a failed backup (lost line for example) by using the
+  failed backup as a sort of "base" job.
+- Look at NDMP.
+- Email the user when the tape is about to need changing, x
+  days before it needs changing.
+- Command to show the next tape that will be used for a job, even
+  if the job is not scheduled.
+- From: Arunav Mandal
+  1. When jobs are running and Bacula crashes for some reason, or I do
+  a restart, it should remember the jobs it was running before it
+  crashed or was restarted; as of now I lose all jobs if I restart it.
+
+  2. When spooling, if the client is disconnected midway (a laptop,
+  for instance), Bacula completely discards the spool.  It would be
+  nice if it could write that spool to tape, so there would be some
+  backups for that client, if not all.
+
+  3. We have around 150 client machines; it would be nice to have an
+  option to upgrade the Bacula version on all the client machines
+  automatically.
+
+  4. At least one connection should be reserved for bconsole, so that
+  under heavy load I can still connect to the Director via bconsole,
+  which at times I currently can't.
+
+  5. Another important missing feature: say at 10am I manually started
+  a backup of client abc, and it was a full backup since client abc
+  had no backup history, and at 10.30am Bacula automatically started a
+  backup of client abc again because that was in the schedule.  Now we
+  have two Full backups of the same client, and if we try to start yet
+  another full backup of client abc, Bacula won't complain.  That
+  should be fixed.
+
+- Fix bpipe.c so that it does not modify the results pointer.
+  ***FIXME*** the calling sequence should be changed.
+- For Windows disaster recovery see http://unattended.sf.net/
+- Regardless of the retention period, Bacula will not prune the
+  last Full, Diff, or Inc File data until a month after the
+  retention period for the last Full backup that was done.
+- update volume=xxx --- add status=Full
+- Remove old spool files on startup.
+- Exclude the SD spool/working directory.
 - Refuse to prune last valid Full backup. Same goes for Catalog.
-- --without-openssl breaks at least on Solaris.
 - Python:
   - Make a callback when Rerun failed levels is called.
   - Give Python program access to Scheduled jobs.
@@ -46,35 +191,7 @@ For 1.37:
   resources were locked.
 - The last part is left in the spool dir.
 
-Document:
-- Port limiting -m in iptables to prevent DoS attacks
-  could cause broken pipes on Bacula.
-- Document that Bootstrap files can be written with cataloging
-  turned off.
-- Pruning with Admin job.
-- Add better documentation on how restores can be done
-- OS linux 2.4
-  1) ADIC, DLT, FastStor 4000, 7*20GB
-  2) Sun, DDS, (Suns name unknown - Archive Python DDS drive), 1.2GB
-  3) Wangtek, QIC, 6525ES, 525MB (fixed block size 1k, block size etc.
-     driver dependent - aic7xxx works, ncr53c8xx with problems)
-  4) HP, DDS-2, C1553A, 6*4GB
-- Doc the following
-  to activate, check or disable the hardware compression feature on my
-  exb-8900 i use the exabyte "MammothTool" you can get it here:
-  http://www.exabyte.com/support/online/downloads/index.cfm
-  There is a solaris version of this tool. With option -C 0 or 1 you can
-  disable or activate compression. Start this tool without any options for
-  a small reference.
-- Linux Sony LIB-D81, AIT-3 library works.
-- Document PostgreSQL performance problems bug 131.
-- Document testing
-- Document that ChangerDevice is used for Alert command.
-- Document new CDROM directory.
-- Document Heartbeat Interval in the dealing with firewalls section.
-- Document the multiple-drive-changer.txt script.
-Maybe in 1.37:
 - In restore don't compare byte count on a raw device -- directory
   entry does not contain bytes.
 - To mark files as deleted, run essentially a Verify to disk, and
@@ -129,6 +246,105 @@ Maybe in 1.37:
 - Bug: if a job is manually scheduled to run later, it does not appear
   in any status report and cannot be cancelled.
 
+==== Keeping track of deleted files ====
+  My "trick" for keeping track of deletions is the following.
+  Assuming the user turns on this option, after all the files
+  have been backed up, but before the job has terminated, the
+  FD will make a pass through all the files and send their
+  names to the DIR (*exactly* the same as what a Verify job
+  currently does).  This will probably be done at the same
+  time the files are being sent to the SD, avoiding a second
+  pass.  The DIR will then compare that list to what is stored
+  in the catalog.  Any files in the catalog but not in what the
+  FD sent will receive a catalog File entry that indicates
+  that at that point in time the file was deleted.
+
+  During a restore, any file initially picked up by some
+  backup (Full, ...) and subsequently having a File entry
+  marked "delete" will be removed from the tree, so it will not
+  be restored.  If a file with the same name is later OK, it
+  will be inserted in the tree -- this already happens.  All
+  will be consistent except for possible changes during the
+  running of the FD.
+
+  Since I'm on the subject, some of you may be wondering what
+  the utility of the in-memory tree is if you are going to
+  restore everything (at least it comes up from time to time
+  on the list).  Well, it is still *very* useful because it
+  allows only the last item found for a particular filename
+  (full path) to be entered into the tree; thus if a file
+  is backed up 10 times, only the last copy will be restored.
+  I recently (last Friday) restored a complete directory, and
+  the Full and all the Differential and Incremental backups
+  spanned 3 Volumes.  The first Volume was not even mounted
+  because all the files had been updated and hence backed up
+  since the Full backup was made.  In this case, the tree
+  saved me a *lot* of time.
+
+  Make sure this information is stored on the tape too so
+  that it can be restored directly from the tape.
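+
+  The catalog comparison above is a set difference over two name
+  lists.  A minimal sketch, assuming both lists arrive sorted by full
+  path (the helper below is illustrative, not the real DIR code, which
+  would work against the catalog):
+
+   #include <stdio.h>
+   #include <string.h>
+
+   /* Print every name in the catalog that the FD did not report,
+    * i.e. the files to be marked "deleted", in one merge pass. */
+   static void find_deleted(const char **catalog, int nc,
+                            const char **fd_list, int nf)
+   {
+      int i = 0, j = 0;
+      while (i < nc) {
+         int cmp = (j < nf) ? strcmp(catalog[i], fd_list[j]) : -1;
+         if (cmp == 0) {
+            i++; j++;              /* still present on the client */
+         } else if (cmp < 0) {
+            printf("deleted: %s\n", catalog[i]); /* add "delete" entry */
+            i++;
+         } else {
+            j++;                   /* new file; the backup handles it */
+         }
+      }
+   }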
+
+  Comments from Martin Simmons (I think they are all covered):
+  Ok, that should cover the basics.  There are a few issues though:
+
+  - Restore will depend on the catalog.  I think it is better to include
+    the extra data in the backup as well, so it can be seen by bscan and
+    bextract.
+
+  - I'm not sure if it will preserve multiple hard links to the same
+    inode.  Or maybe adding or removing links will cause the data to be
+    dumped again?
+
+  - I'm not sure if it will handle renamed directories.  Possibly it
+    will work by dumping the whole tree under a renamed directory?
+
+  - It remains to be seen how the backup performance of the DIR will be
+    affected when comparing the catalog for a large filesystem.
+
+====
+From David:
+How about introducing a Type = MgmtPolicy job type?  That job type would
+be responsible for scanning the Bacula environment looking for specific
+conditions, and submitting the appropriate jobs for implementing said
+policy, e.g.:
+
+Job {
+   Name = "Migration-Policy"
+   Type = MgmtPolicy
+   Policy Selection Job Type = Migrate
+   Scope = "<keyword> <operator> <regexp>"
+   Threshold = "<keyword> <operator> <regexp>"
+   Job Template = <template name>
+}
+
+Where <keyword> is any legal job keyword, <operator> is a comparison
+operator (=, <, >, !=, logical operators AND/OR/NOT) and <regexp> is an
+appropriate regexp.  I could see an argument for Scope and Threshold
+being SQL queries if we want to support full flexibility.  The
+Migration-Policy job would then get scheduled as frequently as a site
+felt necessary (suggested default: every 15 minutes).
+
+Example:
+
+Job {
+   Name = "Migration-Policy"
+   Type = MgmtPolicy
+   Policy Selection Job Type = Migration
+   Scope = "Pool=*"
+   Threshold = "Migration Selection Type = LowestUtil"
+   Job Template = "MigrationTemplate"
+}
+
+would select all pools for examination and generate a job based on
+MigrationTemplate to automatically select the volume with the lowest
+usage and migrate its contents to the NextPool defined for that pool.
+
+This policy abstraction would be really handy for adjusting the behavior
+of Bacula according to site-selectable criteria (one thing that pops
+into mind is Amanda's ability to automatically adjust backup levels
+depending on various criteria).
+
+=====
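+
+What the scan itself might look like, purely as a pseudo-implementation
+of the proposal (every name here -- POLICY, num_pools(),
+pool_matches_scope(), threshold_exceeded(), start_job_from_template()
+-- is hypothetical, not existing Bacula code):
+
+   typedef struct policy {
+      const char *scope;        /* e.g. "Pool=*" */
+      const char *threshold;    /* e.g. "Migration Selection Type = ..." */
+      const char *job_template; /* e.g. "MigrationTemplate" */
+   } POLICY;
+
+   extern int num_pools(void);
+   extern const char *pool_name(int i);
+   extern int pool_matches_scope(const char *pool, const char *scope);
+   extern int threshold_exceeded(const char *pool, const char *thresh);
+   extern void start_job_from_template(const char *tmpl, const char *pool);
+
+   /* Run on the MgmtPolicy job's schedule (e.g. every 15 minutes). */
+   static void run_mgmt_policy(const POLICY *p)
+   {
+      int i;
+      for (i = 0; i < num_pools(); i++) {
+         const char *pool = pool_name(i);
+         if (pool_matches_scope(pool, p->scope) &&
+             threshold_exceeded(pool, p->threshold)) {
+            start_job_from_template(p->job_template, pool);
+         }
+      }
+   }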
+
 Regression tests:
 - Add Pool/Storage override regression test.
 - Add delete JobId to regression.
@@ -1146,155 +1362,45 @@ Block Position: 0
 
 === Done
-- Save mount point for directories not traversed with onefs=yes.
-- Add seconds to start and end times in the Job report output.
-- if 2 concurrent backups are attempted on the same tape
-  drive (autoloader) into different tape pools, one of them will exit
-  fatally instead of halting until the drive is idle
-- Update StartTime if job held in Job Queue.
-- Look at www.nu2.nu/pebuilder as a helper for full windows
-  bare metal restore. (done by Scott)
-- Fix orphaned buffers:
-  Orphaned buffer: 24 bytes allocated at line 808 of rufus-dir job.c
-  Orphaned buffer: 40 bytes allocated at line 45 of rufus-dir alist.c
-- Implement Preben's suggestion to add
-  File System Types = ext2, ext3
-  to FileSets, thus simplifying backup of *all* local partitions.
-- Try to open a device on each Job if it was not opened
-  when the SD started.
-- Add dump of VolSessionId/Time and FileIndex with bls.
-- If Bacula does not find the right tape in the Autochanger,
-  then mark the tape in error and move on rather than asking
-  for operator intervention.
-- Cancel command should include JobId in list of Jobs.
-- Add performance testing hooks
-- Bootstrap from JobMedia records.
-- Implement WildFile and WildDir to solve problem of
-  saving only *.doc files.
-- Fix
-  Please use the "label" command to create a new Volume for:
-      Storage:      DDS-4-changer
-      Media type:
-      Pool:         Default
-  label
-  The defined Storage resources are:
-- Copy Changer Device and Changer Command from Autochanger
-  to Device resource in SD if none given in Device resource.
-- 1. Automatic use of more than one drive in an autochanger (done)
-- 2. Automatic selection of the correct drive for each Job (i.e.
-  selects a drive with an appropriate Volume for the Job) (done)
-- 6. Allow multiple simultaneous Jobs referencing the same pool write
-  to several tapes (some new directive(s) are probably needed for
-  this) (done)
-- Locking (done)
-- Key on Storage rather than Pool (done)
-- Allow multiple drives to use same Pool (change jobq.c DIR) (done).
-- Synchronize multiple drives so that no more
-  than one loads a tape at any time (done)
-- 4. Use Changer Device and Changer Command specified in the
-  Autochanger resource, if none is found in the Device resource.
-  You can continue to specify them in the Device resource if you want
-  or need them to be different for each device.
-- 5. Implement a new Device directive (perhaps "Autoselect = yes/no")
-  that can allow a Device to be part of an Autochanger, and hence the
-  changer script protected, but if set to no, will prevent the Device
-  from being automatically selected from the changer.  This allows the
-  device to be directly accessed through its Device name, but not
-  through the AutoChanger name.
-#6 Select one from among Multiple Storage Devices for Job
-#5 Events that call a Python program
-   (Implemented in Dir/SD)
-- Make sure the Device name is in the Query packet returned.
-- Don't start a second file job if one is already running.
-- Implement EOF/EOV labels for ANSI labels
-- Implement IBM labels.
-- When Python creates a new label, the tape is immediately
-  recycled and no label created.  This happens when using
-  autolabeling -- even when Python doesn't generate the name.
-- Scratch Pool where the volumes can be re-assigned to any Pool.
-- 28-Mar 23:19 rufus-sd: acquire.c:379 Device "DDS-4" (/dev/nst0)
-  is busy reading. Job 6 canceled.
-- Remove separate thread for opening devices in SD.  On the other
-  hand, don't block waiting for open() for devices.
-- Fix code to either handle updating NumVol or to calculate it in
-  Dir next_vol.c
-- Ensure that you cannot exclude a directory or a file explicitly
-  Included with File.
-#4 Embedded Python Scripting
-   (Implemented in Dir/SD/FD)
-- Add Python writable variable for changing the Priority,
-  Client, Storage, JobStatus (error), ...
-- SD Python
-  - Solicit Events
-- Add disk seeking on restore; turn off seek on tapes.
-  stored/match_bsr.c
-- Look at dird_conf.c:1000: warning: `int size'
-  might be used uninitialized in this function
-- Indicate when a Job is purged/pruned during restore.
-- Implement some way to turn off automatic pruning in Jobs.
-- Implement a way an Admin Job can prune, possibly multiple
-  clients -- Python script?
-- Look at Preben's acl.c error handling code.
-- SD crashes after a tape restore then doing a backup.
-- If drive is opened read/write, close it and re-open
-  read-only if doing a restore, and vice-versa.
-- Windows restore:
-  data-fd: RestoreFiles.2004-12-07_15.56.42 Error:
-  > ..\findlib\../../findlib/create_file.c:275 Could not open e:/: ERR=Der
-  > Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen
-  > Prozess verwendet wird.
-  Restore restores all files, but then fails at the end trying
-  to set the attributes of e:
-  from failed jobs.
-- Resolve the problem between Device name and Archive name,
-  and fix SD messages.
-- Tell the "restore" user when browsing is no longer possible.
-- Add a restore directory-x
-- Write non-optimized bsrs from the JobMedia and Media records,
-  even after Files are pruned.
-- Delete Stripe and Copy from VolParams to save space.
-- Fix option 2 of restore -- list where file is backed up -- require
-  Client, then list last 20 backups.
-- Finish implementation of passing all Storage and Device needs to
-  the SD.
-- Move test for max wait time exceeded in job.c up -- Peter's idea.
-## Consider moving docs to their own project.
-## Move rescue to its own project.
-- Add client version to the Client name line that prints in
-  the Job report.
-- Fix the Rescue CDROM.
-- By the way: on page http://www.bacula.org/?page=tapedrives , at the
-  bottom, the link to "Tape Testing Chapter" is broken.  It goes to
-  /html-manual/... while the others point to /rel-manual/...
-- Device resource needs the "name" of the SD.
-- Specify a single directory to restore.
-- Implement MediaType keyword in bsr?
-- Add a date and time stamp at the beginning of every line in the
-  Job report (Volker Sauer).
-- Add level to estimate command.
-- Add "limit=n" for "list jobs"
-- Make bootstrap filename unique.
-- Make Dmsg look at global before calling subroutine.
-- From Chris Hull:
-  it seems to be complaining about 12:00pm which should be a valid 12
-  hour time.  I changed the time to 11:59am and everything works fine.
-  Also 12:00am works fine.  0:00pm also works (which I don't think
-  should).  None of the values 12:00pm - 12:59pm work for that matter.
-- Require restore via the restore command or make a restore Job
-  get the bootstrap file.
-- Implement Maximum Job Spool Size
-- Fix 3993 error in SD.  It forgets to look at autochanger
-  resource for device command, ...
-- 3. Prevent two drives requesting the same Volume in any given
-  autochanger, by checking if a Volume is mounted on another drive
-  in an Autochanger.
-- Upgrade to MySQL 4.1.12 See:
-  http://dev.mysql.com/doc/mysql/en/Server_SQL_mode.html
-- Add # Job Level date to bsr file
-- Implement "PreferMountedVolumes = yes|no" in Job resource.
-## Integrate web-bacula into a new Bacula project with
-   bimagemgr.
-- Cleaning tapes should have Status "Cleaning" rather than append.
-- Make sure that Python has access to Client address/port so that
-  it can check if Clients are alive.
-- Review all items in "restore".
-
+- Make sure that all do_prompt() calls in Dir check for
+  -1 (error) and -2 (cancel) returns.
+- Fix foreach_jcr() to have free_jcr() inside next(), e.g.:
+    jcr = jcr_walk_start();
+    for ( ; jcr; jcr = jcr_walk_next(jcr)) {
+       ...
+    }
+    jcr_walk_end(jcr);
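+  One way the walk could be wrapped so that callers cannot leak a JCR
+  (a sketch, not the final implementation -- it assumes jcr_walk_next()
+  calls free_jcr() on the JCR it was handed before returning the next
+  one, and jcr_walk_end() releases the JCR a caller stopped on):
+
+   #define foreach_jcr(jcr) \
+      for (jcr = jcr_walk_start(); jcr; jcr = jcr_walk_next(jcr))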
+- A Volume taken from Scratch should take on the retention period
+  of the new pool.
+- Correct the doc for Maximum Changer Wait (and others) accepting only
+  integers.
+- Implement a status that shows why a job is being held in reserve, or
+  rather why none of the drives is suitable.
+- Implement a way to disable a drive (so you can use the second
+  drive of an autochanger, and the first one will not be used or
+  even defined).
+- Make sure Maximum Volumes is respected in Pools when adding
+  Volumes (e.g. when pulling a Scratch volume).
+- Keep the same dcr when switching devices ...
+- Implement code that makes the Dir aware that a drive is an
+  autochanger (so the user doesn't need to use the Autochanger = yes
+  directive).
+- Make the catalog respect ACLs.
+- Add a recycle count to the Media record.
+- Add the initial write date to the Media record.
+- Fix store_yesno to be store_bitmask.
+--- create_file.c.orig Fri Jul  8 12:13:05 2005
++++ create_file.c      Fri Jul  8 12:13:07 2005
+@@ -195,6 +195,8 @@
+                 attr->ofname, be.strerror());
+           return CF_ERROR;
+        }
++    } else if(S_ISSOCK(attr->statp.st_mode)) {
++       Dmsg1(200, "Skipping socket: %s\n", attr->ofname);
+     } else {
+        Dmsg1(200, "Restore node: %s\n", attr->ofname);
+        if (mknod(attr->ofname, attr->statp.st_mode, attr->statp.st_rdev) != 0 && errno != EEXIST) {
+- Add true/false to conf, same as yes/no.
+- Reserve blocks other restore jobs when the first cannot connect to
+  the SD.
+- Fix Maximum Changer Wait, Maximum Open Wait, and Maximum Rewind Wait
+  to accept time qualifiers.
+- Does ClientRunAfterJob fail the job on a bad return code?