Kern's ToDo List
- 15 August 2003
+ 30 September 2003
Documentation to do: (any release a little bit at a time)
- Document running a test version.
hours of operation.
- Lookup HP cleaning recommendations.
- Lookup HP tape replacement recommendations (see trouble shooting autochanger)
-- Document FInclude ...
-- Document all the status codes JobLevel, JobType, JobStatus.
Testing to do: (painful)
- that ALL console command line options work and are always implemented
- Test if rewind at end of tape waits for tape to rewind.
- Test cancel at EOM.
- Test not zeroing Autochanger slot when it is wrong.
-- Test recycling and purging (code changed in db_find_next_volume and
- in recycle.c).
- Figure out how to use ssh or stunnel to protect Bacula communications.
-For 1.32:
+For 1.33 Testing/Documentation:
+- Document to start higher priority jobs before lower ones.
+- suppress "Do not forget to mount the drive!!!" if error
+- Document new records in Director. SDAddress SDDeviceName, SDPassword.
+ FDPassword, FDAddress, DBAddress, DBPort, DBPassword.
+- Document new Include/Exclude ...
+- Add test of exclusion, test multiple Include {} statements.
+- Add counter variable test.
+- Document ln -sf /usr/lib/libncurses.so /usr/lib/libtermcap.so
+ and install the esound-dev package for compiling Console on
+ SuSE.
+
+For 1.33
+- Implement a RunAfterFailedJob
+- Zap illegal characters in job name for mail files (e.g. /).
+- From Lars Köllers:
+ Yes, it would allow highly automating the request for new tapes. If a
+ tape is empty, bacula reads the barcodes (native or simulated), and if
+ an unused tape is found, it runs the label command with all the
+ necessary parameters.
+
+ By the way, can Bacula automatically "move" an empty/purged volume say
+ in the "short" pool to the "long" pool if this pool runs out of volume
+ space?
+- Implement a move Volume from one pool to another.
+- Either restrict the characters in a name, or fix the problem
+ emailing with names containing / (smtp command line breaks).
+- Eliminate ua_retention.c (retentioncmd) if possible.
+- Eliminate orphaned jobs: dbcheck, normal pruning, delete job command.
+ Hm. Well, there are the remaining orphaned job records:
+
+ | 105 | Llioness Save | 0000-00-00 00:00:00 | B | D | 0 | 0 | f |
+ | 110 | Llioness Save | 0000-00-00 00:00:00 | B | I | 0 | 0 | f |
+ | 115 | Llioness Save | 2003-09-10 02:22:03 | B | I | 0 | 0 | A |
+ | 128 | Catalog Save | 2003-09-11 03:53:32 | B | I | 0 | 0 | C |
+ | 131 | Catalog Save | 0000-00-00 00:00:00 | B | I | 0 | 0 | f |
+
+ As you can see, three of the five are failures. I already deleted the
+ one restore and one other failure using the by-client option. Deciding
+ what is an orphaned job is a tricky problem though, I agree. All these
+ records have or had 0 files/ 0 bytes, except for the restore. With no
+ files, of course, I don't know if the job ever actually becomes
+ associated with a Volume.
+
+ (I'm not sure if this is documented anywhere -- what are the meanings of
+ all the possible JobStatus codes?)
+
+ Looking at my database, it appears to me as though all the "orphaned"
+ jobs fit into one of two categories:
+
+ 1) The Job record has a StartTime but no EndTime, and the job is not
+ currently running;
+ or
+ 2) The Job record has an EndTime, indicating that it completed, but
+ it has no associated JobMedia record.
+
+
+ This does suggest an approach. If failed jobs (or jobs that, for some
+ other reason, write no files) are associated with a volume via a
+ JobMedia record, then they should be purged when the associated volume
+ is purged. I see two ways to handle jobs that are NOT associated with a
+ specific volume:
+
+ 1) purge them automatically whenever any volume is manually purged;
+ or
+ 2) add an option to the purge command to manually purge all jobs with
+ no associated volume.
+
+ I think Restore jobs also fall into category 2 above .... so one might
+ want to make that "The Job record has an EndTime, but no associated
+ JobMedia record, and is not a Restore job."
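The two orphan categories above, together with the Restore exception, can be sketched as a small predicate. This is only an illustration and not Bacula code: the `JobRow` fields, the `is_running` and `has_jobmedia` flags, and the use of "R" as the Restore job type are assumptions made for the example. A missing time (such as the all-zero dates in the listing) is represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRow:
    """Hypothetical, simplified view of one Job record."""
    job_id: int
    job_type: str              # assumed codes: "B" = backup, "R" = restore
    start_time: Optional[str]  # None when the catalog holds no real time
    end_time: Optional[str]
    is_running: bool           # assumed to be known from the Director
    has_jobmedia: bool         # assumed result of a JobMedia lookup


def is_orphaned(job: JobRow) -> bool:
    # Category 1: started, never finished, and not running now.
    if job.start_time and not job.end_time and not job.is_running:
        return True
    # Category 2: finished, no JobMedia record, and not a Restore job.
    if job.end_time and not job.has_jobmedia and job.job_type != "R":
        return True
    return False


jobs = [
    JobRow(1, "B", "2003-09-10 02:22:03", None, False, False),
    JobRow(2, "B", "2003-09-11 03:53:32", "2003-09-11 04:00:00", False, False),
    JobRow(3, "R", "2003-09-11 05:00:00", "2003-09-11 05:10:00", False, False),
    JobRow(4, "B", "2003-09-12 01:00:00", None, True, False),
    JobRow(5, "B", "2003-09-12 02:00:00", "2003-09-12 02:30:00", False, True),
]
orphans = [j.job_id for j in jobs if is_orphaned(j)]
print(orphans)
```

Only jobs 1 and 2 are reported: job 3 is a Restore, job 4 is still running, and job 5 has a JobMedia record.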
+- Implement RestoreJobRetention? Maybe better "JobRetention" in a Job,
+ which would take precedence over the Catalog "JobRetention".
+- Implement Label Format in Add and Label console commands.
+- make "btape /tmp" work.
+- Make sure a rescheduled job is properly reported by status.
+- Walk through the Pool records rather than the Job records
+ in dird.c to create/update pools.
+- Figure out a way to move Volumes from one pool to another.
+- What to do about "list files job=xxx".
+- Implement delete Job.
+- Document need to put LabelFormat in quotes.
+- Implement scan: for every slot it finds, zero the slot of
+ any other Volume having that slot.
+- When a job is rescheduled, status reports "is waiting for Client Rufus
+ to connect to Storage File". Dir needs to inform SD that the job
+ is rescheduled.
+- Fix get_storage_from_media_type (ua_restore) to use command line
+ storage=
+- Enhance "update slots" to include a "scan" feature
+ scan 1; scan 1-5; scan 1,2,4 ... to update the catalog
+- Allow a slot or range of slots on the label barcodes command.
+- Don't print "Warning: Wrong Volume mounted ..." if mounting second volume.
+- Make Dmsg look at global before calling subroutine.
+- Enable trace output at runtime for Win32
+- Make sure that Volumes are recycled based on "Least recently used"
+ rather than lowest MediaId.
+- Available volumes for autochangers (see patrick@baanboard.com 3 Sep 03
+ and 4 Sep) scan slots.
+- Upgrade to cygwin 1.5
+- Get MySQL 3.23.58
+- Get and test MySQL 4.0
+- Do a complete audit of all pthreads_mutex, cond, ... to ensure that
+ any that are dynamically initialized are destroyed when no longer used.
+- Write a mini-readline with history and editing.
+- Look at how fuser works and at /proc/PID/fd; that is how Nic found the
+ file descriptor leak in Bacula.
+- Implement WrapCounters in Counters.
+- Turn on SIGHUP in dird.c and test.
+- Use system dependent calls to get more precise info on tape errors.
+- Add heartbeat from FD to SD if hb interval expires.
+- Suppress read error on blank tape when doing a label.
+- Can we dynamically change FileSets?
+- If pool specified to label command and Label Format is specified,
+ automatically generate the Volume name.
+- Take a careful look at the Basic recycling algorithm. When Bacula
+ chooses, the order should be:
+ - Look for Append
+ - Look for Recycle or Purged
+ - Prune volumes
+ - Look for purged
+ Instead of using lowest media Id, find the least recently used
+ volume.
+
+ When the tape is mounted and Bacula requests the status
+ - Do everything possible to use it.
+
+ Define an "available" status, which is the currently mounted
+ Volume and all volumes that are currently in the autochanger.
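The selection order listed above, with "least recently used" replacing "lowest MediaId", might look roughly like the following sketch. It is not the Bacula implementation: the `Volume` fields, the `last_written` timestamp, and the `prune` callback are hypothetical, and applying the least-recently-used rule at every step is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Volume:
    media_id: int
    status: str        # e.g. "Append", "Recycle", "Purged", "Full"
    last_written: int  # hypothetical timestamp; smaller means older


def least_recently_used(vols: List[Volume],
                        statuses: Iterable[str]) -> Optional[Volume]:
    """Return the oldest volume whose status is in `statuses`, or None."""
    wanted = set(statuses)
    candidates = [v for v in vols if v.status in wanted]
    return min(candidates, key=lambda v: v.last_written, default=None)


def choose_volume(vols: List[Volume],
                  prune: Callable[[List[Volume]], None]) -> Optional[Volume]:
    # 1. Look for Append.
    vol = least_recently_used(vols, ["Append"])
    if vol is not None:
        return vol
    # 2. Look for Recycle or Purged.
    vol = least_recently_used(vols, ["Recycle", "Purged"])
    if vol is not None:
        return vol
    # 3. Prune volumes, then 4. look for Purged again.
    prune(vols)
    return least_recently_used(vols, ["Purged"])


def prune_expired(vols: List[Volume]) -> None:
    """Toy pruning rule: Full volumes last written before t=80 become Purged."""
    for v in vols:
        if v.status == "Full" and v.last_written < 80:
            v.status = "Purged"


pool_a = [Volume(1, "Full", 100), Volume(2, "Purged", 300), Volume(3, "Recycle", 200)]
pool_b = [Volume(1, "Full", 100), Volume(2, "Full", 50)]
pick_a = choose_volume(pool_a, prune_expired)
pick_b = choose_volume(pool_b, prune_expired)
print(pick_a.media_id, pick_b.media_id)
```

In `pool_a` the lowest MediaId among reusable volumes is 2, but volume 3 was written longer ago, so it is chosen. In `pool_b` nothing is reusable until pruning marks volume 2 as Purged.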
+
+- Why can't SQL do the filename sort for restore?
+- Is a pool specification really needed for a restore? Yes, and
+ you may want to exclude archive Pools.
+- Look at libkse (man kse) for FreeBSD threading.
+- Look into Microsoft Volume Shadowcopy Service VSS for backing
+ up system state components (Active Directory, System Volume, ...)
+- Add ExhaustiveRestoreSearch
+- Look at the possibility of loading only the necessary
+ data into the restore tree (i.e. do it one directory at a
+ time as the user walks through the tree).
+- Possibly use the hash code if the user selects all for a restore command.
+- Orphaned Dir buffer at parse_conf.c:373 => store_dir
+- Implement some way for the File daemon to contact the Director
+ to start a job or pass its DHCP obtained IP number.
+- Implement multiple Consoles.
+- Add Console user permissions.
+- Fix "restore all" to bypass building the tree.
+- Fix restore to list errors if Invalid block found, and if # files
+ restored does not match # expected.
+- Prohibit backing up archive device (findlib/find_one.c:128)
+- Implement Release Device in the Job resource to unmount a drive.
+- Implement Acquire Device in the Job resource to mount a drive,
+ be sure this works with admin jobs so that the user can get
+ prompted to insert the correct tape. Possibly some way to say to
+ run the job but don't save the files.
+- Add JobName= to VerifyToCatalog so that all verifies can be done at the end.
+- Implement FileOptions (see end of this document)
+- Make things like list where a file is saved case independent for
+ Windows.
- Edit the Client/Storage name into authentication failure messages.
- Implement job in VerifyToCatalog
- Implement migrate
-- Implement List Volume Job=xxx or List scheduled volumes or Status Director
-- Allow a slot or range of slots on the label barcodes command.
- Implement a PostgreSQL driver.
-- Is a pool specification really needed for a restore?
- Bacula needs to propagate SD errors.
> > cluster-dir: Start Backup JobId 252, Job=REUTERS.2003-08-11_15.04.12
> > prod4-sd: REUTERS.2003-08-11_15.04.12 Error: Write error on device
> > prod4-sd: End of medium on Volume "REU007" Bytes=16,303,521,933
- Use autochanger to handle multiple devices.
-- Fix packet too big problem.
+- Fix packet too big problem. This is most likely a Windows TCP stack
+ problem.
- Add SuSE install doc to list.
- Check and rechedk "Invalid block number"
- Make bextract release the drive properly between tapes
so that an autochanger can be made to work.
-- Fix "restore all" to bypass building the tree.
-- Fix restore to list errors if Invalid block found, and if # files
- restored does not match # expected.
- User wants to NOT backup up certain big files (email files).
- Maybe remove multiple simultaneous devices code in SD.
- On Windows with very long path names, it may be impossible to create
- Something is not right in last block of fill command.
- Implement a Recycle command
- Add FileSet to command line arguments for restore.
-- Do full check the command line args in update (e.g. VolStatus ...).
- Allow multiple Storage specifications (or multiple names on
a single Storage specification) in the Job record. Thus a job
can be backed up to a number of storage devices.
- Audit all UA commands to ensure that we always prompt where possible.
- Restrict characters permitted in a Resource name, and don't permit
duplicate names.
-- Prohibit backing up archive device (findlib/find_one.c:128)
-- Make | and < work on FD side.
- Check Jmsg in bnet, may not work, must dup bsock.
- Suppress Job Name in Jmsg for console
- Create Pools that are referenced in a Run statement at startup if possible.
if there is an error.
- Make sure all restore counters are working correctly in the FD.
- SD Bytes Read is wrong.
-- Configure mtx-changer to have correct path to mtx.
- Look at ALL higher level routines that call block.c to be sure
they don't expect something in errmsg.
- Investigate doing RAW backup of Win32 partition.
-- Add JobName= to VerifyToCatalog so that all verifies can be done at the end.
- Add thread specific data to hold the jcr -- send error messages from
low level routines by accessing it and using Jmsg().
- Cancel waiting for Client connect in SD if FD goes away.
- Add Progress command that periodically reports the progress of
a job or all jobs.
- One block was orphaned in the SD probably after cancel.
-- Add all command line arguments to "update", e.g. slot=nn volStatus=append, ...
- Examine Bare Metal restore problem (a FD crash exists somewhere ...).
- Implement single pane restore (much like the Gftp panes).
- Implement Automatic Mount even in operator wait.
- Implement create "FileSet"?
-- Implement Release Device in the Job resource to unmount a drive.
-- Implement Acquire Device in the Job resource to mount a drive,
- be sure this works with admin jobs so that the user can get
- prompted to insert the correct tape. Possibly some way to say to
- run the job but don't save the files.
-- Implement all command line args on run.
-- Implement command line "restore" args.
-- Implement "restore current select=no"
- Fix watchdog pthread crash on Win32 (this is pthread_kill() Cygwin bug)
- Implement "scratch pool" where tapes are defined and can be
taken by any pool that needs them.
to the user, who would then use "mount" as described above
once he had actually inserted the disk.
-- Make some way so that if a machine is skipped because it is not up
- that Bacula will continue retrying for a specified period of time --
- periodically.
- If tape is marked read-only, then try opening it read-only rather than
failing, and remember that it cannot be written.
- Refine SD waiting output:
- Figure out some way to estimate output size and to avoid splitting
a backup across two Volumes -- this could be useful for writing CDROMs
where you really prefer not to have it split -- not serious.
-- Add RunBeforeJob and RunAfterJob to the Client program.
- Have SD compute MD5 or SHA1 and compare to what FD computes.
- Make VolumeToCatalog calculate an MD5 or SHA1 from the
actual data on the Volume and compare it.
-- Implement FileOptions (see end of this document)
- Implement Bacula plugins -- design API
- Make bcopy read through bad tape records.
-- Fix read_record to handle multiple sessions.
- Program files (i.e. execute a program to read/write files).
Pass read date of last backup, size of file last time.
- Add Signature type to File DB record.
- Check if we can increase Bacula FD priorty in Win2000
- Make sure the MaxVolFiles is fully implemented in SD
- Check if both CatalogFiles and UseCatalog are set to SD.
-- Need return status on read_cb() from read_records(). Need multiple
- records -- one per Job, maybe a JCR or some other structure with
- a block and a record.
- Figure out how to do a bare metal Windows restore
- Possibly add email to Watchdog if drive is unmounted too
long and a job is waiting on the drive.
-- Use read_record.c in SD code.
-- Restore program that errors in SD due to no tape reports
+- Restore program that errs in SD due to no tape, reports
OK incorrectly in output.
- After unmount, if restore job started, ask to mount.
- Convert all %x substitution variables, which are hard to remember
- Compare tape to Client files (attributes, or attributes and data)
- Make all database Ids 64 bit.
- Write an applet for Linux.
-- Add estimate to Console commands
-- Implement new daemon communications protocol.
+- Implement new inter-daemon communications protocol.
- Allow console commands to detach or run in background.
- Fix status delay on storage daemon during rewind.
- Add SD message variables to control operator wait time
- Set flag for uname -a. Add to Volume label.
- Implement throttled work queue.
- Restore files modified after date
-- Restore file modified before date
-- Restore -- do nothing but show what would happen
- SET LD_RUN_PATH=$HOME/mysql/lib/mysql
- Implement Restore FileSet=
- Create a protocol.h and protocol.c where all protocol messages
> woorkstations to be shut down overnight to save power.
>
+- From Terry Manderson <terry@apnic.net>
+ jobset { # new structure
+ name = "monthlyUnixBoxen"
+ type = backup
+ level = full
+ jobs = "wakame;durian;soy;wasabi;miso" #new!
+ schedule = monthly
+ storage = DLT
+ messages = Standard
+ pool = MonthlyPool
+ priority = 10
+ }
+
+ job {
+ name = "wakame"
+ fileset = "genericUnixSet"
+ client = wakame-fd
+ }
+
+ job {
+ name = "durian"
+ fileset = "genericUnixSet"
+ client = durian-fd
+ }
+
+ job {
+ name = "soy"
+ fileset = "UnixDevelBoxSet"
+ client = soy-fd
+ }
+
+
- Autolabel should be specified by DIR instead of SD.
- Storage daemon
- Add media capacity
Proposed Implementation:
To solve this problem, I propose the following:
- - Add a new Director resource type called FileOptions.
+ - Add a new Director resource type called Options.
- - The FileOptions resource will have records for all
+ - The Options resource will have records for all
options that can currently be specified on the Include record
(in a FileSet). Examples below.
- - The FileOptions resource will permit an exclude option as well
+ - The Options resource will permit an exclude option as well
as a number of additional options.
- - The heart of the FileOptions resource is the ability to
- supply any number of ApplyTo records which specify POSIX
- regular expressions. These ApplyTo regular expressions are
+ - The heart of the Options resource is the ability to
+ supply any number of Match records which specify POSIX
+ regular expressions. These Match regular expressions are
applied to the fully qualified filename (path and all). If
- one matches, then the FileOptions will be used.
+ one matches, then the Options will be used.
- - When an ApplyTo specification matches an included file, the
- options specified in the FileOptions resource will override
+ - When a Match specification matches an included file, the
+ options specified in the Options resource will override
the default options specified on the Include record.
- Include records will be modified to permit referencing one or
- more FileOptions resources. The FileOptions will be used
+ more Options resources. The Options will be used
in the order listed on the Include record and the first
one that matches will be applied.
year or so from now).
- The Exclude record will be deprecated as the same functionality
- can be obtained by using an Exclude = yes in the FileOptions.
+ can be obtained by using an Exclude = yes in the Options.
-FileOptions records:
- The following records can appear in the FileOptions resource. An
+Options records:
+ The following records can appear in the Options resource. An
asterisk preceding the name indicates a feature not currently
implemented.
For Restore Jobs:
- replace= (always/ifnewer/ifolder/never) - replace options currently
- implemented in 1.27
+ implemented in 1.31
- *Writer= (filename) - external write (restore) program
Implementation:
Currently options specifying compression, MD5 signatures, recursion,
... of a FileSet are supplied on the Include record. These will now
- all be collected into a FileOptions resource, which will be
- specified on the Include in place of the options. Multiple FileOptions
- may be specified. Since the FileOptions contain regular expressions
+ all be collected into an Options resource, which will be
+ specified in the Include in place of the options. Multiple Options
+ may be specified. Since the Options may contain regular expressions
that are applied to the full filename, this will give the ability
to specify backup options on a file by file basis to whatever level
of detail you wish.
FileSet {
Name = "FullSet"
- FInclude {
+ Include {
Compression = GZIP;
Signature = MD5
Match = /*.?*/ # matches all files.
That's a lot more to do the same thing, but it gives the ability to
apply options on a file by file basis. For example, suppose you
want to compress all files but not any file with extensions .gz or .Z.
- You could do so as follows:
+ In that case, you will need to group two sets of options using
+ the Options resource as follows:
+
FileSet {
Name = "FullSet"
- FInclude {
- FileOptions {
+ Include {
+ Options {
Signature = MD5
# Note multiple Matches are ORed
Match = /*.gz/ # matches .gz files */
Match = /*.Z/ # matches .Z files */
}
- FileOptions {
+ Options {
Compression = GZIP
Signature = MD5
Match = /*.?*/ # matches all files
}
}
- Now, since the NoCompress FileOptions is specified first on the
- Include line, any *.gz or *.Z file will have an MD5 signature computed,
- but will not be compressed. For all other files, the NoCompress will not
- match, so the Opts options will be used which will include GZIP
+ Now, since no Compression option is specified in the
+ first group of Options, any *.gz or *.Z file will have an MD5 signature computed,
+ but will not be compressed. For all other files, the *.gz and *.Z Matches will not
+ match, so the second group of Options will be used, which includes GZIP
compression.
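The "first matching Options group wins" rule can be illustrated with a short sketch. This is an assumption-laden model and not Bacula code: the `options_for` helper and the `FULLSET` structure are hypothetical, and the patterns use Python `re` syntax rather than the Match expressions written in the proposal.

```python
import re
from typing import Dict, List, Optional, Tuple

# Each group is (list of Match patterns, options applied when any one matches).
OptionsGroup = Tuple[List[str], Dict[str, str]]

# Hypothetical model of the two Options groups in the "FullSet" example.
FULLSET: List[OptionsGroup] = [
    ([r"\.gz$", r"\.Z$"], {"Signature": "MD5"}),            # no compression
    ([r".*"], {"Compression": "GZIP", "Signature": "MD5"}),  # everything else
]


def options_for(path: str, groups: List[OptionsGroup]) -> Optional[Dict[str, str]]:
    """Return the options of the first group with a matching pattern.

    Patterns inside one group are ORed; groups are tried in listed order.
    The full path is matched, not only the file name.
    """
    for patterns, opts in groups:
        if any(re.search(p, path) for p in patterns):
            return opts
    return None


gz_opts = options_for("/home/kern/backup.tar.gz", FULLSET)
txt_opts = options_for("/etc/passwd", FULLSET)
print(gz_opts)
print(txt_opts)
```

Because the groups are tried in order, swapping the two entries in `FULLSET` would make the catch-all group match first and compress the .gz and .Z files as well.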
Questions:
- Add user configurable timeout for connecting to SD.
- Unsaved Flag in Job record (use JobMissingFiles).
- Base Flag in Job record.
+- Configure mtx-changer to have correct path to mtx.
+- Add all command line arguments to "update", e.g. slot=nn volStatus=append, ...
+- Make some way so that if a machine is skipped because it is not up
+ that Bacula will continue retrying for a specified period of time --
+ periodically.
+- Implement all command line args on run.
+- Implement command line "restore" args.
+- Implement "restore current select=no"
+- Restore file modified before date
+- Restore -- do nothing but show what would happen
+- Add estimate to Console commands
+- Use read_record.c in SD code.
+- Fix read_record to handle multiple sessions.
+- Tip from Steve Allam
+ mt -f /dev/nst0 defblksize 0
+- Document "status" in the console.
+- Document driving console from shell script.
+- Write JobMedia records when max file size is reached on tape.
+- Handle the case of multiple JobMedia records pending (i.e. the
+ thread is slow and multiple situations requiring a JobMedia
+ record occur).
+- Do performance analysis on the restore tree routines.
+- Fix maximum file size (block.c) to generate JobMedia records.
+- Make the default file size 1GB on the tape.
+- Implement forward spacing between files.
+- Add Machine type (Linux/Windows) to Status report for daemons.
+ Look at src/host.h
+- Use repositioning at the beginning of the tape.
+- Do full check the command line args in update (e.g. VolStatus ...).
+- Specify list of files to restore
+- Implement ClientRunBeforeJob and ClientRunAfterJob.
+- Make | and < work on FD side.
+- Check to see if "blocked" is set during restore.
+- Figure out what is interrupting sql command in console.
+- Make new job print warning User Unmounted Tape.
+- Test recycling and purging (code changed in db_find_next_volume and
+ in recycle.c).
+- Document SDConnectTimeout (in FD).
+- Add restore by filename test.
+- Document restore by files.
+- Make variable expansion work correctly.
+- Implement List Volume Job=xxx or List scheduled volumes or Status Director
+- Copy static programs into install directory.
+- Think about changing Storage resource Device record to be
+ SDDeviceName.
+- Add RunBeforeJob and RunAfterJob to the Client program.
+- Need return status on read_cb() from read_records(). Need multiple
+ records -- one per Job, maybe a JCR or some other structure with
+ a block and a record.
+- LabelFormat on tape volume apparently creates the db record but
+ never actually labels the volume.
+- Recycling a volume when two jobs are using it is going to break. Fixed.
+- Document list nextvol and new format status dir.
+- Client files in Win32 with Unix eol conventions don't work.
+- Either fix or document that fill command in btape can be
+ compressed enormously by the hardware - a 36GB tape wrote 750GB!
+- Add multiple character duration qualifiers.
+- Require some modifier.
+- Restrict characters permitted in a Resource name, and don't permit
+ duplicate names.
+- Figure out some way to ignore or get past checksum errors in
+ reading.
+- The SD spooling file gets created even if it is not used.
+- Look at Cleaning tape in ua_label.c for media create/update
+- Add regression testing to the manual
+- End time: in job output of rescheduled job is time of first run.
+- Document list nextvol and status output.
+- Separate Dir heartbeat in FD from the SD heartbeat.
+- Fix sparse file handling so that it always reads a multiple
+ of 512. Currently, it subtracts 8 bytes (for faddr).
+ Kludged with #ifdef for FreeBSD.
+- Document that Volume pruning can delete last Full backup and
+ hence you will not have a valid backup.
+- Clarify the fact that having the Bacula cygwin1.dll loaded
+ is not the same as having cygwin installed.
+- Document that it is safe to use the drive when the lights stop flashing.
+- Document all the status codes JobLevel, JobType, JobStatus.
+- Add GUI interface to manual
+- Combine the 3 places that search run records for the next
+ job. Use find_job_pool() modified in ua_output.c
+- Test connect timeouts.
+- Fix FreeBSD build with tcp_wrapper -- should not have -lnsl
+