Kern's ToDo List
- 20 May 2003
+ 6 July 2003
Documentation to do: (any release a little bit at a time)
- Document running a test version.
hours of operation.
- Lookup HP cleaning recommendations.
- Lookup HP tape replacement recommendations (see trouble shooting autochanger)
+- Document FInclude ...
+- Document the need to add "-u root" to most of the MySQL script calls
+ (./create_mys... ./make_my...).
+- Document c:/working directory better than /working directory.
+- Document all the status codes JobLevel, JobType, JobStatus.
+- Document update volume: jobid, current, before, all
+- Document run "yes".
+- Document that bscan does not work with multiple simultaneous jobs.
+- Update Automatic Volume Labeling in disk.wml
Testing to do: (painful)
- that ALL console command line options work and are always implemented
- blocksize recognition code.
-- multiple simultaneous Volumes
- Test if rewind at end of tape waits for tape to rewind.
- Test cancel at EOM.
-
+- Test not zeroing Autochanger slot when it is wrong.
+- Test if last block is correct in JobMedia when splitting file
+ over two volumes.
+- Test recycling and purging (code changed in db_find_next_volume and
+ in recycle.c).
- Figure out how to use ssh or stunnel to protect Bacula communications.
For 1.31 release:
+- In Win portable restore the directory is not created:
+ 27-Jun-2003 16:52 tibs-fd: kernsrestore.2003-06-27_16.52.20 Error:
+ create_file.c:175 Could not create
+ /tmp/bacula-restores/cygwin/home/kern/bacula/k/src/dird/dird_conf.o: 0
+ ERR=The system cannot find the path specified.
+- Finish Windows implementation (add setting of correct type on restore,
+ add Portable Data Format flag).
+- Maybe remove multiple simultaneous devices code in SD.
+- Increment DB version prior to releasing.
+- Turn off FULL_DEBUG prior to releasing.
+- On Windows with very long path names, it may be impossible to create
+ a file (and thus restore it) because the total length is too long.
+ We must cd into the directory then create the file without the
+ full path name.
+- Move JobFiles and JobBytes to SD rather than FD -- more correct.
+- lstat() is not going to work on Win32 for testing date.
+- Implement a Recycle command
+- Something is not right in last block of fill command.
+- Implement List Volume Job=xxx or List scheduled volumes or Status Director
+- Check if Incremental is working correctly when it looks for the previous Job
+ (Phil's problem).
+
+
+For 1.32:
+- Add client name to cram-md5 challenge so Director can immediately
+ verify if it is the correct client.
+- Implement ClientRunBeforeJob and ClientRunAfterJob.
+- Implement new alist in FileSet scanning.
+- Add JobLevel in FD status (but make sure it is defined).
+- Audit all UA commands to ensure that we always prompt where possible.
+- Restrict characters permitted in a Resource name, and don't permit
+ duplicate names.
+- Prohibit backing up archive device (findlib/find_one.c:128)
+- Make | and < work on FD side.
+- Check Jmsg in bnet, may not work, must dup bsock.
+- Suppress Job Name in Jmsg for console
+- Create Pools that are referenced in a Run statement at startup if possible.
+- Use runbeforejob to unload, then reload a volume previously used,
+ then the next job run gets an error reading the drive.
+- Make bootstrap filename unique.
+- Test a second language e.g. french.
+- Start working on Base jobs.
+- Make "make binary-release" work from any directory.
+- Unsaved Flag in Job record (use JobMissingFiles).
+- Base Flag in Job record.
+- Implement UnsavedFiles DB record.
+- Implement argc/argv for daemon command line scanning using table driven
+ stuff below.
+- Implement table driven single argc/argv scanner to pickup all arguments.
+ Much like xxx_conf.c scan table.
+ keyword, handler(store_routine), store_address, code, flags, default.
+- From Phil Stracchino:
+ It would probably be a per-client option, and would be called
+ something like, say, "Automatically purge obsoleted jobs". What it
+ would do is, when you successfully complete a Differential backup of a
+ client, it would automatically purge all Incremental backups for that
+ client that are rendered redundant by that Differential. Likewise,
+ when a Full backup on a client completed, it would automatically purge
+ all Differential and Incremental jobs obsoleted by that Full backup.
+ This would let people minimize the number of tapes they're keeping on
+ hand without having to master the art of retention times.
+- Implement new serialize subroutines
+ send(socket, "string", &Vol, "uint32", &i, NULL)
+- Scratch Pool where the volumes can be re-assigned to any Pool.
+- Implement a M_SECURITY message class.
+- Implement forward spacing block/file: position_device(bsr) --
+ just before read_block_from_device();
+- When doing a Backup send all attributes back to the Director, who
+ would then figure out what files have been deleted.
+- Currently in mount.c:236 the SD simply creates a Volume. It should have
+ explicit permission to do so. It should also mark the tape in error
+ if there is an error.
+- Make sure all restore counters are working correctly in the FD.
+- SD Bytes Read is wrong.
+- Configure mtx-changer to have correct path to mtx.
+- Look at ALL higher level routines that call block.c to be sure
+ they don't expect something in errmsg.
- Investigate doing RAW backup of Win32 partition.
- Add JobName= to VerifyToCatalog so that all verifies can be done at the end.
- Add thread specific data to hold the jcr -- send error messages from
low level routines by accessing it and using Jmsg().
-- Default duration with no qualifier is sec should be 1 day
-- Find a solution for the multiple FileSet problem (when it is changed). Add date?
- Cancel waiting for Client connect in SD if FD goes away.
- Testing Tibs job erred and hung director on Storage resource. This was
because there were a whole pile of jobs hanging around in the SD
waiting for a connection from the FD that was never coming.
-- Make restore more robust in counting error and not immediately bailing
- out. Also print error message once, but try to continue.
-- Make SD keep track of Files, Bytes during restore.
-- Add code to check that blocks are sequential on restore.
-- File the Automatically selected: xxx
- to say Automatically selected Pool: xxx
-- Should Bacula make an Append tape as Purged when purging?
-- Shell expansion fails for working_directory in SD from time to time.
- Possibly update all client records at startup.
-- Implement MTIOCERRSTAT on FreeBSD to clear tape error conditions.
-
- Add Progress command that periodically reports the progress of
a job or all jobs.
-- Implement "Reschedule OnError=yes interval=nnn times=xxx"
- One block was orphaned in the SD probably after cancel.
- Add all command line arguments to "update", e.g. slot=nn volStatus=append, ...
-- Implement argc/argv for daemon command line scanning using table driven
- stuff below.
-- Implement table driven single argc/argv scanner to pickup all arguments.
- Much like xxx_conf.c scan table.
- keyword, handler(store_routine), store_address, code, flags, default.
- Examine Bare Metal restore problem (a FD crash exists somewhere ...).
-- Test multiple simultaneous Volumes
-- Document FInclude ...
- Implement timeout in response() when it should come quickly.
- Implement console @echo command.
- Implement a Slot priority (loaded/not loaded).
- Implement restore "current system", but take all files without
doing selection tree -- so that jobs without File records can
be restored.
-- Make | and < work on FD side.
-- Pass prefix_links to FD.
-- Implement a M_SECURITY message class.
- Implement disk spooling. Two parts: 1. Spool to disk then
immediately to tape to speed up tape operations. 2. Spool to
disk only when the tape is full, then when a tape is hung move
it to tape.
-- From Phil Stracchino:
- It would probably be a per-client option, and would be called
- something like, say, "Automatically purge obsoleted jobs". What it
- would do is, when you successfully complete a Differential backup of a
- client, it would automatically purge all Incremental backups for that
- client that are rendered redundant by that Differential. Likewise,
- when a Full backup on a client completed, it would automatically purge
- all Differential and Incremental jobs obsoleted by that Full backup.
- This would let people minimize the number of tapes they're keeping on
- hand without having to master the art of retention times.
- Implement a relocatable bacula.spec
- Allow multiple Storage specifications (or multiple names on
a single Storage specification) in the Job record. Thus a job
can be backed up to a number of storage devices.
- Implement dump/print label to UA
- Add prefixlinks to where or not where absolute links to FD.
-- Look at Python for a Bacula scripting language -- www.python.org
- Issue message to mount a new tape before the rewind.
- Simplified client job initiation for portables.
- If SD cannot open a drive, make it periodically retry.
- Program files (i.e. execute a program to read/write files).
Pass read date of last backup, size of file last time.
- Add Signature type to File DB record.
-- Make Restore report an error if FD or SD term codes are not OK.
- CD into subdirectory when open()ing files for backup to
speed up things. Test with testfind().
- Priority job to go to top of list.
-- Find out why Full saves run slower and slower (hashing?)
- Why are save/restore of device different sizes (sparse?) Yup! Fix it.
- Implement some way for the Console to dynamically create a job.
- Restore to a particular time -- e.g. before date, after date.
- Solaris -I on tar for include list
-- Prohibit backing up archive device (findlib/find_one.c:128)
- Need a verbose mode in restore, perhaps to bsr.
- bscan without -v is too quiet -- perhaps show jobs.
- Add code to reject whole blocks if not wanted on restore.
-- Start working on Base jobs.
- Check if we can increase Bacula FD priorty in Win2000
- Make sure the MaxVolFiles is fully implemented in SD
- Check if both CatalogFiles and UseCatalog are set to SD.
- Possibly add email to Watchdog if drive is unmounted too
long and a job is waiting on the drive.
- Use read_record.c in SD code.
-- Why don't we get an error message from Win32 FD when bootstrap
- file cannot be created for restore command?
-- When Marking a file in Restore that is a hard link, also
- mark the link so that the data will be reloaded.
- Restore program that errors in SD due to no tape reports
OK incorrectly in output.
- After unmount, if restore job started, ask to mount.
- Convert all %x substitution variables, which are hard to remember
and read to %(variable-name). Idea from TMDA.
-- Add JobLevel in FD status (but make sure it is defined).
-- Make Pool resource handle Counter resources.
- Remove NextId for SQLite. Optimize.
- Move all SQL statements into a single location.
- Add UA rc and history files.
- Enhance time and size scanning routines.
- Fix Autoprune for Volumes to respect need for full save.
- Fix Win32 config file definition name on /install
-- No READLINE_SRC if found in alternate directory.
-- Test a second language e.g. french.
- Compare tape to Client files (attributes, or attributes and data)
- Make all database Ids 64 bit.
- Write an applet for Linux.
- Add estimate to Console commands
-- Find solution to blank filename (i.e. path only) problem.
- Implement new daemon communications protocol.
-- Remove PoolId from Job table, it exists in Media.
- Allow console commands to detach or run in background.
- Fix status delay on storage daemon during rewind.
- Add SD message variables to control operator wait time
Verify level=Catalog, level=InitCatalog
- Events file
- Add keyword search to show command in Console.
-- Fix Win2000 error with no messages during startup.
- Events : tape has more than xxx bytes.
-- Restrict characters permitted in a Resource name.
- Complete code in Bacula Resources -- this will permit
reading a new config file at any time.
- Handle ctl-c in Console
- Implement script driven addition of File daemon to config files.
- Think about how to make Bacula work better with File (non-tape) archives.
- Write Unix emulator for Windows.
-- Implement new serialize subroutines
- send(socket, "string", &Vol, "uint32", &i, NULL)
-- Audit all UA commands to ensure that we always prompt where possible.
-- If ./btape is called without /dev, assume argument is a Storage resource name.
- Put memory utilization in Status output of each daemon
if full status requested or if some level of debug on.
- Make database type selectable by .conf files i.e. at runtime
- Set flag for uname -a. Add to Volume label.
- Implement throttled work queue.
-- Check for EOT at ENOSPC or EIO or ENXIO (unix Pc)
- Restore files modified after date
- Restore file modified before date
-- Emergency restore info:
- - Backup Bacula
- - Backup working directory
- - Backup Catalog
- Restore -- do nothing but show what would happen
- SET LD_RUN_PATH=$HOME/mysql/lib/mysql
- Implement Restore FileSet=
are concentrated.
- Remove duplicate fields from jcr (e.g. jcr.level and jcr.jr.Level, ...).
- Timout a job or terminate if link goes down, or reopen link and query.
-- Find general solution for sscanf size problems (as well
- as sprintf. Do at run time?
- Concept of precious tapes (cannot be reused).
- Make bcopy copy with a single tape drive.
- Permit changing ownership during restore.
+- From Phil:
+ > My suggestion: Add a feature on the systray menu-icon menu to request
+ > an immediate backup now. This would be useful for laptop users who may
+ > not be on the network when the regular scheduled backup is run.
+ >
+ > My wife's suggestion: Add a setting to the win32 client to allow it to
+ > shut down the machine after backup is complete (after, of course,
+ > displaying a "System will shut down in one minute, click here to cancel"
+ > warning dialog). This would be useful for sites that want user
+ > workstations to be shut down overnight to save power.
+ >
+
- Autolabel should be specified by DIR instead of SD.
-- Find out how to get the system tape block limits, e.g.:
- Apr 22 21:22:10 polymatou kernel: st1: Block limits 1 - 245760 bytes.
- Apr 22 21:22:10 polymatou kernel: st0: Block limits 2 - 16777214 bytes.
- Storage daemon
- Add media capacity
- AutoScan (check checksum of tape)
times out jobs by asking the deamons where they are.
- Enhance Jmsg code to permit buffering and saving to disk.
- device driver = "xxxx" for drives.
-- restart: paranoid: read label fsf to
- eom read append block, and go
- super-paranoid: read label, read all files
- in between, read append block, and go
- verify: backspace, read append block, and go
- permissive: same as above but frees drive
- if tape is not valid.
- Verify from Volume
- Ensure that /dev/null works
- Need report class for messages. Perhaps
fill in code for "since" option
- Director needs a time after which the report status is sent
anyway -- or better yet, a retry time for the job.
- Don't reschedule a job if previous incarnation is still running.
+- Don't reschedule a job if previous incarnation is still running.
- Some way to automatically backup everything is needed????
- Need a structure for pending actions:
- buffered messages
Longer term to do:
- Design at hierarchial storage for Bacula. Migration and Clone.
- Implement FSM (File System Modules).
-- Identify unchanged or "system" files and save them to a
- special tape thus removing them from the standard
- backup FileSet -- BASE backup.
- Audit M_ error codes to ensure they are correct and consistent.
- Add variable break characters to lex analyzer.
Either a bit mask or a string of chars so that
Item 2: Make the Storage daemon use intermediate file storage to buffer data.
-Deferred -- not necessary yet.
+Deferred -- not necessary yet -- possibly implement with Migration.
What: If data is coming into the SD too fast, buffer it to
disk if the user has configured this option.
Item 5: Implement Label templates
+Done
What: This is a mechanism whereby Bacula can automatically create
a tape label for new tapes according to a detailed specification
Item 9: Add SSL to daemon communications.
-Inprogress as of version 1.31.
What: This provides for secure communications between the daemons.
- Bug: fix access problems on files restored on WinXP.
- Put system type returned by FD into catalog.
- Finish WIN32_DATA stream code (bextract, check if can handle stream)
+- Make SD keep track of Files, Bytes during restore.
+- If you enter the userid by hand for restore, you get:
+ Enter JobId(s), comma separated, to restore: 74
+ You have selected the following JobId: 74
+ Building directory tree for JobId 74 ...
+ 134645140 items inserted into the tree and marked for extraction.
+- Add SDWriteSeqNo to SD, and probably Read on FD side.
+- If bootstrap is non-zero for restore, do not show JobId in the
+ OK to run? (yes/mod/no): list.
+- When all cassettes in magazine are used, got:
+ 22-May-2003 18:24 undef-sd: 3304 Autochanger "load slot 1" status is OK.
+ 22-May-2003 18:24 undef-sd: NightlySave.2003-05-22_14.08.16 Warning: mount.c:245 Director wanted Volume "TestVolume0009".
+ Current Volume "TestVolume0005" not acceptable because:
+ 1998 Volume "TestVolume0005" not Append or Recycle.
+ 22-May-2003 18:24 undef-sd: NightlySave.2003-05-22_14.08.16 Error: Autochanger Volume "TestVolume0009" not found in slot 1.
+ Setting slot to zero in catalog.
+ 22-May-2003 18:24 undef-sd: Please mount Volume "TestVolume0009" on Storage Device "ARCHIVE 4586" for Job NightlySave.2003-05-22_14.08.16
+ Use "mount" command to release Job.
+ 22-May-2003 19:24 undef-sd: Please mount Volume "TestVolume0009" on Storage Device "ARCHIVE 4586" for Job NightlySave.2003-05-22_14.08.16
+ Use "mount" command to release Job.
+- Don't zero the Slot when the wrong volume is found -- simply ask
+ the operator.
+- Implement MTIOCERRSTAT on FreeBSD to clear tape error conditions.
+- Shell expansion fails for working_directory in SD from time to time.
+- Fix the Automatically selected: xxx
+ to say Automatically selected Pool: xxx
+- Default duration with no qualifier is sec should be 1 day
+- zap sd_auth_key in SD after FD connection.
+- Find a solution for the multiple FileSet problem (when it is changed). Add date?
+- Look at Python for a Bacula scripting language -- www.python.org
+- When Marking a file in Restore that is a hard link, also
+ mark the link so that the data will be reloaded.
+- Emergency restore info:
+ - Backup Bacula
+ - Backup working directory
+ - Backup Catalog
+- Why don't we get an error message from Win32 FD when bootstrap
+ file cannot be created for restore command?
+- Fix Win2000 error with no messages during startup.
+- Make restore more robust in counting error and not immediately bailing
+ out. Also print error message once, but try to continue.
+- Add code to check that blocks are sequential on restore.
+- Remove "rufus" and such references from regress.
+- No READLINE_SRC if found in alternate directory.
+- If ./btape is called without /dev, assume argument is a Storage resource name.
+- Find general solution for sscanf size problems (as well as sprintf). Do at run time?
+- Bytes restored is wrong.
+- The "List last 20 Jobs run" doesn't work correctly in restore.
+ It doesn't show the last 20 jobs, but some older ones.
+- Fix Verify VolumeToCatalog to use BSRs -- it is broken.
+- Implement Release Storage=xxx
+- Fix restore on Win95/98
+- Remove the Jmsg() in sql_find.c:102 or only print on hard error.
+- Implement FileSet VolIndex -- done, but must update old records.
+- Check this below from Phil.
+ This was SD reported data rather than FD data!
+ > When the job was done, Bacula reported 11084 files restored:
+ >
+ > JobId: 527
+ > Job: Zocalo_Restore.2003-06-05_16.42.01
+ > Client: Zocalo
+ > Start time: 05-Jun-2003 16:42
+ > End time: 06-Jun-2003 01:21
+ > Files Restored: 11,084
+ > Bytes Restored: 65,474,772
+ > Rate: 2.1 KB/s
+ > FD termination status: OK
+ > Termination: Restore OK
+ >
+ > when it should probably have reported 11084 files scanned, 250 restored.
+ > The bytes restored count looks about right.
+ >
+- Should Bacula mark an Append tape as Purged when purging?
+- Use switch() in backup.c and restore.c in FD instead of giant if statement.
+- If during a restore, a hard linked file already exists (on option), delete
+ the file and re-link it. This is to avoid the possibility that the
+ user had re-linked the file between the backup and the restore.
+ Do lstat() to see if it is already properly linked.
+ Same for symlinked file.
+ Make sure ifnewer, ifolder, never, ... apply correctly.
+- Flag so that no connect does not error, and Reschedule a job.
+- Implement "Reschedule OnError=yes interval=nnn times=xxx"
+- Test that restoring a hard link that already exists works correctly.
+ Same for soft link.
+- Make Pool resource handle Counter resources.
+- Fix first block number after label to be zero instead of 1 (reset after label).
+- Grep for Backup OK in regression script.
+- Do NOT reuse same JobId if tape written.
+- Implement non-blocking writes and bsock->terminate in heartbeat
+ thread, or set it in status.c cancel (used pthread_kill() instead of
+ non-blocking I/O).
+- Add restore to specific date.
+- Instrument use_count on DEVICE packets and ensure that the device is
+ being close()ed at the appropriate time.
+- Test long path names (>64 chars) in Windows -- crashes FD?
+- Implement fast block rejection: match_bsr_block().
+- Complain if record dropped in bnet_recv because too long.
+- Test multiple simultaneous Volumes
+- Document recycling algorithm.
+- Make Restore report an error if FD or SD term codes are not OK.
+- To link with mysqlclient_r may require -lssl -lcrypto
+- Document heartbeat code
+- Non-fatal errors are not counted correctly: attribs.c:277
+- Check that Block numbers in JobMedia are correct.
+- The bsr for Dan's job has file indexes covering the whole range rather
+ than only the range contained on the volume.
+ Constrain FileIndex to be within range for Volume.
+- Pass prefix_links to FD.
+- Fix restore list of volumes if Volume not selected.
+