Kern's ToDo List
- 16 June 2005
+ 27 April 2007
Major development:
Project Developer
======= =========
-TLS Landon Fuller
-Unicode in Win32 Thorsten Engel (done)
-VSS Thorsten Engel (in beta testing)
-Version 1.37 Kern (see below)
-========================================================
-
-1.37 Major Projects:
-#3 Migration (Move, Copy, Archive Jobs)
- (probably not this version)
-#7 Single Job Writing to Multiple Storage Devices
- (probably not this version)
-
-## Integrate web-bacula into a new Bacula project with
- bimagemgr.
-## Create a new GUI chapter explaining all the GUI programs.
-
-Autochangers:
-- 7. Implement new Console commands to allow offlining/reserving drives,
- and possibly manipulating the autochanger (much asked for).
-- Make "update slots" when pointing to Autochanger, remove
- all Volumes from other drives. "update slots all-drives"?
Document:
-- Document that Bootstrap files can be written with cataloging
- turned off.
-- Pruning with Admin job.
-- Add better documentation on how restores can be done
-- OS linux 2.4
- 1) ADIC, DLT, FastStor 4000, 7*20GB
- 2) Sun, DDS, (Suns name unknown - Archive Python DDS drive), 1.2GB
- 3) Wangtek, QIC, 6525ES, 525MB (fixed block size 1k, block size etc.
- driver dependent - aic7xxx works, ncr53c8xx with problems)
- 4) HP, DDS-2, C1553A, 6*4GB
-- Doc the following:
-  To activate, check, or disable the hardware compression feature on my
-  EXB-8900 I use the Exabyte "MammothTool", which you can get here:
-  http://www.exabyte.com/support/online/downloads/index.cfm
-  There is a Solaris version of this tool. With option -C 0 or 1 you can
-  disable or activate compression. Start this tool without any options for
-  a short reference.
-- Linux Sony LIB-D81, AIT-3 library works.
-- Document PostgreSQL performance problems bug 131.
-- Document testing
-- Document that ChangerDevice is used for Alert command.
-- Document new CDROM directory.
-- Document Heartbeat Interval in the dealing with firewalls section.
+- !!! Cannot restore two jobs at the same time that were
+  written simultaneously unless they were totally spooled.
+- Document cleaning up the spool files:
+ db, pid, state, bsr, mail, conmsg, spool
- Document the multiple-drive-changer.txt script.
+- Pruning with Admin job.
+- Does WildFile match against full name? Doc.
+- %d and %v only valid on Director, not for ClientRunBefore/After.
+- During tests with the 260 char fix code, I found one problem:
+ if the system "sees" a long path once, it seems to forget it's
+ working drive (e.g. c:\), which will lead to a problem during
+ the next job (create bootstrap file will fail). Here is the
+ workaround: specify absolute working and pid directory in
+ bacula-fd.conf (e.g. c:\bacula\working instead of
+ \bacula\working).
+- Document techniques for restoring large numbers of files.
+- Document setting my.cnf for big file usage.
+- Add example of proper index output to doc. show index from File;
+- Correct the Include syntax in the m4.xxx files in examples/conf
+- Document JobStatus and Termination codes.
+- Fix the error with the "DVI file can't be opened" while
+ building the French PDF.
+- Document more DVD stuff
+- Doc
+ { "JobErrors", "i"},
+ { "JobFiles", "i"},
+ { "SDJobFiles", "i"},
+ { "SDErrors", "i"},
+ { "FDJobStatus","s"},
+ { "SDJobStatus","s"},
+- Document all the little details of setting up certificates for
+ the Bacula data encryption code.
+- Document more precisely how to use master keys -- especially
+ for disaster recovery.
+
+Professional Needs:
+- Migration from other vendors
+ - Date change
+ - Path change
+- Filesystem types
+- Backup conf/exe (all daemons)
+- Back up system state
+- Detect state change of system (verify)
+
+Priority:
+- Please mount volume "xxx" on Storage device ... should also list
+ Pool and MediaType in case user needs to create a new volume.
+- Implement wait_for_sysop() message display in wait_for_device(),
+  which now prints warnings too often.
+
+- The Director seg faulted when I omitted the Pool directive from a
+  Job resource. I was experimenting and thought it redundant that I
+  had specified Pool, Full Backup Pool, and Differential Backup Pool,
+  but apparently not. This happened when I removed the Pool directive
+  and restarted the Director.
+
+- On restore add Restore Client, Original Client.
+01-Apr 00:42 rufus-dir: Start Backup JobId 55, Job=kernsave.2007-04-01_00.42.48
+01-Apr 00:42 rufus-sd: Python SD JobStart: JobId=55 Client=Rufus
+01-Apr 00:42 rufus-dir: Created new Volume "Full0001" in catalog.
+01-Apr 00:42 rufus-dir: Using Device "File"
+01-Apr 00:42 rufus-sd: kernsave.2007-04-01_00.42.48 Warning: Device "File" (/tmp) not configured to autolabel Volumes.
+01-Apr 00:42 rufus-sd: kernsave.2007-04-01_00.42.48 Warning: Device "File" (/tmp) not configured to autolabel Volumes.
+01-Apr 00:42 rufus-sd: Please mount Volume "Full0001" on Storage Device "File" (/tmp) for Job kernsave.2007-04-01_00.42.48
+01-Apr 00:44 rufus-sd: Wrote label to prelabeled Volume "Full0001" on device "File" (/tmp)
+
+- Add Where: client:/.... to restore job report.
+- Ensure that each device in an Autochanger has a different
+ Device Index.
+- Add Catalog = to Pool resource so that pools will exist
+ in only one catalog -- currently Pools are "global".
+- Look at sg_logs -a /dev/sg0 for getting soft errors.
+- btape "test" command with Offline on Unmount = yes
+
+ This test is essential to Bacula.
+
+ I'm going to write one record in file 0,
+ two records in file 1,
+ and three records in file 2
+
+ 02-Feb 11:00 btape: ABORTING due to ERROR in dev.c:715
+ dev.c:714 Bad call to rewind. Device "LTO" (/dev/nst0) not open
+ 02-Feb 11:00 btape: Fatal Error because: Bacula interrupted by signal 11: Segmentation violation
+ Kaboom! btape, btape got signal 11. Attempting traceback.
+
+- Ensure that moving a purged Volume in ua_purge.c to the RecyclePool
+ does the right thing.
+- Why doesn't @"xxx abc" work in a conf file?
+- Figure out some way to "automatically" backup conf changes.
+- Add the OS version back to the Win32 client info.
+- Restarted jobs have a NULL in the from field.
+- Modify SD status command to indicate when the SD is writing
+ to a DVD (the device is not open -- see bug #732).
+- Look at the possibility of adding "SET NAMES UTF8" for MySQL,
+ and possibly changing the blobs into varchar.
+- Check if gnome-console works with TLS.
+- Ensure that the SD re-reads the Media record if the JobFiles
+ does not match -- it may have been updated by another job.
+- Look at moving the Storage directive from the Job to the
+ Pool in the default conf files.
+- Doc items
+- Test Volume compatibility between machine architectures
+- Encryption documentation
+- Wrong jobbytes with query 12 (todo)
+- bacula-1.38.2-ssl.patch
+- Bare-metal recovery Windows (todo)
+
+
+Projects:
+- GUI
+ - Admin
+ - Management reports
+ - Add doc for bweb -- especially Installation
+ - Look at Webmin
+ http://www.orangecrate.com/modules.php?name=News&file=article&sid=501
+- Performance
+ - FD-SD quick disconnect
+ - Despool attributes in separate thread
+ - Database speedups
+ - Embedded MySQL
+ - Check why restore repeatedly sends Rechdrs between
+ each data chunk -- according to James Harper 9Jan07.
+ - Building the in memory restore tree is slow.
+- Features
+ - Better scheduling
+ - Full at least once a month, ...
+ - Cancel Inc if Diff/Full running
+ - More intelligent re-run
+ - New/deleted file backup
+ - FD plugins
+ - Incremental backup -- rsync, Stow
+
+
+For next release:
+- Look at mondo/mindi
+- Don't restore Solaris Door files:
+ #define S_IFDOOR in st_mode.
+ see: http://docs.sun.com/app/docs/doc/816-5173/6mbb8ae23?a=view#indexterm-360
+- Make Bacula by default not backup tmpfs, procfs, sysfs, ...
+- Fix hardlinked immutable files: when linking a second file, the
+  immutable flag must be removed prior to trying to link it.
+- Implement Python event for backing up/restoring a file.
+- Change dbcheck to tell users to use native tools for fixing
+ broken databases, and to ensure they have the proper indexes.
+- Add udev rules for Bacula devices.
+- If a job terminates, the DIR connection can close before the
+ Volume info is updated, leaving the File count wrong.
+- Look at why SIGPIPE during connection can cause seg fault in
+ writing the daemon message, when Dir dropped to bacula:bacula
+- Look at zlib 32 => 64 problems.
+- Possibly turn on St. Bernard code.
+- Fix bextract to restore ACLs, or better yet, use common routines.
+- Do we migrate appendable Volumes?
+- Remove queue.c code.
+- Print warning message if LANG environment variable does not specify
+ UTF-8.
+- New dot commands from Arno.
+ .show device=xxx lists information from one storage device, including
+ devices (I'm not even sure that information exists in the DIR...)
+ .move eject device=xxx mostly the same as 'unmount xxx' but perhaps with
+ better machine-readable output like "Ok" or "Error busy"
+ .move eject device=xxx toslot=yyy the same as above, but with a new
+ target slot. The catalog should be updated accordingly.
+ .move transfer device=xxx fromslot=yyy toslot=zzz
+
+Low priority:
+- Article: http://www.heise.de/open/news/meldung/83231
+- Article: http://www.golem.de/0701/49756.html
+- Article: http://lwn.net/Articles/209809/
+- Article: http://www.onlamp.com/pub/a/onlamp/2004/01/09/bacula.html
+- Article: http://www.linuxdevcenter.com/pub/a/linux/2005/04/07/bacula.html
+- Article: http://www.osreviews.net/reviews/admin/bacula
+- Article: http://www.debianhelp.co.uk/baculaweb.htm
+- Article:
+- It appears to me that you have run into some sort of race
+ condition where two threads want to use the same Volume and they
+ were both given access. Normally that is no problem. However,
+ one thread wanted the particular Volume in drive 0, but it was
+ loaded into drive 1 so it decided to unload it from drive 1 and
+ then loaded it into drive 0, while the second thread went on
+ thinking that the Volume could be used in drive 1 not realizing
+ that in between time, it was loaded in drive 0.
+ I'll look at the code to see if there is some way we can avoid
+ this kind of problem. Probably the best solution is to make the
+ first thread simply start using the Volume in drive 1 rather than
+ transferring it to drive 0.
+- Fix re-read of last block to check if job has actually written
+ a block, and check if block was written by a different job
+ (i.e. multiple simultaneous jobs writing).
+- Figure out how to configure query.sql. Suggestion to use m4:
+ == changequote.m4 ===
+ changequote(`[',`]')dnl
+ ==== query.sql.in ===
+ :List next 20 volumes to expire
+ SELECT
+ Pool.Name AS PoolName,
+ Media.VolumeName,
+ Media.VolStatus,
+ Media.MediaType,
+ ifdef([MySQL],
+  [ FROM_UNIXTIME(UNIX_TIMESTAMP(Media.LastWritten) + Media.VolRetention) AS Expire, ])dnl
+ ifdef([PostgreSQL],
+ [ media.lastwritten + interval '1 second' * media.volretention as expire, ])dnl
+ Media.LastWritten
+ FROM Pool
+ LEFT JOIN Media
+ ON Media.PoolId=Pool.PoolId
+ WHERE Media.LastWritten>0
+ ORDER BY Expire
+ LIMIT 20;
+ ====
+  Command: m4 -DMySQL changequote.m4 query.sql.in >query.sql
+
+ The problem is that it requires m4, which is not present on all machines
+ at ./configure time.
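One way to sidestep the m4 dependency would be to do the substitution with a small script at ./configure time. A hypothetical Python sketch of the same conditional substitution (the @EXPIRE@ marker and the generate_query name are my inventions, not part of Bacula):

```python
# Sketch: pick the DB-specific expiry expression without requiring m4.
# The template marker (@EXPIRE@) and function name are hypothetical.
EXPIRE_EXPR = {
    "MySQL": "FROM_UNIXTIME(UNIX_TIMESTAMP(Media.LastWritten)"
             " + Media.VolRetention) AS Expire,",
    "PostgreSQL": "media.lastwritten + interval '1 second'"
                  " * media.volretention AS expire,",
}

QUERY_TEMPLATE = """\
:List next 20 volumes to expire
SELECT
  Pool.Name AS PoolName,
  Media.VolumeName,
  Media.VolStatus,
  Media.MediaType,
  @EXPIRE@
  Media.LastWritten
FROM Pool
  LEFT JOIN Media ON Media.PoolId=Pool.PoolId
WHERE Media.LastWritten>0
ORDER BY Expire
LIMIT 20;
"""

def generate_query(db_type: str) -> str:
    """Expand the @EXPIRE@ marker for the chosen database backend."""
    return QUERY_TEMPLATE.replace("@EXPIRE@", EXPIRE_EXPR[db_type])
```

Plain string substitution needs nothing beyond the stock Python already required elsewhere, so it would be available at ./configure time on any build machine.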
+- Given all the problems with FIFOs, I think the solution is to do something a
+ little different, though I will look at the code and see if there is not some
+ simple solution (i.e. some bug that was introduced). What might be a better
+ solution would be to use a FIFO as a sort of "key" to tell Bacula to read and
+ write data to a program rather than the FIFO. For example, suppose you
+ create a FIFO named:
+
+ /home/kern/my-fifo
+
+ Then, I could imagine if you backup and restore this file with a direct
+ reference as is currently done for fifos, instead, during backup Bacula will
+ execute:
+
+ /home/kern/my-fifo.backup
+
+ and read the data that my-fifo.backup writes to stdout. For restore, Bacula
+ will execute:
+
+ /home/kern/my-fifo.restore
+
+ and send the data backed up to stdout. These programs can either be an
+ executable or a shell script and they need only read/write to stdin/stdout.
+
+ I think this would give a lot of flexibility to the user without making any
+ significant changes to Bacula.
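The backup/restore halves of that idea could look roughly like this. A sketch only: backup_via_program and restore_via_program are hypothetical names, and the .backup/.restore suffix convention is the one proposed above:

```python
import subprocess

def backup_via_program(fifo_path: str) -> bytes:
    """Run <fifo>.backup and capture its stdout as the backup data."""
    prog = fifo_path + ".backup"
    result = subprocess.run([prog], stdout=subprocess.PIPE, check=True)
    return result.stdout

def restore_via_program(fifo_path: str, data: bytes) -> None:
    """Feed the backed-up data to <fifo>.restore on stdin."""
    prog = fifo_path + ".restore"
    subprocess.run([prog], input=data, check=True)
```

Either program can be an executable or a shell script; as the note above says, it need only read/write stdin/stdout.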
+
+
+==== SQL
+# get null file
+select FilenameId from Filename where Name='';
+# Get list of all directories referenced in a Backup.
+select Path.Path from Path,File where File.JobId=nnn and
+ File.FilenameId=(FilenameId-from-above) and File.PathId=Path.PathId
+ order by Path.Path ASC;
+
+- Look into using Dart for testing
+ http://public.kitware.com/Dart/HTML/Index.shtml
+
+- Look into replacing autotools with cmake
+ http://www.cmake.org/HTML/Index.html
+
+=== Migration from David ===
+What I'd like to see:
+
+Job {
+ Name = "<poolname>-migrate"
+ Type = Migrate
+ Messages = Standard
+ Pool = Default
+ Migration Selection Type = LowestUtil | OldestVol | PoolOccupancy |
+Client | PoolResidence | Volume | JobName | SQLquery
+ Migration Selection Pattern = "regexp"
+ Next Pool = <override>
+}
+
+There should be no need for a Level (migration is always Full, since you
+don't calculate differential/incremental differences for migration),
+Storage should be determined by the volume types in the pool, and Client
+is really a selection issue. Migration should always occur to the
+NextPool defined in the pool definition. If no nextpool is defined, the
+job should end with a reason of "no place to go". If Next Pool statement
+is present, we override the check in the pool definition and use the
+pool specified.
+
+Here's how I'd define Migration Selection Types:
+
+With Regexes:
+Client -- Migrate data from the selected client only. The Migration
+Selection Pattern regexp selects client names, e.g. ^FS00 makes all
+client names starting with FS00 eligible for migration.
+
+Jobname -- Migrate all jobs matching the name. The Migration Selection
+Pattern regexp selects jobnames existing in the pool.
+
+Volume -- Migrate all data on specified volumes. Migration Selection
+Pattern regexp provides selection criteria for volumes to be migrated.
+Volumes must exist in pool to be eligible for migration.
+
+
+With Regex optional:
+LowestUtil -- Identify the volume in the pool with the least data on it
+and empty it. No Migration Selection Pattern required.
+
+OldestVol -- Identify the LRU volume with data written, and empty it. No
+Migration Selection Pattern required.
+
+PoolOccupancy -- if pool occupancy exceeds <highmig>, migrate volumes
+(starting with most full volumes) until pool occupancy drops below
+<lowmig>. Pool highmig and lowmig values are in pool definition, no
+Migration Selection Pattern required.
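A sketch of how the PoolOccupancy selection might work. The volume sizes, pool capacity, and highmig/lowmig thresholds are illustrative; in the proposal they come from the pool definition:

```python
def select_volumes_for_migration(volumes, capacity, highmig, lowmig):
    """Pick volumes (fullest first) to migrate until pool occupancy
    drops below lowmig percent.  `volumes` maps name -> bytes used;
    highmig/lowmig are percentages of `capacity`."""
    used = sum(volumes.values())
    if used * 100 < highmig * capacity:
        return []                      # occupancy below the trigger
    selected = []
    for name, size in sorted(volumes.items(), key=lambda v: -v[1]):
        if used * 100 < lowmig * capacity:
            break
        selected.append(name)
        used -= size                   # assume the volume empties once migrated
    return selected
```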
-For 1.37:
-- Add # Job Level date to bsr file
-- Implement "PreferMountedVolumes = yes|no" in Job resource.
-=== rate design
- jcr->last_rate
- jcr->last_runtime
- MA = (last_MA * 3 + rate) / 4
- rate = (bytes - last_bytes) / (runtime - last_runtime)
-- Despool attributes simultaneously with data in a separate
- thread, rejoined at end of data spooling.
-- Implement Files/Bytes,... stats for restore job.
-- Implement Total Bytes Written, ... for restore job.
-- Add setting Volume State via Python.
-- Max Vols limit in Pool off by one?
-- Make bootstrap file handle multiple MediaTypes (SD)
-- Test restoring into a user restricted directory on Win32 -- see
- bug report.
-- --without-openssl breaks at least on Solaris.
+
+No regex:
+SQLQuery -- Migrate all jobuids returned by the supplied SQL query.
+Migration Selection Pattern contains SQL query to execute; should return
+a list of 1 or more jobuids to migrate.
+
+PoolResidence -- Migrate data sitting in pool for longer than
+PoolResidence value in pool definition. Migration Selection Pattern
+optional; if specified, override value in pool definition (value in
+minutes).
+
+
+[ possibly a Python event -- kes ]
+===
+- Mount on an Autochanger with no tape in the drive causes:
+ Automatically selected Storage: LTO-changer
+ Enter autochanger drive[0]: 0
+ 3301 Issuing autochanger "loaded drive 0" command.
+ 3302 Autochanger "loaded drive 0", result: nothing loaded.
+ 3301 Issuing autochanger "loaded drive 0" command.
+ 3302 Autochanger "loaded drive 0", result: nothing loaded.
+ 3902 Cannot mount Volume on Storage Device "LTO-Drive1" (/dev/nst0) because:
+ Couldn't rewind device "LTO-Drive1" (/dev/nst0): ERR=dev.c:678 Rewind error on "LTO-Drive1" (/dev/nst0). ERR=No medium found.
+ 3905 Device "LTO-Drive1" (/dev/nst0) open but no Bacula volume is mounted.
+ If this is not a blank tape, try unmounting and remounting the Volume.
+- If Drive 0 is blocked, and drive 1 is set "Autoselect=no", drive 1 will
+ be used.
+- Autochanger did not change volumes.
+ select * from Storage;
+ +-----------+-------------+-------------+
+ | StorageId | Name | AutoChanger |
+ +-----------+-------------+-------------+
+ | 1 | LTO-changer | 0 |
+ +-----------+-------------+-------------+
+ 05-May 03:50 roxie-sd: 3302 Autochanger "loaded drive 0", result is Slot 11.
+ 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Warning: Director wanted Volume "LT
+ Current Volume "LT0-002" not acceptable because:
+ 1997 Volume "LT0-002" not in catalog.
+ 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Error: Autochanger Volume "LT0-002"
+ Setting InChanger to zero in catalog.
+ 05-May 03:50 roxie-dir: Tibs.2006-05-05_03.05.02 Error: Unable to get Media record
+
+ 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: Error getting Volume i
+ 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: Job 530 canceled.
+ 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: spool.c:249 Fatal appe
+ 05-May 03:49 Tibs: Tibs.2006-05-05_03.05.02 Fatal error: c:\cygwin\home\kern\bacula
+ , got
+ (missing)
+ llist volume=LTO-002
+ MediaId: 6
+ VolumeName: LTO-002
+ Slot: 0
+ PoolId: 1
+ MediaType: LTO-2
+ FirstWritten: 2006-05-05 03:11:54
+ LastWritten: 2006-05-05 03:50:23
+ LabelDate: 2005-12-26 16:52:40
+ VolJobs: 1
+ VolFiles: 0
+ VolBlocks: 1
+ VolMounts: 0
+ VolBytes: 206
+ VolErrors: 0
+ VolWrites: 0
+ VolCapacityBytes: 0
+ VolStatus:
+ Recycle: 1
+ VolRetention: 31,536,000
+ VolUseDuration: 0
+ MaxVolJobs: 0
+ MaxVolFiles: 0
+ MaxVolBytes: 0
+ InChanger: 0
+ EndFile: 0
+ EndBlock: 0
+ VolParts: 0
+ LabelType: 0
+ StorageId: 1
+
+ Note VolStatus is blank!!!!!
+ llist volume=LTO-003
+ MediaId: 7
+ VolumeName: LTO-003
+ Slot: 12
+ PoolId: 1
+ MediaType: LTO-2
+ FirstWritten: 0000-00-00 00:00:00
+ LastWritten: 0000-00-00 00:00:00
+ LabelDate: 2005-12-26 16:52:40
+ VolJobs: 0
+ VolFiles: 0
+ VolBlocks: 0
+ VolMounts: 0
+ VolBytes: 1
+ VolErrors: 0
+ VolWrites: 0
+ VolCapacityBytes: 0
+ VolStatus: Append
+ Recycle: 1
+ VolRetention: 31,536,000
+ VolUseDuration: 0
+ MaxVolJobs: 0
+ MaxVolFiles: 0
+ MaxVolBytes: 0
+ InChanger: 0
+ EndFile: 0
+ EndBlock: 0
+ VolParts: 0
+ LabelType: 0
+ StorageId: 1
+===
+ mount
+ Automatically selected Storage: LTO-changer
+ Enter autochanger drive[0]: 0
+ 3301 Issuing autochanger "loaded drive 0" command.
+ 3302 Autochanger "loaded drive 0", result: nothing loaded.
+ 3301 Issuing autochanger "loaded drive 0" command.
+ 3302 Autochanger "loaded drive 0", result: nothing loaded.
+ 3902 Cannot mount Volume on Storage Device "LTO-Drive1" (/dev/nst0) because:
+ Couldn't rewind device "LTO-Drive1" (/dev/nst0): ERR=dev.c:678 Rewind error on "LTO-Drive1" (/dev/nst0). ERR=No medium found.
+
+ 3905 Device "LTO-Drive1" (/dev/nst0) open but no Bacula volume is mounted.
+ If this is not a blank tape, try unmounting and remounting the Volume.
+
+- http://www.dwheeler.com/essays/commercial-floss.html
+- Add VolumeLock to prevent all but lock holder (SD) from updating
+ the Volume data (with the exception of VolumeState).
+- The btape fill command does not seem to use the Autochanger
+- Make Windows installer default to system disk drive.
+- Look at using ioctl(FIOBMAP, ...) on Linux, and
+ DeviceIoControl(..., FSCTL_QUERY_ALLOCATED_RANGES, ...) on
+ Win32 for sparse files.
+ http://www.flexhex.com/docs/articles/sparse-files.phtml
+ http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/fibmap.html
+- Directive: at <event> "command"
+- Command: pycmd "command" generates "command" event. How to
+ attach to a specific job?
+- Integrate Christopher's St. Bernard code.
+- run_cmd() returns int; it should return JobId_t.
+- get_next_jobid_from_list() returns int; it should return JobId_t.
+- Document export LDFLAGS=-L/usr/lib64
+- Don't attempt to restore from "Disabled" Volumes.
+- Network error on Win32 should set Win32 error code.
+- What happens when you rename a Disk Volume?
+- Job retention period in a Pool (and hence Volume). The job would
+ then be migrated.
+- Look at -D_FORTIFY_SOURCE=2
+- Add Win32 FileSet definition somewhere
+- Look at fixing restore status stats in SD.
+- Look at using ioctl(FIMAP) and FIGETBSZ for sparse files.
+ http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/fibmap.html
+- Implement a mode that says when a hard read error is
+ encountered, read many times (as it currently does), and if the
+ block cannot be read, skip to the next block, and try again. If
+ that fails, skip to the next file and try again, ...
+- Add level table:
+ create table LevelType (LevelType binary(1), LevelTypeLong tinyblob);
+ insert into LevelType (LevelType,LevelTypeLong) values
+ ("F","Full"),
+ ("D","Diff"),
+ ("I","Inc");
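A quick sanity check of that table (run here against SQLite for convenience; the binary(1)/tinyblob column types above are MySQL's):

```python
import sqlite3

# Sanity-check the proposed LevelType lookup table.  SQLite accepts the
# MySQL type names, treating them only as affinity hints.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table LevelType (LevelType binary(1), LevelTypeLong tinyblob)")
conn.executemany(
    "insert into LevelType (LevelType, LevelTypeLong) values (?, ?)",
    [("F", "Full"), ("D", "Diff"), ("I", "Inc")])
rows = dict(conn.execute("select LevelType, LevelTypeLong from LevelType"))
```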
+- Show files/second in client status output.
+- Add a recursive mark command (rmark) to restore.
+- "Minimum Job Interval = nnn" sets minimum interval between Jobs
+ of the same level and does not permit multiple simultaneous
+ running of that Job (i.e. lets any previous invocation finish
+ before doing Interval testing).
+- Look at simplifying File exclusions.
+- New directive "Delete purged Volumes"
+- New pool XXX with ScratchPoolId = MyScratchPool's PoolId and
+  let it fill itself, and RecyclePoolId = XXX's PoolId so I can
+  see if it becomes stable and I just have to supervise
+  MyScratchPool.
+- If I want to remove this pool, I set RecyclePoolId = MyScratchPool's
+  PoolId, and when it is empty remove it.
+- Figure out how to recycle Scratch volumes back to the Scratch Pool.
+- Add Volume=SCRTCH
+- Allow Check Labels to be used with Bacula labels.
+- "Resuming" a failed backup (lost line for example) by using the
+ failed backup as a sort of "base" job.
+- Look at NDMP
+- Email the user x days before a tape will need changing.
+- Command to show next tape that will be used for a job even
+ if the job is not scheduled.
+- From: Arunav Mandal <amandal@trolltech.com>
+  1. When jobs are running and Bacula crashes or I do a restart, it
+  should remember the jobs it was running before it crashed or was
+  restarted; as of now I lose all jobs if I restart it.
+
+  2. When spooling, if the client is disconnected midway (for instance
+  a laptop), Bacula completely discards the spool. It would be nice if
+  it could write that spool to tape so there would be some backup for
+  that client, if not all of it.
+
+  3. We have around 150 client machines; it would be nice to have an
+  option to upgrade the Bacula version on all the client machines
+  automatically.
+
+  4. At least one connection should be reserved for bconsole, so that
+  at heavy load I can still connect to the Director via bconsole,
+  which at times I can't.
+
+  5. Another important feature that is missing: say at 10am I manually
+  started a backup of client abc, and it was a full backup since
+  client abc has no backup history, and at 10:30am Bacula again
+  automatically started a backup of client abc as that was in the
+  schedule. So now we have two Full backups of the same client, and if
+  we again try to start a full backup of client abc, Bacula won't
+  complain. That should be fixed.
+
+- Fix bpipe.c so that it does not modify results pointer.
+ ***FIXME*** calling sequence should be changed.
+- For Windows disaster recovery see http://unattended.sf.net/
+- Regardless of the retention period, Bacula will not prune the
+  last Full, Diff, or Inc File data until a month after the
+  retention period for the last Full backup that was done.
+- update volume=xxx --- add status=Full
+- Remove old spool files on startup.
+- Exclude SD spool/working directory.
+- Refuse to prune last valid Full backup. Same goes for Catalog.
- Python:
- Make a callback when Rerun failed levels is called.
- Give Python program access to Scheduled jobs.
+ - Add setting Volume State via Python.
- Python script to save with Python, not save, save with Bacula.
- Python script to do backup.
- What events?
- Change the Priority, Client, Storage, JobStatus (error)
at the start of a job.
- - Make sure that Python has access to Client address/port so that
- it can check if Clients are alive.
-
+- Why is SpoolDirectory = /home/bacula/spool; not reported
+ as an error when writing a DVD?
+- Make bootstrap file handle multiple MediaTypes (SD)
- Remove all old Device resource code in Dir and code to pass it
back in SD -- better, rework it to pass back device statistics.
- Check locking of resources -- be sure to lock devices where previously
resources were locked.
-- Add global lock on all devices when creating a device structure.
+- The last part is left in the spool dir.
+
-Maybe in 1.37:
+- In restore don't compare byte count on a raw device -- directory
+ entry does not contain bytes.
+=== rate design
+ jcr->last_rate
+ jcr->last_runtime
+ MA = (last_MA * 3 + rate) / 4
+ rate = (bytes - last_bytes) / (runtime - last_runtime)
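In other words, the smoothed rate is a 3:1 weighted average of the previous average and the newest sample. A sketch with the jcr fields modeled as a plain dict (names follow the design notes above):

```python
def update_rate(jcr, bytes_done, runtime):
    """Update the job's smoothed transfer rate: a 3/4-weighted moving
    average of the previous average and the latest instantaneous rate."""
    rate = (bytes_done - jcr["last_bytes"]) / (runtime - jcr["last_runtime"])
    jcr["last_rate"] = (jcr["last_rate"] * 3 + rate) / 4
    jcr["last_bytes"] = bytes_done
    jcr["last_runtime"] = runtime
    return jcr["last_rate"]
```

The 3/4 weighting damps short spikes while still converging on the true rate within a few samples.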
+- Max Vols limit in Pool off by one?
+- Implement Files/Bytes,... stats for restore job.
+- Implement Total Bytes Written, ... for restore job.
+- Despool attributes simultaneously with data in a separate
+ thread, rejoined at end of data spooling.
+- Implement new Console commands to allow offlining/reserving drives,
+  and possibly manipulating the autochanger (much asked for).
- Add start/end date editing in messages (%t %T, %e?) ...
- Add ClientDefs similar to JobDefs.
- Print more info when bextract -p accepts a bad block.
-- To mark files as deleted, run essentially a Verify to disk, and
- when a file is found missing (MarkId != JobId), then create
- a new File record with FileIndex == -1. This could be done
- by the FD at the same time as the backup.
- Fix FD JobType to be set before RunBeforeJob in FD.
- Look at adding full Volume and Pool information to a Volume
label so that bscan can get *all* the info.
- Bug: if a job is manually scheduled to run later, it does not appear
in any status report and cannot be cancelled.
+==== Keeping track of deleted/new files ====
+- To mark files as deleted, run essentially a Verify to disk, and
+ when a file is found missing (MarkId != JobId), then create
+ a new File record with FileIndex == -1. This could be done
+ by the FD at the same time as the backup.
+
+ My "trick" for keeping track of deletions is the following.
+ Assuming the user turns on this option, after all the files
+ have been backed up, but before the job has terminated, the
+ FD will make a pass through all the files and send their
+ names to the DIR (*exactly* the same as what a Verify job
+ currently does). This will probably be done at the same
+ time the files are being sent to the SD avoiding a second
+ pass. The DIR will then compare that to what is stored in
+ the catalog. Any files in the catalog but not in what the
+ FD sent will receive a catalog File entry that indicates
+  that at that point in time the file was deleted. This is
+  either transmitted to the FD or simultaneously computed in
+ the FD, so that the FD can put a record on the tape that
+ indicates that the file has been deleted at this point.
+ A delete file entry could potentially be one with a FileIndex
+ of 0 or perhaps -1 (need to check if FileIndex is used for
+ some other thing as many of the Bacula fields are "overloaded"
+ in the SD).
+
+ During a restore, any file initially picked up by some
+ backup (Full, ...) then subsequently having a File entry
+ marked "delete" will be removed from the tree, so will not
+ be restored. If a file with the same name is later OK it
+ will be inserted in the tree -- this already happens. All
+ will be consistent except for possible changes during the
+ running of the FD.
+
+ Since I'm on the subject, some of you may be wondering what
+ the utility of the in memory tree is if you are going to
+ restore everything (at least it comes up from time to time
+ on the list). Well, it is still *very* useful because it
+ allows only the last item found for a particular filename
+ (full path) to be entered into the tree, and thus if a file
+ is backed up 10 times, only the last copy will be restored.
+ I recently (last Friday) restored a complete directory, and
+ the Full and all the Differential and Incremental backups
+ spanned 3 Volumes. The first Volume was not even mounted
+ because all the files had been updated and hence backed up
+ since the Full backup was made. In this case, the tree
+ saved me a *lot* of time.
+
+ Make sure this information is stored on the tape too so
+ that it can be restored directly from the tape.
+
+ All the code (with the exception of formally generating and
+ saving the delete file entries) already exists in the Verify
+ Catalog command. It explicitly recognizes added/deleted files since
+ the last InitCatalog. It is more or less a "simple" matter of
+ taking that code and adapting it slightly to work for backups.
+
+ Comments from Martin Simmons (I think they are all covered):
+ Ok, that should cover the basics. There are few issues though:
+
+ - Restore will depend on the catalog. I think it is better to include the
+ extra data in the backup as well, so it can be seen by bscan and bextract.
+
+ - I'm not sure if it will preserve multiple hard links to the same inode. Or
+ maybe adding or removing links will cause the data to be dumped again?
+
+ - I'm not sure if it will handle renamed directories. Possibly it will work
+ by dumping the whole tree under a renamed directory?
+
+  - It remains to be seen how the backup performance of the DIR will be
+    affected when comparing the catalog for a large filesystem.
+
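The catalog-vs-FD comparison at the heart of this proposal is just a set difference; a sketch (function and field names are illustrative, and FileIndex == -1 is the deleted-file marker suggested above):

```python
def deleted_file_records(catalog_files, fd_files, jobid):
    """Files present in the catalog but absent from the FD's list get a
    synthetic File record with FileIndex == -1, marking them deleted as
    of this job."""
    deleted = set(catalog_files) - set(fd_files)
    return [{"JobId": jobid, "FileIndex": -1, "Name": name}
            for name in sorted(deleted)]
```

During restore, any file whose latest record is such a marker would be dropped from the in-memory tree.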
+====
+From David:
+How about introducing a Type = MgmtPolicy job type? That job type would
+be responsible for scanning the Bacula environment looking for specific
+conditions, and submitting the appropriate jobs for implementing said
+policy, eg:
+
+Job {
+ Name = "Migration-Policy"
+ Type = MgmtPolicy
+ Policy Selection Job Type = Migrate
+ Scope = "<keyword> <operator> <regexp>"
+ Threshold = "<keyword> <operator> <regexp>"
+ Job Template = <template-name>
+}
+
+Where <keyword> is any legal job keyword, <operator> is a comparison
+operator (=, <, >, !=, logical operators AND/OR/NOT) and <regexp> is an
+appropriate regexp. I could see an argument for Scope and Threshold
+being SQL queries if we want to support full flexibility. The
+Migration-Policy job would then get scheduled as frequently as a site
+felt necessary (suggested default: every 15 minutes).
+
+Example:
+
+Job {
+ Name = "Migration-Policy"
+ Type = MgmtPolicy
+ Policy Selection Job Type = Migration
+ Scope = "Pool=*"
+ Threshold = "Migration Selection Type = LowestUtil"
+ Job Template = "MigrationTemplate"
+}
+
+would select all pools for examination and generate a job based on
+MigrationTemplate to automatically select the volume with the lowest
+usage and migrate its contents to the nextpool defined for that pool.
+
+This policy abstraction would be really handy for adjusting the behavior
+of Bacula according to site-selectable criteria (one thing that pops
+into mind is Amanda's ability to automatically adjust backup levels
+depending on various criteria).
+
+
+=====
+
Regression tests:
- Add Pool/Storage override regression test.
- Add delete JobId to regression.
block numbers in btape "test". Possibly adjust in Bacula.
- Fix list volumes to output volume retention in some other
units, perhaps via a directive.
-- If opening a tape in read/write mode fails attempt to open
- it in read-only mode, and mark the tape for read only.
- Allow Simultaneous Priorities = yes => run up to Max concurrent jobs even
with multiple priorities.
- If you use restore replace=never, the directory attributes for
- see lzma401.zip in others directory for new compression
algorithm/library.
-- Minimal autochanger handling in Bacula and in btape.
-- Look into how tar does not save sockets and the possibility of
- not saving them in Bacula (Martin Simmons reported this).
-- Fix restore jobs so that multiple jobs can run if they
- are not using the same tape(s).
- Allow the user to select JobType for manual pruning/purging.
- bscan does not put first of two volumes back with all info in
bscan-test.
are not restored. See bug 213. To fix this requires creating a
list of newly restored directories so that those directory
permissions *can* be restored.
-- Compaction of Disk space by "migrating" Volumes that have pruned
- Jobs (what criteria? size, #jobs, time).
- Add prune all command
- Document fact that purge can destroy a part of a restore by purging
one volume while others remain valid -- perhaps mark Jobs.
- Add tree pane to left of window.
- Add progress meter.
- Max wait time or max run time causes seg fault -- see runtime-bug.txt
-- Document writing to a CD/DVD with Bacula.
-- Add a "base" package to the Windows installer for pthreadsVCE.dll
- which is needed by all packages.
- Add message to user to check for fixed block size when the forward
space test fails in btape.
- When unmarking a directory check if all files below are unmarked and
- Setup lrrd graphs: (http://www.linpro.no/projects/lrrd/) Mike Acar.
- Revisit the question of multiple Volumes (disk) on a single device.
- Add a block copy option to bcopy.
-- Investigate adding Mac Resource Forks.
- Finish work on Gnome restore GUI.
- Fix "llist jobid=xx" where no fileset or client exists.
- For each job type (Admin, Restore, ...) require only the really necessary
to start a job or pass its DHCP obtained IP number.
- Implement a query tape prompt/replace feature for a console
- Copy console @ code to gnome2-console
-- Make AES the only encryption algorithm (see
-  http://csrc.nist.gov/CryptoToolkit/aes/). It's
- an officially adopted standard, has survived peer
- review, and provides keys up to 256 bits.
-- Take a careful look at SetACL http://setacl.sourceforge.net
- Make tree walk routines like cd, ls, ... more user friendly
by handling spaces better.
- Make sure that Bacula rechecks the tape after the 20 min wait.
in the "short" pool to the "long" pool if this pool runs out of volume
space?
- What to do about "list files job=xxx".
-- Get and test MySQL 4.0
- Look at how fuser works and /proc/PID/fd; that is how Nic found the
file descriptor leak in Bacula.
- Implement WrapCounters in Counters.
run the job but don't save the files.
- Make things like list where a file is saved case independent for
Windows.
-- Implement migrate
- Use autochanger to handle multiple devices.
-- On Windows with very long path names, it may be impossible to create
- a file (and thus restore it) because the total length is too long.
- We must cd into the directory then create the file without the
- full path name.
- Implement a Recycle command
-- Test a second language e.g. french.
- Start working on Base jobs.
- Implement UnsavedFiles DB record.
- From Phil Stracchino:
- If SD cannot open a drive, make it periodically retry.
- Add more of the config info to the tape label.
-- If tape is marked read-only, then try opening it read-only rather than
- failing, and remember that it cannot be written.
- Refine SD waiting output:
Device is being positioned
> Device is being positioned for append
- Compare tape to Client files (attributes, or attributes and data)
- Make all database Ids 64 bit.
- Allow console commands to detach or run in background.
-- Fix status delay on storage daemon during rewind.
- Add SD message variables to control operator wait time
- Maximum Operator Wait
- Minimum Message Interval
Migration: Move a backup from one Volume to another
Clone: Copy a backup -- two Volumes
-Bacula Migration is based on Jobs (apparently Networker is file by file).
-
-Migration triggered by:
- Number of Jobs
- Number of Volumes
- Age of Jobs
- Highwater mark (keep total size)
- Lowwater mark
-
-
======================================================
Base Jobs design
=== Done
-- Save mount point for directories not traversed with onefs=yes.
-- Add seconds to start and end times in the Job report output.
-- if 2 concurrent backups are attempted on the same tape
- drive (autoloader) into different tape pools, one of them will exit
- fatally instead of halting until the drive is idle
-- Update StartTime if job held in Job Queue.
-- Look at www.nu2.nu/pebuilder as a helper for full windows
- bare metal restore. (done by Scott)
-- Fix orphaned buffers:
- Orphaned buffer: 24 bytes allocated at line 808 of rufus-dir job.c
- Orphaned buffer: 40 bytes allocated at line 45 of rufus-dir alist.c
-- Implement Preben's suggestion to add
- File System Types = ext2, ext3
- to FileSets, thus simplifying backup of *all* local partitions.
-- Try to open a device on each Job if it was not opened
- when the SD started.
-- Add dump of VolSessionId/Time and FileIndex with bls.
-- If Bacula does not find the right tape in the Autochanger,
- then mark the tape in error and move on rather than asking
- for operator intervention.
-- Cancel command should include JobId in list of Jobs.
-- Add performance testing hooks
-- Bootstrap from JobMedia records.
-- Implement WildFile and WildDir to solve problem of
- saving only *.doc files.
-- Fix
- Please use the "label" command to create a new Volume for:
- Storage: DDS-4-changer
- Media type:
- Pool: Default
- label
- The defined Storage resources are:
-- Copy Changer Device and Changer Command from Autochanger
- to Device resource in SD if none given in Device resource.
-- 1. Automatic use of more than one drive in an autochanger (done)
-- 2. Automatic selection of the correct drive for each Job (i.e.
- selects a drive with an appropriate Volume for the Job) (done)
-- 6. Allow multiple simultaneous Jobs referencing the same pool write
-  to several tapes (some new directive(s) are probably needed for
- this) (done)
-- Locking (done)
-- Key on Storage rather than Pool (done)
-- Allow multiple drives to use same Pool (change jobq.c DIR) (done).
-- Synchronize multiple drives so that not more
-  than one loads a tape at any time (done)
-- 4. Use Changer Device and Changer Command specified in the
- Autochanger resource, if none is found in the Device resource.
- You can continue to specify them in the Device resource if you want
- or need them to be different for each device.
-- 5. Implement a new Device directive (perhaps "Autoselect = yes/no")
- that can allow a Device be part of an Autochanger, and hence the changer
- script protected, but if set to no, will prevent the Device from being
- automatically selected from the changer. This allows the device to
- be directly accessed through its Device name, but not through the
- AutoChanger name.
-#6 Select one from among Multiple Storage Devices for Job
-#5 Events that call a Python program
- (Implemented in Dir/SD)
-- Make sure the Device name is in the Query packet returned.
-- Don't start a second file job if one is already running.
-- Implement EOF/EOV labels for ANSI labels
-- Implement IBM labels.
-- When Python creates a new label, the tape is immediately
- recycled and no label created. This happens when using
- autolabeling -- even when Python doesn't generate the name.
-- Scratch Pool where the volumes can be re-assigned to any Pool.
-- 28-Mar 23:19 rufus-sd: acquire.c:379 Device "DDS-4" (/dev/nst0)
- is busy reading. Job 6 canceled.
-- Remove separate thread for opening devices in SD. On the other
- hand, don't block waiting for open() for devices.
-- Fix code to either handle updating NumVol or to calculate it in
- Dir next_vol.c
-- Ensure that you cannot exclude a directory or a file explicitly
- Included with File.
-#4 Embedded Python Scripting
- (Implemented in Dir/SD/FD)
-- Add Python writable variable for changing the Priority,
- Client, Storage, JobStatus (error), ...
-- SD Python
- - Solicit Events
-- Add disk seeking on restore; turn off seek on tapes.
- stored/match_bsr.c
-- Look at dird_conf.c:1000: warning: `int size'
- might be used uninitialized in this function
-- Indicate when a Job is purged/pruned during restore.
-- Implement some way to turn off automatic pruning in Jobs.
-- Implement a way an Admin Job can prune, possibly multiple
- clients -- Python script?
-- Look at Preben's acl.c error handling code.
-- SD crashes after a tape restore then doing a backup.
-- If drive is opened read/write, close it and re-open
- read-only if doing a restore, and vice-versa.
-- Windows restore:
-  data-fd: RestoreFiles.2004-12-07_15.56.42 Error:
-  > ..\findlib\../../findlib/create_file.c:275 Could not open e:/: ERR=Der
-  > Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen
-  > Prozess verwendet wird.
-  (i.e. "The process cannot access the file because it is being
-  used by another process.")
- Restore restores all files, but then fails at the end trying
- to set the attributes of e:
-  from failed jobs.
-- Resolve the problem between Device name and Archive name,
- and fix SD messages.
-- Tell the "restore" user when browsing is no longer possible.
-- Add a restore directory-x
-- Write non-optimized bsrs from the JobMedia and Media records,
- even after Files are pruned.
-- Delete Stripe and Copy from VolParams to save space.
-- Fix option 2 of restore -- list where file is backed up -- require Client,
- then list last 20 backups.
-- Finish implementation of passing all Storage and Device needs to
- the SD.
-- Move test for max wait time exceeded in job.c up -- Peter's idea.
-## Consider moving docs to their own project.
-## Move rescue to its own project.
-- Add client version to the Client name line that prints in
- the Job report.
-- Fix the Rescue CDROM.
-- By the way: on page http://www.bacula.org/?page=tapedrives , at the
- bottom, the link to "Tape Testing Chapter" is broken. It goes to
- /html-manual/... while the others point to /rel-manual/...
-- Device resource needs the "name" of the SD.
-- Specify a single directory to restore.
-- Implement MediaType keyword in bsr?
-- Add a date and time stamp at the beginning of every line in the
- Job report (Volker Sauer).
-- Add level to estimate command.
-- Add "limit=n" for "list jobs"
-- Make bootstrap filename unique.
-- Make Dmsg look at global before calling subroutine.
-- From Chris Hull:
- it seems to be complaining about 12:00pm which should be a valid 12
- hour time. I changed the time to 11:59am and everything works fine.
- Also 12:00am works fine. 0:00pm also works (which I don't think
- should). None of the values 12:00pm - 12:59pm work for that matter.
-- Require restore via the restore command or make a restore Job
- get the bootstrap file.
-- Implement Maximum Job Spool Size
-- Fix 3993 error in SD. It forgets to look at autochanger
- resource for device command, ...
-- 3. Prevent two drives requesting the same Volume in any given
- autochanger, by checking if a Volume is mounted on another drive
- in an Autochanger.
-- Upgrade to MySQL 4.1.12 See:
- http://dev.mysql.com/doc/mysql/en/Server_SQL_mode.html
+- Why the heck doesn't Bacula drop root privileges before connecting to
+ the DB?
+- Look at using posix_fadvise(2) for backups -- see bug #751.
+ Possibly add the code at findlib/bfile.c:795
+/* TCP socket options */
+#define TCP_NODELAY 1 /* Turn off Nagle's algorithm. */
+#define TCP_MAXSEG 2 /* Limit MSS */
+#define TCP_CORK 3 /* Never send partially complete segments */
+#define TCP_KEEPIDLE 4 /* Start keeplives after this period */
+#define TCP_KEEPINTVL 5 /* Interval between keepalives */
+#define TCP_KEEPCNT 6 /* Number of keepalives before death */
+#define TCP_SYNCNT 7 /* Number of SYN retransmits */
+#define TCP_LINGER2 8 /* Life time of orphaned FIN-WAIT-2 state */
+#define TCP_DEFER_ACCEPT 9 /* Wake up listener only when data arrive */
+#define TCP_WINDOW_CLAMP 10 /* Bound advertised window */
+#define TCP_INFO 11 /* Information about this connection. */
+#define TCP_QUICKACK 12 /* Block/reenable quick acks */
+#define TCP_CONGESTION 13 /* Congestion control algorithm */
+- Fix bnet_connect() code to set a timer and to use it to measure
+  the elapsed connect time.
+- Implement 4th argument to make_catalog_backup that passes hostname.
+- Test FIFO backup/restore -- make regression