From 44f50b8dc65a335c45dac412d08848dbfce75ce3 Mon Sep 17 00:00:00 2001 From: Kern Sibbald Date: Sat, 26 Jan 2013 11:46:08 +0100 Subject: [PATCH] Remove old todo --- bacula/kernstodo | 1391 ---------------------------------------------- 1 file changed, 1391 deletions(-) delete mode 100644 bacula/kernstodo diff --git a/bacula/kernstodo b/bacula/kernstodo deleted file mode 100644 index 358c430f52..0000000000 --- a/bacula/kernstodo +++ /dev/null @@ -1,1391 +0,0 @@ - Kern's ToDo List - 21 September 2009 - -Rescue: -Add to USB key: - gftp sshfs kile kate lsssci m4 mtx nfs-common nfs-server - patch squashfs-tools strace sg3-utils screen scsiadd - system-tools-backend telnet dpkg traceroute urar usbutils - whois apt-file autofs busybox chkrootkit clamav dmidecode - manpages-dev manpages-posix manpages-posix-dev - - -Document: -- package sg3-utils, program sg_map -- !!! Cannot restore two jobs a the same time that were - written simultaneously unless they were totally spooled. -- Document cleaning up the spool files: - db, pid, state, bsr, mail, conmsg, spool -- Document the multiple-drive-changer.txt script. -- Pruning with Admin job. -- Does WildFile match against full name? Doc. -- %d and %v only valid on Director, not for ClientRunBefore/After. -- During tests with the 260 char fix code, I found one problem: - if the system "sees" a long path once, it seems to forget it's - working drive (e.g. c:\), which will lead to a problem during - the next job (create bootstrap file will fail). Here is the - workaround: specify absolute working and pid directory in - bacula-fd.conf (e.g. c:\bacula\working instead of - \bacula\working). -- Document techniques for restoring large numbers of files. -- Document setting my.cnf to big file usage. -- Correct the Include syntax in the m4.xxx files in examples/conf -- Document all the little details of setting up certificates for - the Bacula data encryption code. 
-- Document more precisely how to use master keys -- especially - for disaster recovery. - -Priority: -================ -24-Jul 09:56 rufus-fd JobId 1: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE) -24-Jul 09:56 rufus-fd JobId 1: Warning: VSS Writer (BackupComplete): "ASR Writer", State: 0x8 (VSS_WS_FAILED_AT_PREPARE_SNAPSHOT) -24-Jul 09:56 rufus-fd JobId 1: VSS Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE) -- Add external command to lookup hostname (eg nmblookup timmy-win7) -nmblookup gato -querying gato on 127.255.255.255 -querying gato on 192.168.1.255 - 192.168.1.8 gato<00> - 192.168.1.11 gato<00> - 192.168.1.8 gato<00> - 192.168.1.11 gato<00> -- Possibly allow SD to spool even if a tape is not mounted. -- How to sync remote offices. -- Windows Bare Metal -- Backup up windows system state -- Complete Job restart -- Look at rsysnc for incremental updates and dedupping -- Implement rwlock() for SD that takes why and can_steal to replace - existing block/lock mechanism. rlock() would allow multiple readers - wlock would allow only one writer. -- For Windows disaster recovery see http://unattended.sf.net/ -- Add "before=" "olderthan=" to FileSet for doing Base of - unchanged files. -- Show files/second in client status output. -- Don't attempt to restore from "Disabled" Volumes. -- Have SD compute MD5 or SHA1 and compare to what FD computes. -- Make VolumeToCatalog calculate an MD5 or SHA1 from the - actual data on the Volume and compare it. -- Remove queue.c code. -- Implement multiple jobid specification for the cancel command, - similar to what is permitted on the update slots command. -- Ensure that the SD re-reads the Media record if the JobFiles - does not match -- it may have been updated by another job. -- Add MD5 or SHA1 check in SD for data validation -- When reserving a device to read, check to see if the Volume - is already in use, if so wait. Probably will need to pass the - Volume. See bug #1313. 
Create a regression test to simulate - this problem and see if VolumePollInterval fixes it. Possibly turn - it on by default. - -- Page hash tables -- Deduplication -- Why no error message if restore has no permission on the where - directory? -- Possibly allow manual "purge" to purge a Volume that has not - yet been written (even if FirstWritten time is zero) see ua_purge.c - is_volume_purged(). -- Add disk block detection bsr code (make it work). -- Remove done bsrs. -- Detect deadlocks in reservations. -- Plugins: - - Add list during dump - - Add in plugin code flag - - Add bRC_EndJob -- stops more calls to plugin this job - - Add bRC_Term (unload plugin) - - remove time_t from Jmsg and use utime_t? -- Deadlock detection, watchdog sees if counter advances when jobs are - running. With debug on, can do a "status" command. -- User options for plugins. -- Pool Storage override precedence over command line. -- Autolabel only if Volume catalog information indicates tape not - written. This will avoid overwriting a tape that gets an I/O - error on reading the volume label. -- I/O error, SD thinks it is not the right Volume, should check slot - then disable volume, but Asks for mount. -- Can be posible modify package to create and use configuration files in - the Debian manner? - - For example: - - /etc/bacula/bacula-dir.conf - /etc/bacula/conf.d/pools.conf - /etc/bacula/conf.d/clients.conf - /etc/bacula/conf.d/storages.conf - - and into bacula-dir.conf file include - - @/etc/bacula/conf.d/pools.conf - @/etc/bacula/conf.d/clients.conf - @/etc/bacula/conf.d/storages.conf -- Possibly add an Inconsistent state when a Volume is in error - for non I/O reasons. -- Fix #ifdefing so that smartalloc can be disabled. Check manual - -- the default is enabled. -- Dangling softlinks are not restored properly. For example, take a - soft link such as src/testprogs/install-sh, which points to /usr/share/autoconf... 
- move the directory to another machine where the file /usr/share/autoconf does - not exist, back it up, then try a full restore. It fails. -- Softlinks that point to non-existent file are not restored in restore all, - but are restored if the file is individually selected. BUG! -- Prune by Job -- Prune by Job Level (Full, Differential, Incremental) -- modify pruning to keep a fixed number of versions of a file, - if requested. -- the cd-command should allow complete paths - i.e. cd /foo/bar/foo/bar - -> if a customer mails me the path to a certain file, - its faster to enter the specified directory -- Make tree walk routines like cd, ls, ... more user friendly - by handling spaces better. -- When doing a restore, if the user does an "update slots" - after the job started in order to add a restore volume, the - values prior to the update slots will be put into the catalog. - Must retrieve catalog record merge it then write it back at the - end of the restore job, if we want to do this right. -=== rate design - jcr->last_rate - jcr->last_runtime - MA = (last_MA * 3 + rate) / 4 - rate = (bytes - last_bytes) / (runtime - last_runtime) -=== -- Add a recursive mark command (rmark) to restore. -- Look at simplifying File exclusions. -- Scripts -- Separate Files and Directories in catalog -- Create FileVersions table -- finish implementation of fdcalled -- see ua_run.c:105 -- Fix problem in postgresql.c in my_postgresql_query, where the - generation of the error message doesn't differentiate result==NULL - and a bad status from that result. Not only that, the result is - cleared on a bail_out without having generated the error message. -- Implement SDErrors (must return from SD) -- Implement continue spooling while despooling. -- Remove all install temp files in Win32 PLUGINSDIR. -- No where in restore causes kaboom. -- Performance: multiple spool files for a single job. -- Performance: despool attributes when despooling data (problem - multiplexing Dir connection). 
-- Implement wait_for_sysop() message display in wait_for_device(), which - now prints warnings too often. -- Ensure that each device in an Autochanger has a different - Device Index. -- Look at sg_logs -a /dev/sg0 for getting soft errors. -- btape "test" command with Offline on Unmount = yes - - This test is essential to Bacula. - - I'm going to write one record in file 0, - two records in file 1, - and three records in file 2 - - 02-Feb 11:00 btape: ABORTING due to ERROR in dev.c:715 - dev.c:714 Bad call to rewind. Device "LTO" (/dev/nst0) not open - 02-Feb 11:00 btape: Fatal Error because: Bacula interrupted by signal 11: Segmentation violation - Kaboom! btape, btape got signal 11. Attempting traceback. - -- Encryption -- email from Landon - > The backup encryption algorithm is currently not configurable, and is - > set to AES_128_CBC in src/filed/backup.c. The encryption code - > supports a number of different ciphers (as well as adding arbitrary - > new ones) -- only a small bit of code would be required to map a - > configuration string value to a CRYPTO_CIPHER_* value, if anyone is - > interested in implementing this functionality. - -- Add the OS version back to the Win32 client info. -- Restarted jobs have a NULL in the from field. -- Modify SD status command to indicate when the SD is writing - to a DVD (the device is not open -- see bug #732). -- Look at the possibility of adding "SET NAMES UTF8" for MySQL, - and possibly changing the blobs into varchar. -- Test Volume compatibility between machine architectures -- Encryption documentation - -Professional Needs: -- Migration from other vendors - - Date change - - Path change -- Filesystem types -- Backup conf/exe (all daemons) -- Detect state change of system (verify) -- SD to SD -- Novell NSS backup http://www.novell.com/coolsolutions/tools/18952.html -- Compliance norms that compare restored code hash code. 
-- David's priorities - Copypools - Extract capability (#25) - Threshold triggered migration jobs (not currently in list, but will be - needed ASAP) - Client triggered backups - Complete rework of the scheduling system (not in list) - Performance and usage instrumentation (not in list) - See email of 21Aug2007 for details. -- Look at: http://tech.groups.yahoo.com/group/cfg2html - and http://www.openeyet.nl/scc/ for managing customer changes - -Projects: -- Pool enhancements - - Access Mode = Read-Only, Read-Write, Unavailable, Destroyed, Offsite - - Pool Type = Copy - - Maximum number of scratch volumes - - Maximum File size - - Next Pool (already have) - - Reclamation threshold - - Reclamation Pool - - Reuse delay (after all files purged from volume before it can be used) - - Copy Pool = xx, yyy (or multiple lines). - - Catalog = xxx - - Allow pool selection during restore. - -- Average tape size from Eric - SELECT COALESCE(media_avg_size.volavg,0) * count(Media.MediaId) AS volmax, GROUP BY Media.MediaType, Media.PoolId, media_avg_size.volavg - count(Media.MediaId) AS volnum, - sum(Media.VolBytes) AS voltotal, - Media.PoolId AS PoolId, - Media.MediaType AS MediaType - FROM Media - LEFT JOIN (SELECT avg(Media.VolBytes) AS volavg, - Media.MediaType AS MediaType - FROM Media - WHERE Media.VolStatus = 'Full' - GROUP BY Media.MediaType - ) AS media_avg_size ON (Media.MediaType = media_avg_size.MediaType) - GROUP BY Media.MediaType, Media.PoolId, media_avg_size.volavg -- Performance - - Despool attributes in separate thread - - Database speedups - - Embedded MySQL - - Check why restore repeatedly sends Rechdrs between - each data chunk -- according to James Harper 9Jan07. -- Features - - Better scheduling - - More intelligent re-run - - Incremental backup -- rsync, Stow - -- Make Bacula by default not backup tmpfs, procfs, sysfs, ... -- Fix hardlinked immutable files when linking a second file, the - immutable flag must be removed prior to trying to link it. 
-- Change dbcheck to tell users to use native tools for fixing - broken databases, and to ensure they have the proper indexes. -- add udev rules for Bacula devices. -- If a job terminates, the DIR connection can close before the - Volume info is updated, leaving the File count wrong. -- Look at why SIGPIPE during connection can cause seg fault in - writing the daemon message, when Dir dropped to bacula:bacula -- Look at zlib 32 => 64 problems. -- Fix bextract to restore ACLs, or better yet, use common routines. -- New dot commands from Arno. - .show device=xxx lists information from one storage device, including - devices (I'm not even sure that information exists in the DIR...) - .move eject device=xxx mostly the same as 'unmount xxx' but perhaps with - better machine-readable output like "Ok" or "Error busy" - .move eject device=xxx toslot=yyy the same as above, but with a new - target slot. The catalog should be updated accordingly. - .move transfer device=xxx fromslot=yyy toslot=zzz - -Low priority: -- Article: http://www.heise.de/open/news/meldung/83231 -- Article: http://www.golem.de/0701/49756.html -- Article: http://lwn.net/Articles/209809/ -- Article: http://www.onlamp.com/pub/a/onlamp/2004/01/09/bacula.html -- Article: http://www.linuxdevcenter.com/pub/a/linux/2005/04/07/bacula.html -- Article: http://www.osreviews.net/reviews/admin/bacula -- Article: http://www.debianhelp.co.uk/baculaweb.htm -- Article: -- Wikis mentioning Bacula - http://wiki.finkproject.org/index.php/Admin:Backups - http://wiki.linuxquestions.org/wiki/Bacula - http://www.openpkg.org/product/packages/?package=bacula - http://www.iterating.com/products/Bacula - http://net-snmp.sourceforge.net/wiki/index.php/Net-snmp_extensions - http://www.section6.net/wiki/index.php/Using_Bacula_for_Tape_Backups - http://bacula.darwinports.com/ - http://wiki.mandriva.com/en/Releases/Corporate/Server_4/Notes#Bacula - http://en.wikipedia.org/wiki/Bacula - -- Bacula Wikis - 
http://www.devco.net/pubwiki/Bacula/ - http://paramount.ind.wpi.edu/wiki/doku.php - http://gentoo-wiki.com/HOWTO_Backup - http://www.georglutz.de/wiki/Bacula - http://www.clarkconnect.com/wiki/index.php?title=Modules_-_LAN_Backup/Recovery - http://linuxwiki.de/Bacula (in German) - -- Figure out how to configure query.sql. Suggestion to use m4: - == changequote.m4 === - changequote(`[',`]')dnl - ==== query.sql.in === - :List next 20 volumes to expire - SELECT - Pool.Name AS PoolName, - Media.VolumeName, - Media.VolStatus, - Media.MediaType, - ifdef([MySQL], - [ FROM_UNIXTIME(UNIX_TIMESTAMP(Media.LastWritten) Media.VolRetention) AS Expire, ])dnl - ifdef([PostgreSQL], - [ media.lastwritten + interval '1 second' * media.volretention as expire, ])dnl - Media.LastWritten - FROM Pool - LEFT JOIN Media - ON Media.PoolId=Pool.PoolId - WHERE Media.LastWritten>0 - ORDER BY Expire - LIMIT 20; - ==== - Command: m4 -DmySQL changequote.m4 query.sql.in >query.sql - - The problem is that it requires m4, which is not present on all machines - at ./configure time. - -==== SQL -# get null file -select FilenameId from Filename where Name=''; -# Get list of all directories referenced in a Backup. -select Path.Path from Path,File where File.JobId=nnn and - File.FilenameId=(FilenameId-from-above) and File.PathId=Path.PathId - order by Path.Path ASC; - -- Mount on an Autochanger with no tape in the drive causes: - Automatically selected Storage: LTO-changer - Enter autochanger drive[0]: 0 - 3301 Issuing autochanger "loaded drive 0" command. - 3302 Autochanger "loaded drive 0", result: nothing loaded. - 3301 Issuing autochanger "loaded drive 0" command. - 3302 Autochanger "loaded drive 0", result: nothing loaded. - 3902 Cannot mount Volume on Storage Device "LTO-Drive1" (/dev/nst0) because: - Couldn't rewind device "LTO-Drive1" (/dev/nst0): ERR=dev.c:678 Rewind error on "LTO-Drive1" (/dev/nst0). ERR=No medium found. 
- 3905 Device "LTO-Drive1" (/dev/nst0) open but no Bacula volume is mounted. - If this is not a blank tape, try unmounting and remounting the Volume. -- If Drive 0 is blocked, and drive 1 is set "Autoselect=no", drive 1 will - be used. -- Autochanger did not change volumes. - select * from Storage; - +-----------+-------------+-------------+ - | StorageId | Name | AutoChanger | - +-----------+-------------+-------------+ - | 1 | LTO-changer | 0 | - +-----------+-------------+-------------+ - 05-May 03:50 roxie-sd: 3302 Autochanger "loaded drive 0", result is Slot 11. - 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Warning: Director wanted Volume "LT - Current Volume "LT0-002" not acceptable because: - 1997 Volume "LT0-002" not in catalog. - 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Error: Autochanger Volume "LT0-002" - Setting InChanger to zero in catalog. - 05-May 03:50 roxie-dir: Tibs.2006-05-05_03.05.02 Error: Unable to get Media record - - 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: Error getting Volume i - 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: Job 530 canceled. - 05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: spool.c:249 Fatal appe - 05-May 03:49 Tibs: Tibs.2006-05-05_03.05.02 Fatal error: c:\cygwin\home\kern\bacula - , got - (missing) - llist volume=LTO-002 - MediaId: 6 - VolumeName: LTO-002 - Slot: 0 - PoolId: 1 - MediaType: LTO-2 - FirstWritten: 2006-05-05 03:11:54 - LastWritten: 2006-05-05 03:50:23 - LabelDate: 2005-12-26 16:52:40 - VolJobs: 1 - VolFiles: 0 - VolBlocks: 1 - VolMounts: 0 - VolBytes: 206 - VolErrors: 0 - VolWrites: 0 - VolCapacityBytes: 0 - VolStatus: - Recycle: 1 - VolRetention: 31,536,000 - VolUseDuration: 0 - MaxVolJobs: 0 - MaxVolFiles: 0 - MaxVolBytes: 0 - InChanger: 0 - EndFile: 0 - EndBlock: 0 - VolParts: 0 - LabelType: 0 - StorageId: 1 - - Note VolStatus is blank!!!!! 
- llist volume=LTO-003 - MediaId: 7 - VolumeName: LTO-003 - Slot: 12 - PoolId: 1 - MediaType: LTO-2 - FirstWritten: 0000-00-00 00:00:00 - LastWritten: 0000-00-00 00:00:00 - LabelDate: 2005-12-26 16:52:40 - VolJobs: 0 - VolFiles: 0 - VolBlocks: 0 - VolMounts: 0 - VolBytes: 1 - VolErrors: 0 - VolWrites: 0 - VolCapacityBytes: 0 - VolStatus: Append - Recycle: 1 - VolRetention: 31,536,000 - VolUseDuration: 0 - MaxVolJobs: 0 - MaxVolFiles: 0 - MaxVolBytes: 0 - InChanger: 0 - EndFile: 0 - EndBlock: 0 - VolParts: 0 - LabelType: 0 - StorageId: 1 -=== - mount - Automatically selected Storage: LTO-changer - Enter autochanger drive[0]: 0 - 3301 Issuing autochanger "loaded drive 0" command. - 3302 Autochanger "loaded drive 0", result: nothing loaded. - 3301 Issuing autochanger "loaded drive 0" command. - 3302 Autochanger "loaded drive 0", result: nothing loaded. - 3902 Cannot mount Volume on Storage Device "LTO-Drive1" (/dev/nst0) because: - Couldn't rewind device "LTO-Drive1" (/dev/nst0): ERR=dev.c:678 Rewind error on "LTO-Drive1" (/dev/nst0). ERR=No medium found. - - 3905 Device "LTO-Drive1" (/dev/nst0) open but no Bacula volume is mounted. - If this is not a blank tape, try unmounting and remounting the Volume. - -- http://www.dwheeler.com/essays/commercial-floss.html -- Add VolumeLock to prevent all but lock holder (SD) from updating - the Volume data (with the exception of VolumeState). -- The btape fill command does not seem to use the Autochanger -- Make Windows installer default to system disk drive. -- Look at using ioctl(FIOBMAP, ...) on Linux, and - DeviceIoControl(..., FSCTL_QUERY_ALLOCATED_RANGES, ...) on - Win32 for sparse files. - http://www.flexhex.com/docs/articles/sparse-files.phtml - http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/fibmap.html -- Directive: at "command" -- Command: pycmd "command" generates "command" event. How to - attach to a specific job? 
-- run_cmd() returns int should return JobId_t -- get_next_jobid_from_list() returns int should return JobId_t -- Document export LDFLAGS=-L/usr/lib64 -- Network error on Win32 should set Win32 error code. -- What happens when you rename a Disk Volume? -- Job retention period in a Pool (and hence Volume). The job would - then be migrated. -- Add Win32 FileSet definition somewhere -- Look at fixing restore status stats in SD. -- Look at using ioctl(FIMAP) and FIGETBSZ for sparse files. - http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/fibmap.html -- Implement a mode that says when a hard read error is - encountered, read many times (as it currently does), and if the - block cannot be read, skip to the next block, and try again. If - that fails, skip to the next file and try again, ... -- Add level table: - create table LevelType (LevelType binary(1), LevelTypeLong tinyblob); - insert into LevelType (LevelType,LevelTypeLong) values - ("F","Full"), - ("D","Diff"), - ("I","Inc"); -- new pool XXX with ScratchPoolId = MyScratchPool's PoolId and - let it fill itself, and RecyclePoolId = XXX's PoolId so I can - see if it become stable and I just have to supervise - MyScratchPool -- If I want to remove this pool, I set RecyclePoolId = MyScratchPool's - PoolId, and when it is empty remove it. -- Add Volume=SCRTCH -- Allow Check Labels to be used with Bacula labels. -- "Resuming" a failed backup (lost line for example) by using the - failed backup as a sort of "base" job. -- Look at NDMP -- Email to the user when the tape is about to need changing x - days before it needs changing. -- Command to show next tape that will be used for a job even - if the job is not scheduled. -- From: Arunav Mandal - 1. When jobs are running and bacula for some reason crashes or if I do a - restart it remembers and jobs it was running before it crashed or restarted - as of now I loose all jobs if I restart it. - - 2. 
When spooling and in the midway if client is disconnected for instance a - laptop bacula completely discard the spool. It will be nice if it can write - that spool to tape so there will be some backups for that client if not all. - - 3. We have around 150 clients machines it will be nice to have a option to - upgrade all the client machines bacula version automatically. - - 4. Atleast one connection should be reserved for the bconsole so at heavy load - I should connect to the director via bconsole which at sometimes I can't - - 5. Another most important feature that is missing, say at 10am I manually - started backup of client abc and it was a full backup since client abc has - no backup history and at 10.30am bacula again automatically started backup of - client abc as that was in the schedule. So now we have 2 multiple Full - backups of the same client and if we again try to start a full backup of - client backup abc bacula won't complain. That should be fixed. - -- regardless of the retention period, Bacula will not prune the - last Full, Diff, or Inc File data until a month after the - retention period for the last Full backup that was done. -- update volume=xxx --- add status=Full -- Remove old spool files on startup. -- Exclude SD spool/working directory. -- Refuse to prune last valid Full backup. Same goes for Catalog. -- Python: - - Make a callback when Rerun failed levels is called. - - Give Python program access to Scheduled jobs. - - Add setting Volume State via Python. - - Python script to save with Python, not save, save with Bacula. - - Python script to do backup. - - What events? - - Change the Priority, Client, Storage, JobStatus (error) - at the start of a job. -- Why is SpoolDirectory = /home/bacula/spool; not reported - as an error when writing a DVD? -- Make bootstrap file handle multiple MediaTypes (SD) -- Remove all old Device resource code in Dir and code to pass it - back in SD -- better, rework it to pass back device statistics. 
-- Check locking of resources -- be sure to lock devices where previously - resources were locked. -- The last part is left in the spool dir. - - -- In restore don't compare byte count on a raw device -- directory - entry does not contain bytes. - - -- Max Vols limit in Pool off by one? -- Implement Files/Bytes,... stats for restore job. -- Implement Total Bytes Written, ... for restore job. -- Despool attributes simultaneously with data in a separate - thread, rejoined at end of data spooling. -- 7. Implement new Console commands to allow offlining/reserving drives, - and possibly manipulating the autochanger (much asked for). -- Add start/end date editing in messages (%t %T, %e?) ... -- Add ClientDefs similar to JobDefs. -- Print more info when bextract -p accepts a bad block. -- Fix FD JobType to be set before RunBeforeJob in FD. -- Look at adding full Volume and Pool information to a Volume - label so that bscan can get *all* the info. -- If the user puts "Purge Oldest Volume = yes" or "Recycle Oldest Volume = yes" - and there is only one volume in the pool, refuse to do it -- otherwise - he fills the Volume, then immediately starts reusing it. -- Implement copies and stripes. -- Add history file to console. -- Each file on tape creates a JobMedia record. Peter has 4 million - files spread over 10000 tape files and four tapes. A restore takes - 16 hours to build the restore list. -- Add and option to check if the file size changed during backup. -- Make sure SD deletes spool files on error exit. -- Delete old spool files when SD starts. -- When labeling tapes, if you enter 000026, Bacula uses - the tape index rather than the Volume name 000026. -- Add offline tape command to Bacula console. -- Bug: - Enter MediaId or Volume name: 32 - Enter new Volume name: DLT-20Dec04 - Automatically selected Pool: Default - Connecting to Storage daemon DLTDrive at 192.168.68.104:9103 ... - Sending relabel command from "DLT-28Jun03" to "DLT-20Dec04" ... 
- block.c:552 Write error at 0:0 on device /dev/nst0. ERR=Bad file descriptor. - Error writing final EOF to tape. This tape may not be readable. - dev.c:1207 ioctl MTWEOF error on /dev/nst0. ERR=Permission denied. - askdir.c:219 NULL Volume name. This shouldn't happen!!! - 3912 Failed to label Volume: ERR=dev.c:1207 ioctl MTWEOF error on /dev/nst0. ERR=Permission denied. - Label command failed for Volume DLT-20Dec04. - Do not forget to mount the drive!!! -- Bug: if a job is manually scheduled to run later, it does not appear - in any status report and cannot be cancelled. - - -==== -From David: -How about introducing a Type = MgmtPolicy job type? That job type would -be responsible for scanning the Bacula environment looking for specific -conditions, and submitting the appropriate jobs for implementing said -policy, eg: - -Job { - Name = "Migration-Policy" - Type = MgmtPolicy - Policy Selection Job Type = Migrate - Scope = "<keyword> <operator> <regexp>" - Threshold = "<keyword> <operator> <regexp>" - Job Template = -} - -Where <keyword> is any legal job keyword, <operator> is a comparison -operator (=,<,>,!=, logical operators AND/OR/NOT) and <regexp> is a -appropriate regexp. I could see an argument for Scope and Threshold -being SQL queries if we want to support full flexibility. The -Migration-Policy job would then get scheduled as frequently as a site -felt necessary (suggested default: every 15 minutes). - -Example: - -Job { - Name = "Migration-Policy" - Type = MgmtPolicy - Policy Selection Job Type = Migration - Scope = "Pool=*" - Threshold = "Migration Selection Type = LowestUtil" - Job Template = "MigrationTemplate" -} - -would select all pools for examination and generate a job based on -MigrationTemplate to automatically select the volume with the lowest -usage and migrate it's contents to the nextpool defined for that pool. 
- -This policy abstraction would be really handy for adjusting the behavior -of Bacula according to site-selectable criteria (one thing that pops -into mind is Amanda's ability to automatically adjust backup levels -depending on various criteria). - - -===== - -Regression tests: -- Add Pool/Storage override regression test. -- Add delete JobId to regression. -- Add a regression test for dbcheck. -- New test to add bscan to four-concurrent-jobs regression, - i.e. after the four-concurrent jobs zap the - database as is done in the bscan-test, then use bscan to - restore the database, do a restore and compare with the - original. -- Add restore of specific JobId to regression (item 3 - on the restore prompt) -- Add IPv6 to regression -- Add database test to regression. Test each function like delete, - purge, ... - -- AntiVir can slow down backups on Win32 systems. -- Win32 systems with FAT32 can be much slower than NTFS for - more than 1000 files per directory. - - -1.37 Possibilities: -- A HOLD command to stop all jobs from starting. -- A PAUSE command to pause all running jobs ==> release the - drive. -- Media Type = LTO,LTO-2,LTO-3 - Media Type Read = LTO,LTO2,LTO3 - Media Type Write = LTO2, LTO3 - -=== From Carsten Menke - -Following is a list of what I think in the situations where I'm faced with, -could be a usefull enhancement to bacula, which I'm certain other users will -benefit from as well. - -1. NextJob/NextJobs Directive within a Job Resource in the form of - NextJobs = job1,job2. - - Why: - I currently solved the problem with running multiple jobs each after each - by setting the Max Wait Time for a job to 8 hours, and give - the jobs different Priorities. 
However, there scenarios where - 1 Job is directly depending on another job, so if the former job fails, - the job after it needn't to be run - while maybe other jobs should run despite of that - -Example: - A Backup Job and a Verify job, if the backup job fails there is no need to run - the verify job, as the backup job already failed. However, one may like - to backup the Catalog to disk despite of that the main backup job failed. - -Notes: - I see that this is related to the Event Handlers which are on the ToDo - list, also it is maybe a good idea to check for the return value and - execute different actions based on the return value - - -3. offline capability to bconsole - - Why: - Currently I use a script which I execute within the last Job via the - RunAfterJob Directive, to release and eject the tape. - So I have to call bconsole "release=Storage-Name" and afterwards - mt -f /dev/nst0 eject to get the tape out. - - If I have multiple Storage Devices, than these may not be /dev/nst0 and - I have to modify the script or call it with parameters etc. - This would actually not be needed, as everything is already defined - in bacula-sd.conf and if I can invoke bconsole with the - storage name via $1 in the script than I'm done and information is - not duplicated. - -4. %s for Storage Name added to the chars being substituted in "RunAfterJob" - - Why: - - For the reason mentioned in 3. to have the ability to call a - script with /scripts/foobar %s and in the script use $1 - to pass the Storage Name to bconsole - -5. Setting Volume State within a Job Resource - - Why: - Instead of using "Maximum Volume Jobs" in the Pool Resource, - I would have the possibilty to define - in a Job Resource that after this certain job is run, the Volume State - should be set to "Volume State = Used", this give more flexibility (IMHO). - -7. OK, this is evil, probably bound to security risks and maybe not possible - due to the design of bacula. 
- - Implementation of Backtics ( `command` ) for shell comand execution to - the "Label Format" Directive. - -Why: - - Currently I have defined BACULA_DAY_OF_WEEK="day1|day2..." resulting in - Label Format = "HolyBackup-${BACULA_DAY_OF_WEEK[${WeekDay}]}". If I could - use backticks than I could use "Label Format = HolyBackup-`date +%A` to have - the localized name for the day of the week appended to the - format string. Then I have the tape labeled automatically with weekday - name in the correct language. -========== -- Make output from status use html table tags for nicely - presenting in a browser. -- Browse generations of files. -- I've seen an error when my catalog's File table fills up. I - then have to recreate the File table with a larger maximum row - size. Relevant information is at - http://dev.mysql.com/doc/mysql/en/Full_table.html ; I think the - "Installing and Configuring MySQL" chapter should talk a bit - about this potential problem, and recommend a solution. -- Want speed of writing to tape while despooling. -- Supported autochanger: -OS: Linux -Man.: HP -Media: LTO-2 -Model: SSL1016 -Slots: 16 -Cap: 200GB -- Supported drive: - Wangtek 6525ES (SCSI-1 QIC drive, 525MB), under Linux 2.4.something, - bacula 1.36.0/1 works with blocksize 16k INSIDE bacula-sd.conf. -- Add regex from http://www.pcre.org to Bacula for Win32. -- Use only shell tools no make in CDROM package. -- Include within include does it work? -- Implement a Pool of type Cleaning? -- Think about making certain database errors fatal. -- Look at correcting the time jump in the scheduler for daylight - savings time changes. -- Check dates entered by user for correctness (month/day/... ranges) -- Compress restore Volume listing by date and first file. -- Look at patches/bacula_db.b2z postgresql that loops during restore. - See Gregory Wright. -- Perhaps add read/write programs and/or plugins to FileSets. -- How to handle backing up portables ... 
-- Limit bandwidth - -Documentation to do: (any release a little bit at a time) -- Doc to do unmount before removing magazine. -- Alternative to static linking "ldd prog" save all binaries listed, - restore them and point LD_LIBRARY_PATH to them. -- Document add "/dev/null 2>&1" to the bacula-fd command line -- Document query file format. -- Add more documentation for bsr files. -- Document problems with Verify and pruning. -- Document how to use multiple databases. -- VXA drives have a "cleaning required" - indicator, but Exabyte recommends preventive cleaning after every 75 - hours of operation. - From Phil: - In this context, it should be noted that Exabyte has a command-line - vxatool utility available for free download. (The current version is - vxatool-3.72.) It can get diagnostic info, read, write and erase tapes, - test the drive, unload tapes, change drive settings, flash new firmware, - etc. - Of particular interest in this context is that vxatool -i will - report, among other details, the time since last cleaning in tape motion - minutes. This information can be retrieved (and settings changed, for - that matter) through the generic-SCSI device even when Bacula has the - regular tape device locked. (Needless to say, I don't recommend - changing tape settings while a job is running.) -- Lookup HP cleaning recommendations. -- Lookup HP tape replacement recommendations (see trouble shooting autochanger) -- Document doing table repair - - -=================================== -- Add macro expansions in JobDefs. - Run Before Job = "SomeFile %{Level} %{Client}" - Write Bootstrap="/some/dir/%{JobName}_%{Client}.bsr" -- Use non-blocking network I/O but if no data is available, use - select(). -- Use gather write() for network I/O. -- Autorestart on crash. -- Add bandwidth limiting. -- When an error in input occurs and conio beeps, you can back - up through the prompt. 
-- Detect fixed tape block mode during positioning by looking at - block numbers in btape "test". Possibly adjust in Bacula. -- Fix list volumes to output volume retention in some other - units, perhaps via a directive. -- If you use restore replace=never, the directory attributes for - non-existent directories will not be restored properly. - -- see lzma401.zip in others directory for new compression - algorithm/library. -- Allow the user to select JobType for manual pruning/purging. -- bscan does not put first of two volumes back with all info in - bscan-test. -- Figure out how to make named console messages go only to that - console and to the non-restricted console (new console class?). -- Make restricted console prompt for password if *ask* is set or - perhaps if password is undefined. -- Implement "from ISO-date/time every x hours/days/weeks/months" in - schedules. - -==== from Marc Schoechlin -- the help-command should be more verbose - (it should explain the paramters of the different - commands in detail) - -> its time-comsuming to consult the manual anytime - you need a special parameter - -> maybe its more easy to maintain this, if the - descriptions of that commands are outsourced to - a ceratin-file -- if the password is not configured in bconsole.conf - you should be asked for it. - -> sometimes you like to do restore on a customer-machine - which shouldnt know the password for bacula. 
- -> adding the password to the file favours admins - to forget to remove the password after usage - -> security-aspects - the protection of that file is less important -- long-listed-output of commands should be scrollable - like the unix more/less-command does - -> if someone runs 200 and more machines, the lists could - be a little long and complex -- command-output should be shown column by column - to reduce scrolling and to increase clarity - -> see last item -- lsmark should list the selected files with full - paths -- wildcards for selecting and file and directories would be nice -- any actions should be interuptable with STRG+C -- command-expansion would be pretty cool -==== -- When the replace Never option is set, new directory permissions - are not restored. See bug 213. To fix this requires creating a - list of newly restored directories so that those directory - permissions *can* be restored. -- Add prune all command -- Document fact that purge can destroy a part of a restore by purging - one volume while others remain valid -- perhaps mark Jobs. -- Add multiple-media-types.txt -- look at mxt-changer.html -- Make ? do a help command (no return needed). -- Implement restore directory. -- Document streams and how to implement them. -- Try not to re-backup a file if a new hard link is added. -- Add feature to backup hard links only, but not the data. -- Fix stream handling to be simpler. -- Add Priority and Bootstrap to Run a Job. -- Eliminate Restore "Run Restore Job" prompt by allowing new "run command - to be issued" -- Remove View FileSet button from Run a Job dialog. -- Handle prompt for restore job at end of Restore command. -- Add display of total selected files to Restore window. -- Add tree pane to left of window. -- Add progress meter. -- Max wait time or max run time causes seg fault -- see runtime-bug.txt -- Add message to user to check for fixed block size when the forward - space test fails in btape. 
-- When unmarking a directory check if all files below are unmarked and - then remove the + flag -- in the restore tree. -- Possibly implement: Action = Unmount Device="TapeDrive1" in Admin jobs. -- Setup lrrd graphs: (http://www.linpro.no/projects/lrrd/) Mike Acar. -- Revisit the question of multiple Volumes (disk) on a single device. -- Add a block copy option to bcopy. -- Fix "llist jobid=xx" where no fileset or client exists. -- For each job type (Admin, Restore, ...) require only the really necessary - fields.- Pass Director resource name as an option to the Console. -- Add a "batch" mode to the Console (no unsolicited queries, ...). -- Allow browsing the catalog to see all versions of a file (with - stat data on each file). -- Restore attributes of directory if replace=never set but directory - did not exist. -- Use SHA1 on authentication if possible. -- See comtest-xxx.zip for Windows code to talk to USB. -- Add John's appended files: - Appended = { /files/server/logs/http/*log } - and such files would be treated as follows.On a FULL backup, they would - be backed up like any other file.On an INCREMENTAL backup, where a - previous INCREMENTAL or FULL was already in thecatalogue and the length - of the file wasgreater than the length of the last backup, only thedata - added since the last backup will be dumped.On an INCREMENTAL backup, if - the length of the file is less than thelength of the file with the same - name last backed up, the completefile is dumped.On Windows systems, with - creation date of files, we can be evensmarter about this and not count - entirely upon the length.On a restore, the full and all incrementals - since it will beapplied in sequence to restore the file. -- Check new HAVE_WIN32 open bits. -- Check if the tape has moved before writing. -- Handling removable disks -- see below: -- Add FromClient and ToClient keywords on restore command (or - BackupClient RestoreClient). -- Implement a JobSet, which groups any number of jobs. 
If the - JobSet is started, all the jobs are started together. - Allow Pool, Level, and Schedule overrides. -- Look at updating Volume Jobs so that Max Volume Jobs = 1 will work - correctly for multiple simultaneous jobs. -- Implement the Media record flag that indicates that the Volume does disk - addressing. -- Fix fast block rejection (stored/read_record.c:118). It passes a null - pointer (rec) to try_repositioning(). -- Implement RestoreJobRetention? Maybe better "JobRetention" in a Job, - which would take precidence over the Catalog "JobRetention". -- Implement Label Format in Add and Label console commands. -- Put email tape request delays on one or more variables. User wants - to cancel the job after a certain time interval. Maximum Mount Wait? -- Job, Client, Device, Pool, or Volume? - Is it possible to make this a directive which is *optional* in multiple - resources, like Level? If so, I think I'd make it an optional directive - in Job, Client, and Pool, with precedence such that Job overrides Client - which in turn overrides Pool. - -- New Storage specifications: - - Want to write to multiple storage devices simultaneously - - Want to write to multiple storage devices sequentially (in one job) - - Want to read/write simultaneously - - Key is MediaType -- it must match - - Passed to SD as a sort of BSR record called Storage Specification - Record or SSR. - SSR - Next -> Next SSR - MediaType -> Next MediaType - Pool -> Next Pool - Device -> Next Device - Job Resource - Allow multiple Storage specifications - New flags - One Archive = yes - One Device = yes - One Storage = yes - One MediaType = yes - One Pool = yes - Storage - Allow Multiple Pool specifications (note, Pool currently - in Job resource). - Allow Multiple MediaType specifications in Dir conf - Allow Multiple Device specifications in Dir conf - Perhaps keep this in a single SSR - Tie a Volume to a specific device by using a MediaType that - is contained in only one device. 
- In SD allow Device to have Multiple MediaTypes - -- Ideas from Jerry Scharf: - First let's point out some big pluses that bacula has for this - it's open source - more importantly it's active. Thank you so much for that - even more important, it's not flaky - it has an open access catalog, opening many possibilities - it's pushing toward heterogeneous systems capability - big things: - Macintosh file client - working bare iron recovery for windows - the option for inc/diff backups not reset on fileset revision - a) use both change and inode update time against base time - b) do the full catalog check (expensive but accurate) - sizing guide (how much system is needed to back up N systems/files) - consultants on using bacula in building a disaster recovery system - an integration guide - or how to get at fancy things that one could do with bacula - logwatch code for bacula logs (or similar) - support for Oracle database ?? -=== -- Look at adding SQL server and Exchange support for Windows. -- Add progress of files/bytes to SD and FD. -- do a "messages" before the first prompt in Console -- Client does not show busy during Estimate command. -- Implement Console mtx commands. -- Implement a Mount Command and an Unmount Command where - the users could specify a system command to be performed - to do the mount, after which Bacula could attempt to - read the device. This is for Removeable media such as a CDROM. - - Most likely, this mount command would be invoked explicitly - by the user using the current Console "mount" and "unmount" - commands -- the Storage Daemon would do the right thing - depending on the exact nature of the device. - - As with tape drives, when Bacula wanted a new removable - disk mounted, it would unmount the old one, and send a message - to the user, who would then use "mount" as described above - once he had actually inserted the disk. 
-- Implement dump/print label to UA -- Spool to disk only when the tape is full, then when a tape is hung move - it to tape. -- bextract is sending everything to the log file ****FIXME**** -- Allow multiple Storage specifications (or multiple names on - a single Storage specification) in the Job record. Thus a job - can be backed up to a number of storage devices. -- Implement some way for the File daemon to contact the Director - to start a job or pass its DHCP obtained IP number. -- Implement a query tape prompt/replace feature for a console -- Make sure that Bacula rechecks the tape after the 20 min wait. -- Set IO_NOWAIT on Bacula TCP/IP packets. -- Try doing a raw partition backup and restore by mounting a - Windows partition. -- From Lars Kellers: - Yes, it would allow to highly automatic the request for new tapes. If a - tape is empty, bacula reads the barcodes (native or simulated), and if - an unused tape is found, it runs the label command with all the - necessary parameters. - - By the way can bacula automatically "move" an empty/purged volume say - in the "short" pool to the "long" pool if this pool runs out of volume - space? -- What to do about "list files job=xxx". -- Look at how fuser works and /proc/PID/fd that is how Nic found the - file descriptor leak in Bacula. -- Can we dynamically change FileSets? -- If pool specified to label command and Label Format is specified, - automatically generate the Volume name. -- Add ExhautiveRestoreSearch -- Look at the possibility of loading only the necessary - data into the restore tree (i.e. do it one directory at a - time as the user walks through the tree). -- Possibly use the hash code if the user selects all for a restore command. -- Fix "restore all" to bypass building the tree. -- Prohibit backing up archive device (findlib/find_one.c:128) -- Implement Release Device in the Job resource to unmount a drive. 
-- Implement Acquire Device in the Job resource to mount a drive, - be sure this works with admin jobs so that the user can get - prompted to insert the correct tape. Possibly some way to say to - run the job but don't save the files. -- Make things like list where a file is saved case independent for - Windows. -- Implement a Recycle command -- From Phil Stracchino: - It would probably be a per-client option, and would be called - something like, say, "Automatically purge obsoleted jobs". What it - would do is, when you successfully complete a Differential backup of a - client, it would automatically purge all Incremental backups for that - client that are rendered redundant by that Differential. Likewise, - when a Full backup on a client completed, it would automatically purge - all Differential and Incremental jobs obsoleted by that Full backup. - This would let people minimize the number of tapes they're keeping on - hand without having to master the art of retention times. -- When doing a Backup send all attributes back to the Director, who - would then figure out what files have been deleted. -- Currently in mount.c:236 the SD simply creates a Volume. It should have - explicit permission to do so. It should also mark the tape in error - if there is an error. -- Cancel waiting for Client connect in SD if FD goes away. - -- Implement timeout in response() when it should come quickly. -- Implement a Slot priority (loaded/not loaded). -- Implement "vacation" Incremental only saves. -- Implement create "FileSet"? -- Add prefixlinks to where or not where absolute links to FD. -- Issue message to mount a new tape before the rewind. -- Simplified client job initiation for portables. -- If SD cannot open a drive, make it periodically retry. -- Add more of the config info to the tape label. 
- -- Refine SD waiting output: - Device is being positioned - > Device is being positioned for append - > Device is being positioned to file x - > -- Figure out some way to estimate output size and to avoid splitting - a backup across two Volumes -- this could be useful for writing CDROMs - where you really prefer not to have it split -- not serious. -- Make bcopy read through bad tape records. -- Program files (i.e. execute a program to read/write files). - Pass read date of last backup, size of file last time. -- Add Signature type to File DB record. -- CD into subdirectory when open()ing files for backup to - speed up things. Test with testfind(). -- Priority job to go to top of list. -- Why are save/restore of device different sizes (sparse?) Yup! Fix it. -- Implement some way for the Console to dynamically create a job. -- Solaris -I on tar for include list -- Need a verbose mode in restore, perhaps to bsr. -- bscan without -v is too quiet -- perhaps show jobs. -- Add code to reject whole blocks if not wanted on restore. -- Check if we can increase Bacula FD priorty in Win2000 -- Check if both CatalogFiles and UseCatalog are set to SD. -- Possibly add email to Watchdog if drive is unmounted too - long and a job is waiting on the drive. -- After unmount, if restore job started, ask to mount. -- Add UA rc and history files. -- put termcap (used by console) in ./configure and - allow -with-termcap-dir. -- Fix Autoprune for Volumes to respect need for full save. -- Compare tape to Client files (attributes, or attributes and data) -- Make all database Ids 64 bit. -- Allow console commands to detach or run in background. -- Add SD message variables to control operator wait time - - Maximum Operator Wait - - Minimum Message Interval - - Maximum Message Interval -- Send Operator message when cannot read tape label. -- Verify level=Volume (scan only), level=Data (compare of data to file). 
- Verify level=Catalog, level=InitCatalog -- Events file -- Add keyword search to show command in Console. -- Events : tape has more than xxx bytes. -- Complete code in Bacula Resources -- this will permit - reading a new config file at any time. -- Handle ctl-c in Console -- Implement script driven addition of File daemon to config files. -- Think about how to make Bacula work better with File (non-tape) archives. -- Write Unix emulator for Windows. -- Make database type selectable by .conf files i.e. at runtime -- Set flag for uname -a. Add to Volume label. -- Restore files modified after date -- SET LD_RUN_PATH=$HOME/mysql/lib/mysql -- Remove duplicate fields from jcr (e.g. jcr.level and jcr.jr.Level, ...). -- Timout a job or terminate if link goes down, or reopen link and query. -- Concept of precious tapes (cannot be reused). -- Make bcopy copy with a single tape drive. -- Permit changing ownership during restore. - -- From Phil: - > My suggestion: Add a feature on the systray menu-icon menu to request - > an immediate backup now. This would be useful for laptop users who may - > not be on the network when the regular scheduled backup is run. - > - > My wife's suggestion: Add a setting to the win32 client to allow it to - > shut down the machine after backup is complete (after, of course, - > displaying a "System will shut down in one minute, click here to cancel" - > warning dialog). This would be useful for sites that want user - > woorkstations to be shut down overnight to save power. - > - -- Autolabel should be specified by DIR instead of SD. 
-- Storage daemon - - Add media capacity - - AutoScan (check checksum of tape) - - Format command = "format /dev/nst0" - - MaxRewindTime - - MinRewindTime - - MaxBufferSize - - Seek resolution (usually corresponds to buffer size) - - EODErrorCode=ENOSPC or code - - Partial Read error code - - Partial write error code - - Nonformatted read error - - Nonformatted write error - - WriteProtected error - - IOTimeout - - OpenRetries - - OpenTimeout - - IgnoreCloseErrors=yes - - Tape=yes - - NoRewind=yes -- Pool - - Maxwrites - - Recycle period -- Job - - MaxWarnings - - MaxErrors (job?) -===== -- Write a Storage daemon that uses pipes and - standard Unix programs to write to the tape. - See afbackup. -- Need something that monitors the JCR queue and - times out jobs by asking the deamons where they are. -- Verify from Volume -- Need report class for messages. Perhaps - report resource where report=group of messages -- enhance scan_attrib and rename scan_jobtype, and - fill in code for "since" option -- Director needs a time after which the report status is sent - anyway -- or better yet, a retry time for the job. -- Don't reschedule a job if previous incarnation is still running. -- Some way to automatically backup everything is needed???? -- Need a structure for pending actions: - - buffered messages - - termination status (part of buffered msgs?) -- Drive management - Read, Write, Clean, Delete -- Login to Bacula; Bacula users with different permissions: - owner, group, user, quotas -- Store info on each file system type (probably in the job header on tape. - This could be the output of df; or perhaps some sort of /etc/mtab record. - -========= ideas =============== -From: "Jerry K. Schieffer" -To: -Subject: RE: [Bacula-users] future large programming jobs -Date: Thu, 26 Feb 2004 11:34:54 -0600 - -I noticed the subject thread and thought I would offer the following -merely as sources of ideas, i.e. something to think about, not even as -strong as a request. 
In my former life (before retiring) I often -dealt with backups and storage management issues/products as a -developer and as a consultant. I am currently migrating my personal -network from amanda to bacula specifically because of the ability to -cross media boundaries during storing backups. -Are you familiar with the commercial product called ADSM (I think IBM -now sells it under the Tivoli label)? It has a couple of interesting -ideas that may apply to the following topics. - -1. Migration: Consider that when you need to restore a system, there -may be pressure to hurry. If all the information for a single client -can eventually end up on the same media (and in chronological order), -the restore is facillitated by not having to search past information -from other clients. ADSM has the concept of "client affinity" that -may be associated with it's storage pools. It seems to me that this -concept (as an optional feature) might fit in your architecture for -migration. - -ADSM also has the concept of defining one or more storage pools as -"copy pools" (almost mirrors, but only in the sense of contents). -These pools provide the ability to have duplicte data stored both -onsite and offsite. The copy process can be scheduled to be handled -by their storage manager during periods when there is no backup -activity. Again, the migration process might be a place to consider -implementing something like this. - -> -> It strikes me that it would be very nice to be able to do things -like -> have the Job(s) backing up the machines run, and once they have all -> completed, start a migration job to copy the data from disks Volumes -to -> a tape library and then to offsite storage. Maybe this can already -be -> done with some careful scheduling and Job prioritzation; the events -> mechanism described below would probably make it very easy. - -This is the goal. In the first step (before events), you simply -schedule -the Migration to tape later. - -2. 
Base jobs: In ADSM, each copy of each stored file is tracked in -the database. Once a file (unique by path and metadata such as dates, -size, ownership, etc.) is in a copy pool, no more copies are made. In -other words, when you start ADSM, it begins like your concept of a -base job. After that it is in the "incremental" mode. You can -configure the number of "generations" of files to be retained, plus a -retention date after which even old generations are purged. The -database tracks the contents of media and projects the percentage of -each volume that is valid. When the valid content of a volume drops -below a configured percentage, the valid data are migrated to another -volume and the old volume is marked as empty. Note, this requires -ADSM to have an idea of the contents of a client, i.e. marking the -database when an existing file was deleted, but this would solve your -issue of restoring a client without restoring deleted files. - -This is pretty far from what bacula now does, but if you are going to -rip things up for Base jobs,..... -Also, the benefits of this are huge for very large shops, especially -with media robots, but are a pain for shops with manual media -mounting. - -Regards, -Jerry Schieffer - -============================== - -Longer term to do: -- Audit M_ error codes to ensure they are correct and consistent. -- Add variable break characters to lex analyzer. - Either a bit mask or a string of chars so that - the caller can change the break characters. -- Make a single T_BREAK to replace T_COMMA, etc. -- Ensure that File daemon and Storage daemon can - continue a save if the Director goes down (this - is NOT currently the case). Must detect socket error, - buffer messages for later. -- Add ability to backup to two Storage devices (two SD sessions) at - the same time -- e.g. onsite, offsite. 
- -====================================================== - -==== - Handling removable disks - - From: Karl Cunningham - - My backups are only to hard disk these days, in removable bays. This is my - idea of how a backup to hard disk would work more smoothly. Some of these - things Bacula does already, but I mention them for completeness. If others - have better ways to do this, I'd like to hear about it. - - 1. Accommodate several disks, rotated similar to how tapes are. Identified - by partition volume ID or perhaps by the name of a subdirectory. - 2. Abort & notify the admin if the wrong disk is in the bay. - 3. Write backups to different subdirectories for each machine to be backed - up. - 4. Volumes (files) get created as needed in the proper subdirectory, one - for each backup. - 5. When a disk is recycled, remove or zero all old backup files. This is - important as the disk being recycled may be close to full. This may be - better done manually since the backup files for many machines may be - scattered in many subdirectories. -==== - - -=== Done - -=== - Base Jobs design -It is somewhat like a Full save becomes an incremental since -the Base job (or jobs) plus other non-base files. -Need: -- A Base backup is same as Full backup, just different type. -- New BaseFiles table that contains: - BaseId - index - BaseJobId - Base JobId referenced for this FileId (needed ???) - JobId - JobId currently running - FileId - File not backed up, exists in Base Job - FileIndex - FileIndex from Base Job. - i.e. for each base file that exists but is not saved because - it has not changed, the File daemon sends the JobId, BaseId, - FileId, FileIndex back to the Director who creates the DB entry. -- To initiate a Base save, the Director sends the FD - the FileId, and full filename for each file in the Base. 
-- When the FD finds a Base file, he requests the Director to - send him the full File entry (stat packet plus MD5), or - conversely, the FD sends it to the Director and the Director - says yes or no. This can be quite rapid if the FileId is kept - by the FD for each Base Filename. -- It is probably better to have the comparison done by the FD - despite the fact that the File entry must be sent across the - network. -- An alternative would be to send the FD the whole File entry - from the start. The disadvantage is that it requires a lot of - space. The advantage is that it requires less communications - during the save. -- The Job record must be updated to indicate that one or more - Bases were used. -- At end of Job, FD returns: - 1. Count of base files/bytes not written to tape (i.e. matches) - 2. Count of base file that were saved i.e. had changed. -- No tape record would be written for a Base file that matches, in the - same way that no tape record is written for Incremental jobs where - the file is not saved because it is unchanged. -- On a restore, all the Base file records must explicitly be - found from the BaseFile tape. I.e. for each Full save that is marked - to have one or more Base Jobs, search the BaseFile for all occurrences - of JobId. -- An optimization might be to make the BaseFile have: - JobId - BaseId - FileId - plus - FileIndex - This would avoid the need to explicitly fetch each File record for - the Base job. The Base Job record will be fetched to get the - VolSessionId and VolSessionTime. -- Fix bpipe.c so that it does not modify results pointer. - ***FIXME*** calling sequence should be changed. -- Fix restore of acls and extended attributes to count ERROR - messages and make errors non-fatal. -- Put save/restore various platform acl/xattrs on a pointer to simplify - the code. -- Add blast attributes to DIR to SD. -- Implement unmount of USB volumes. 
-- Look into using Dart for testing - http://public.kitware.com/Dart/HTML/Index.shtml -- 2.39.5