X-Git-Url: https://git.sur5r.net/?a=blobdiff_plain;f=bacula%2Fkernstodo;h=2d2f3d7a063310a28b6e364a362d95dbce510921;hb=d9b7c3a08c08c031a44519f497dcce964f219234;hp=df9fc31983fb1655aa6d1cc7549258f88d9d8153;hpb=fc92af306f2cd876b24b4858a15d05b18f525b6e;p=bacula%2Fbacula
diff --git a/bacula/kernstodo b/bacula/kernstodo
index df9fc31983..2d2f3d7a06 100644
--- a/bacula/kernstodo
+++ b/bacula/kernstodo
@@ -1,37 +1,532 @@
                  Kern's ToDo List
-                 08 February 2006
+                   21 June 2009
+
+Rescue:
+Add to USB key:
+  gftp sshfs kile kate lsssci m4 mtx nfs-common nfs-server
+  patch squashfs-tools strace sg3-utils screen scsiadd
+  system-tools-backend telnet dpkg traceroute urar usbutils
+  whois apt-file autofs busybox chkrootkit clamav dmidecode
+  manpages-dev manpages-posix manpages-posix-dev
 
-Major development:
-Project Developer
-======= =========
 
 Document:
-- Does ClientRunAfterJob fail the job on a bad return code?
+- package sg3-utils, program sg_map
+- !!! Cannot restore two jobs a the same time that were
+  written simultaneously unless they were totally spooled.
 - Document cleaning up the spool files:
   db, pid, state, bsr, mail, conmsg, spool
 - Document the multiple-drive-changer.txt script.
 - Pruning with Admin job.
 - Does WildFile match against full name? Doc.
 - %d and %v only valid on Director, not for ClientRunBefore/After.
+- During tests with the 260 char fix code, I found one problem:
+  if the system "sees" a long path once, it seems to forget it's
+  working drive (e.g. c:\), which will lead to a problem during
+  the next job (create bootstrap file will fail). Here is the
+  workaround: specify absolute working and pid directory in
+  bacula-fd.conf (e.g. c:\bacula\working instead of
+  \bacula\working).
+- Document techniques for restoring large numbers of files.
+- Document setting my.cnf to big file usage.
+- Add example of proper index output to doc.
+    show index from File;
+- Correct the Include syntax in the m4.xxx files in examples/conf
+- Document JobStatus and Termination codes.
+- Fix the error with the "DVI file can't be opened" while
+  building the French PDF.
+- Document more DVD stuff
+- Doc
+    { "JobErrors", "i"},
+    { "JobFiles", "i"},
+    { "SDJobFiles", "i"},
+    { "SDErrors", "i"},
+    { "FDJobStatus","s"},
+    { "SDJobStatus","s"},
+- Document all the little details of setting up certificates for
+  the Bacula data encryption code.
+- Document more precisely how to use master keys -- especially
+  for disaster recovery.
+
+Professional Needs:
+- Migration from other vendors
+  - Date change
+  - Path change
+- Filesystem types
+- Backup conf/exe (all daemons)
+- Backup up system state
+- Detect state change of system (verify)
+- Synthetic Full, Diff, Inc (Virtual, Reconstructed)
+- SD to SD
+- Modules for Databases, Exchange, ...
+- Novell NSS backup http://www.novell.com/coolsolutions/tools/18952.html
+- Compliance norms that compare restored code hash code.
+- When glibc crash, get address with
+    info symbol 0x809780c
+- How to sync remote offices.
+- Exchange backup:
+  http://www.microsoft.com/technet/itshowcase/content/exchbkup.mspx
+- David's priorities
+   Copypools
+   Extract capability (#25)
+   Continued enhancement of bweb
+   Threshold triggered migration jobs (not currently in list, but will be
+     needed ASAP)
+   Client triggered backups
+   Complete rework of the scheduling system (not in list)
+   Performance and usage instrumentation (not in list)
+   See email of 21Aug2007 for details.
+- Look at: http://tech.groups.yahoo.com/group/cfg2html
+   and http://www.openeyet.nl/scc/ for managing customer changes
 
 Priority:
-- Implement code that makes the Dir aware that a drive is an
-  autochanger (so the user doesn't need to use the Autochanger = yes
-  directive).
-
-For 1.39:
-- Detect resource deadlock in Migrate when same job wants to read
-  and write the same device.
-- Make hardlink code at line 240 of find_one.c use binary search.
-- Queue warning/error messages during restore so that they
-  are reported at the end of the report rather than being
-  hidden in the file listing ...
-- Fix Maximum Changer Wait (and others) to accept qualifiers.
+================
+
+- Fix restore of acls and extended attributes to count ERROR
+  messages and make errors non-fatal.
+- Put save/restore various platform acl/xattrs on a pointer to simplify
+  the code.
+
+
+- Why no error message if restore has no permission on the where
+  directory?
+- Possibly allow manual "purge" to purge a Volume that has not
+  yet been written (even if FirstWritten time is zero) see ua_purge.c
+  is_volume_purged().
+- Add disk block detection bsr code (make it work).
+- Remove done bsrs.
+- Add blast attributes to DIR to SD.
+- Detect deadlocks in reservations.
+- Plugins:
+  - Add list during dump
+  - Add in plugin code flag
+  - Add bRC_EndJob -- stops more calls to plugin this job
+  - Add bRC_Term (unload plugin)
+  - remove time_t from Jmsg and use utime_t?
+- Deadlock detection, watchdog sees if counter advances when jobs are
+  running. With debug on, can do a "status" command.
+- User options for plugins.
+- Pool Storage override precedence over command line.
+- Autolabel only if Volume catalog information indicates tape not
+  written. This will avoid overwriting a tape that gets an I/O
+  error on reading the volume label.
+- I/O error, SD thinks it is not the right Volume, should check slot
+  then disable volume, but Asks for mount.
+- Can be posible modify package to create and use configuration files in
+  the Debian manner?
+
+  For example:
+
+  /etc/bacula/bacula-dir.conf
+  /etc/bacula/conf.d/pools.conf
+  /etc/bacula/conf.d/clients.conf
+  /etc/bacula/conf.d/storages.conf
+
+  and into bacula-dir.conf file include
+
+  @/etc/bacula/conf.d/pools.conf
+  @/etc/bacula/conf.d/clients.conf
+  @/etc/bacula/conf.d/storages.conf
+- Possibly add an Inconsistent state when a Volume is in error
+  for non I/O reasons.
+- Fix #ifdefing so that smartalloc can be disabled. Check manual
+  -- the default is enabled.
+- Change calling sequence to delete_job_id_range() in ua_cmds.c
+  the preceding strtok() is done inside the subroutine only once.
+- Dangling softlinks are not restored properly. For example, take a
+  soft link such as src/testprogs/install-sh, which points to /usr/share/autoconf...
+  move the directory to another machine where the file /usr/share/autoconf does
+  not exist, back it up, then try a full restore. It fails.
+- Softlinks that point to non-existent file are not restored in restore all,
+  but are restored if the file is individually selected. BUG!
+- New directive "Delete purged Volumes"
+- Prune by Job
+- Prune by Job Level (Full, Differential, Incremental)
+- Strict automatic pruning
+- Implement unmount of USB volumes.
+- Use "./config no-idea no-mdc2 no-rc5" on building OpenSSL for
+  Win32 to avoid patent problems.
+- Implement multiple jobid specification for the cancel command,
+  similar to what is permitted on the update slots command.
+- modify pruning to keep a fixed number of versions of a file,
+  if requested.
+- the cd-command should allow complete paths
+  i.e. cd /foo/bar/foo/bar
+  -> if a customer mails me the path to a certain file,
+     its faster to enter the specified directory
+- Make tree walk routines like cd, ls, ... more user friendly
+  by handling spaces better.
+- When doing a restore, if the user does an "update slots"
+  after the job started in order to add a restore volume, the
+  values prior to the update slots will be put into the catalog.
+  Must retrieve catalog record merge it then write it back at the
+  end of the restore job, if we want to do this right.
+=== rate design
+  jcr->last_rate
+  jcr->last_runtime
+  MA = (last_MA * 3 + rate) / 4
+  rate = (bytes - last_bytes) / (runtime - last_runtime)
+- Add a recursive mark command (rmark) to restore.
+- "Minimum Job Interval = nnn" sets minimum interval between Jobs
+  of the same level and does not permit multiple simultaneous
+  running of that Job (i.e. lets any previous invocation finish
+  before doing Interval testing).
+- Look at simplifying File exclusions.
+- Scripts
+- Auto update of slot:
+   rufus-dir: ua_run.c:456-10 JobId=10 NewJobId=10 using pool Full priority=10
+   02-Nov 12:58 rufus-dir JobId 10: Start Backup JobId 10, Job=kernsave.2007-11-02_12.58.03
+   02-Nov 12:58 rufus-dir JobId 10: Using Device "DDS-4"
+   02-Nov 12:58 rufus-sd JobId 10: Invalid slot=0 defined in catalog for Volume "Vol001" on "DDS-4" (/dev/nst0). Manual load my be required.
+   02-Nov 12:58 rufus-sd JobId 10: 3301 Issuing autochanger "loaded? drive 0" command.
+   02-Nov 12:58 rufus-sd JobId 10: 3302 Autochanger "loaded? drive 0", result is Slot 2.
+   02-Nov 12:58 rufus-sd JobId 10: Wrote label to prelabeled Volume "Vol001" on device "DDS-4" (/dev/nst0)
+   02-Nov 12:58 rufus-sd JobId 10: Alert: TapeAlert[7]: Media Life: The tape has reached the end of its useful life.
+   02-Nov 12:58 rufus-dir JobId 10: Bacula rufus-dir 2.3.6 (26Oct07): 02-Nov-2007 12:58:51
+- Separate Files and Directories in catalog
+- Create FileVersions table
+- Look at rsysnc for incremental updates and dedupping
+- Add MD5 or SHA1 check in SD for data validation
+- finish implementation of fdcalled -- see ua_run.c:105
+- Fix problem in postgresql.c in my_postgresql_query, where the
+  generation of the error message doesn't differentiate result==NULL
+  and a bad status from that result. Not only that, the result is
+  cleared on a bail_out without having generated the error message.
+- KIWI
+- Implement SDErrors (must return from SD)
+- Implement USB keyboard support in rescue CD.
+- Implement continue spooling while despooling.
+- Remove all install temp files in Win32 PLUGINSDIR.
+- Audit retention periods to make sure everything is 64 bit.
+- No where in restore causes kaboom.
+- Performance: multiple spool files for a single job.
+- Performance: despool attributes when despooling data (problem
+  multiplexing Dir connection).
+- Make restore use the in-use volume reservation algorithm.
+- When Pool specifies Storage command override does not work.
+- Implement wait_for_sysop() message display in wait_for_device(), which
+  now prints warnings too often.
+- Ensure that each device in an Autochanger has a different
+  Device Index.
+- Look at sg_logs -a /dev/sg0 for getting soft errors.
+- btape "test" command with Offline on Unmount = yes
+
+   This test is essential to Bacula.
+
+   I'm going to write one record in file 0,
+   two records in file 1,
+   and three records in file 2
+
+   02-Feb 11:00 btape: ABORTING due to ERROR in dev.c:715
+   dev.c:714 Bad call to rewind. Device "LTO" (/dev/nst0) not open
+   02-Feb 11:00 btape: Fatal Error because: Bacula interrupted by signal 11: Segmentation violation
+   Kaboom! btape, btape got signal 11. Attempting traceback.
+
+- Encryption -- email from Landon
+  > The backup encryption algorithm is currently not configurable, and is
+  > set to AES_128_CBC in src/filed/backup.c. The encryption code
+  > supports a number of different ciphers (as well as adding arbitrary
+  > new ones) -- only a small bit of code would be required to map a
+  > configuration string value to a CRYPTO_CIPHER_* value, if anyone is
+  > interested in implementing this functionality.
+
+- Figure out some way to "automatically" backup conf changes.
+- Add the OS version back to the Win32 client info.
+- Restarted jobs have a NULL in the from field.
+- Modify SD status command to indicate when the SD is writing
+  to a DVD (the device is not open -- see bug #732).
+- Look at the possibility of adding "SET NAMES UTF8" for MySQL,
+  and possibly changing the blobs into varchar.
+- Ensure that the SD re-reads the Media record if the JobFiles
+  does not match -- it may have been updated by another job.
+- Doc items
+- Test Volume compatibility between machine architectures
+- Encryption documentation
+- Wrong jobbytes with query 12 (todo)
+- Bare-metal recovery Windows (todo)
+
+
+Projects:
+- Pool enhancements
+  - Access Mode = Read-Only, Read-Write, Unavailable, Destroyed, Offsite
+  - Pool Type = Copy
+  - Maximum number of scratch volumes
+  - Maximum File size
+  - Next Pool (already have)
+  - Reclamation threshold
+  - Reclamation Pool
+  - Reuse delay (after all files purged from volume before it can be used)
+  - Copy Pool = xx, yyy (or multiple lines).
+  - Catalog = xxx
+  - Allow pool selection during restore.
+
+- Average tape size from Eric
+    SELECT COALESCE(media_avg_size.volavg,0) * count(Media.MediaId) AS volmax,
+           count(Media.MediaId) AS volnum,
+           sum(Media.VolBytes) AS voltotal,
+           Media.PoolId AS PoolId,
+           Media.MediaType AS MediaType
+    FROM Media
+    LEFT JOIN (SELECT avg(Media.VolBytes) AS volavg,
+                      Media.MediaType AS MediaType
+               FROM Media
+               WHERE Media.VolStatus = 'Full'
+               GROUP BY Media.MediaType
+              ) AS media_avg_size ON (Media.MediaType = media_avg_size.MediaType)
+    GROUP BY Media.MediaType, Media.PoolId, media_avg_size.volavg
+- GUI
+  - Admin
+  - Management reports
+  - Add doc for bweb -- especially Installation
+  - Look at Webmin
+    http://www.orangecrate.com/modules.php?name=News&file=article&sid=501
+- Performance
+  - Despool attributes in separate thread
+  - Database speedups
+  - Embedded MySQL
+  - Check why restore repeatedly sends Rechdrs between
+    each data chunk -- according to James Harper 9Jan07.
+- Features
+  - Better scheduling
+  - More intelligent re-run
+  - FD plugins
+  - Incremental backup -- rsync, Stow
+
+For next release:
+- Try to fix bscan not working with multiple DVD volumes bug #912.
+- Look at mondo/mindi
+- Make Bacula by default not backup tmpfs, procfs, sysfs, ...
+- Fix hardlinked immutable files when linking a second file, the
+  immutable flag must be removed prior to trying to link it.
+- Change dbcheck to tell users to use native tools for fixing
+  broken databases, and to ensure they have the proper indexes.
+- add udev rules for Bacula devices.
+- If a job terminates, the DIR connection can close before the
+  Volume info is updated, leaving the File count wrong.
+- Look at why SIGPIPE during connection can cause seg fault in
+  writing the daemon message, when Dir dropped to bacula:bacula
+- Look at zlib 32 => 64 problems.
+- Possibly turn on St. Bernard code.
+- Fix bextract to restore ACLs, or better yet, use common routines.
+- Do we migrate appendable Volumes?
+- Remove queue.c code.
+- Print warning message if LANG environment variable does not specify
+  UTF-8.
+- New dot commands from Arno.
+  .show device=xxx lists information from one storage device, including
+     devices (I'm not even sure that information exists in the DIR...)
+  .move eject device=xxx mostly the same as 'unmount xxx' but perhaps with
+     better machine-readable output like "Ok" or "Error busy"
+  .move eject device=xxx toslot=yyy the same as above, but with a new
+     target slot. The catalog should be updated accordingly.
+  .move transfer device=xxx fromslot=yyy toslot=zzz
+
+Low priority:
+- Article: http://www.heise.de/open/news/meldung/83231
+- Article: http://www.golem.de/0701/49756.html
+- Article: http://lwn.net/Articles/209809/
+- Article: http://www.onlamp.com/pub/a/onlamp/2004/01/09/bacula.html
+- Article: http://www.linuxdevcenter.com/pub/a/linux/2005/04/07/bacula.html
+- Article: http://www.osreviews.net/reviews/admin/bacula
+- Article: http://www.debianhelp.co.uk/baculaweb.htm
+- Article:
+- Wikis mentioning Bacula
+  http://wiki.finkproject.org/index.php/Admin:Backups
+  http://wiki.linuxquestions.org/wiki/Bacula
+  http://www.openpkg.org/product/packages/?package=bacula
+  http://www.iterating.com/products/Bacula
+  http://net-snmp.sourceforge.net/wiki/index.php/Net-snmp_extensions
+  http://www.section6.net/wiki/index.php/Using_Bacula_for_Tape_Backups
+  http://bacula.darwinports.com/
+  http://wiki.mandriva.com/en/Releases/Corporate/Server_4/Notes#Bacula
+  http://en.wikipedia.org/wiki/Bacula
+
+- Bacula Wikis
+  http://www.devco.net/pubwiki/Bacula/
+  http://paramount.ind.wpi.edu/wiki/doku.php
+  http://gentoo-wiki.com/HOWTO_Backup
+  http://www.georglutz.de/wiki/Bacula
+  http://www.clarkconnect.com/wiki/index.php?title=Modules_-_LAN_Backup/Recovery
+  http://linuxwiki.de/Bacula (in German)
+
+- Possibly allow SD to spool even if a tape is not mounted.
+- Figure out how to configure query.sql.
+  Suggestion to use m4:
+    == changequote.m4 ===
+    changequote(`[',`]')dnl
+    ==== query.sql.in ===
+    :List next 20 volumes to expire
+    SELECT
+        Pool.Name AS PoolName,
+        Media.VolumeName,
+        Media.VolStatus,
+        Media.MediaType,
+    ifdef([MySQL],
+    [ FROM_UNIXTIME(UNIX_TIMESTAMP(Media.LastWritten) + Media.VolRetention) AS Expire, ])dnl
+    ifdef([PostgreSQL],
+    [ media.lastwritten + interval '1 second' * media.volretention as expire, ])dnl
+        Media.LastWritten
+    FROM Pool
+    LEFT JOIN Media
+    ON Media.PoolId=Pool.PoolId
+    WHERE Media.LastWritten>0
+    ORDER BY Expire
+    LIMIT 20;
+    ====
+    Command: m4 -DMySQL changequote.m4 query.sql.in >query.sql
+
+  The problem is that it requires m4, which is not present on all machines
+  at ./configure time.
+
+==== SQL
+# get null file
+select FilenameId from Filename where Name='';
+# Get list of all directories referenced in a Backup.
+select Path.Path from Path,File where File.JobId=nnn and
+   File.FilenameId=(FilenameId-from-above) and File.PathId=Path.PathId
+   order by Path.Path ASC;
+
+- Look into using Dart for testing
+  http://public.kitware.com/Dart/HTML/Index.shtml
+
+- Look into replacing autotools with cmake
+  http://www.cmake.org/HTML/Index.html
+
+- Mount on an Autochanger with no tape in the drive causes:
+   Automatically selected Storage: LTO-changer
+   Enter autochanger drive[0]: 0
+   3301 Issuing autochanger "loaded drive 0" command.
+   3302 Autochanger "loaded drive 0", result: nothing loaded.
+   3301 Issuing autochanger "loaded drive 0" command.
+   3302 Autochanger "loaded drive 0", result: nothing loaded.
+   3902 Cannot mount Volume on Storage Device "LTO-Drive1" (/dev/nst0) because:
+   Couldn't rewind device "LTO-Drive1" (/dev/nst0): ERR=dev.c:678 Rewind error on "LTO-Drive1" (/dev/nst0). ERR=No medium found.
+   3905 Device "LTO-Drive1" (/dev/nst0) open but no Bacula volume is mounted.
+   If this is not a blank tape, try unmounting and remounting the Volume.
+- If Drive 0 is blocked, and drive 1 is set "Autoselect=no", drive 1 will
+  be used.
+- Autochanger did not change volumes.
+    select * from Storage;
+    +-----------+-------------+-------------+
+    | StorageId | Name        | AutoChanger |
+    +-----------+-------------+-------------+
+    |         1 | LTO-changer |           0 |
+    +-----------+-------------+-------------+
+    05-May 03:50 roxie-sd: 3302 Autochanger "loaded drive 0", result is Slot 11.
+    05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Warning: Director wanted Volume "LT
+        Current Volume "LT0-002" not acceptable because:
+        1997 Volume "LT0-002" not in catalog.
+    05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Error: Autochanger Volume "LT0-002"
+        Setting InChanger to zero in catalog.
+    05-May 03:50 roxie-dir: Tibs.2006-05-05_03.05.02 Error: Unable to get Media record
+
+    05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: Error getting Volume i
+    05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: Job 530 canceled.
+    05-May 03:50 roxie-sd: Tibs.2006-05-05_03.05.02 Fatal error: spool.c:249 Fatal appe
+    05-May 03:49 Tibs: Tibs.2006-05-05_03.05.02 Fatal error: c:\cygwin\home\kern\bacula
+      , got
+      (missing)
+    llist volume=LTO-002
+      MediaId: 6
+      VolumeName: LTO-002
+      Slot: 0
+      PoolId: 1
+      MediaType: LTO-2
+      FirstWritten: 2006-05-05 03:11:54
+      LastWritten: 2006-05-05 03:50:23
+      LabelDate: 2005-12-26 16:52:40
+      VolJobs: 1
+      VolFiles: 0
+      VolBlocks: 1
+      VolMounts: 0
+      VolBytes: 206
+      VolErrors: 0
+      VolWrites: 0
+      VolCapacityBytes: 0
+      VolStatus:
+      Recycle: 1
+      VolRetention: 31,536,000
+      VolUseDuration: 0
+      MaxVolJobs: 0
+      MaxVolFiles: 0
+      MaxVolBytes: 0
+      InChanger: 0
+      EndFile: 0
+      EndBlock: 0
+      VolParts: 0
+      LabelType: 0
+      StorageId: 1
+
+    Note VolStatus is blank!!!!!
+    llist volume=LTO-003
+      MediaId: 7
+      VolumeName: LTO-003
+      Slot: 12
+      PoolId: 1
+      MediaType: LTO-2
+      FirstWritten: 0000-00-00 00:00:00
+      LastWritten: 0000-00-00 00:00:00
+      LabelDate: 2005-12-26 16:52:40
+      VolJobs: 0
+      VolFiles: 0
+      VolBlocks: 0
+      VolMounts: 0
+      VolBytes: 1
+      VolErrors: 0
+      VolWrites: 0
+      VolCapacityBytes: 0
+      VolStatus: Append
+      Recycle: 1
+      VolRetention: 31,536,000
+      VolUseDuration: 0
+      MaxVolJobs: 0
+      MaxVolFiles: 0
+      MaxVolBytes: 0
+      InChanger: 0
+      EndFile: 0
+      EndBlock: 0
+      VolParts: 0
+      LabelType: 0
+      StorageId: 1
+===
+  mount
+  Automatically selected Storage: LTO-changer
+  Enter autochanger drive[0]: 0
+  3301 Issuing autochanger "loaded drive 0" command.
+  3302 Autochanger "loaded drive 0", result: nothing loaded.
+  3301 Issuing autochanger "loaded drive 0" command.
+  3302 Autochanger "loaded drive 0", result: nothing loaded.
+  3902 Cannot mount Volume on Storage Device "LTO-Drive1" (/dev/nst0) because:
+  Couldn't rewind device "LTO-Drive1" (/dev/nst0): ERR=dev.c:678 Rewind error on "LTO-Drive1" (/dev/nst0). ERR=No medium found.
+
+  3905 Device "LTO-Drive1" (/dev/nst0) open but no Bacula volume is mounted.
+  If this is not a blank tape, try unmounting and remounting the Volume.
+
+- http://www.dwheeler.com/essays/commercial-floss.html
+- Add VolumeLock to prevent all but lock holder (SD) from updating
+  the Volume data (with the exception of VolumeState).
+- The btape fill command does not seem to use the Autochanger
+- Make Windows installer default to system disk drive.
+- Look at using ioctl(FIOBMAP, ...) on Linux, and
+  DeviceIoControl(..., FSCTL_QUERY_ALLOCATED_RANGES, ...) on
+  Win32 for sparse files.
+  http://www.flexhex.com/docs/articles/sparse-files.phtml
+  http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/fibmap.html
+- Directive: at "command"
+- Command: pycmd "command" generates "command" event. How to
+  attach to a specific job?
+- Integrate Christopher's St. Bernard code.
+- run_cmd() returns int should return JobId_t
+- get_next_jobid_from_list() returns int should return JobId_t
+- Document export LDFLAGS=-L/usr/lib64
+- Don't attempt to restore from "Disabled" Volumes.
+- Network error on Win32 should set Win32 error code.
+- What happens when you rename a Disk Volume?
+- Job retention period in a Pool (and hence Volume). The job would
+  then be migrated.
 - Look at -D_FORTIFY_SOURCE=2
 - Add Win32 FileSet definition somewhere
 - Look at fixing restore status stats in SD.
-- Make selection of Database used in restore correspond to
-  client.
+- Look at using ioctl(FIMAP) and FIGETBSZ for sparse files.
+  http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/fibmap.html
 - Implement a mode that says when a hard read error is
   encountered, read many times (as it currently does), and if the
   block cannot be read, skip to the next block, and try again. If
@@ -42,23 +537,13 @@ For 1.39:
      ("F","Full"),
      ("D","Diff"),
      ("I","Inc");
-- Add ACL to restore only to original location.
-- Add a recursive mark command (rmark) to restore.
-- "Minimum Job Interval = nnn" sets minimum interval between Jobs
-  of the same level and does not permit multiple simultaneous
-  running of that Job (i.e. lets any previous invocation finish
-  before doing Interval testing).
-- Look at simplifying File exclusions.
-- Fix store_yesno to be store_bitmask.
-- New directive "Delete purged Volumes"
+- Show files/second in client status output.
 - new pool XXX with ScratchPoolId = MyScratchPool's PoolId and
   let it fill itself, and RecyclePoolId = XXX's PoolId so I can
   see if it become stable and I just have to supervise
   MyScratchPool
 - If I want to remove this pool, I set RecyclePoolId = MyScratchPool's
   PoolId, and when it is empty remove it.
-- Figure out how to recycle Scratch volumes back to the Scratch
-  Pool.
 - Add Volume=SCRTCH
 - Allow Check Labels to be used with Bacula labels.
 - "Resuming" a failed backup (lost line for example) by using the
@@ -68,17 +553,6 @@ For 1.39:
   days before it needs changing.
 - Command to show next tape that will be used for a job even
   if the job is not scheduled.
---- create_file.c.orig Fri Jul 8 12:13:05 2005
-+++ create_file.c Fri Jul 8 12:13:07 2005
-@@ -195,6 +195,8 @@
-              attr->ofname, be.strerror());
-           return CF_ERROR;
-        }
-+      } else if(S_ISSOCK(attr->statp.st_mode)) {
-+        Dmsg1(200, "Skipping socket: %s\n", attr->ofname);
-       } else {
-          Dmsg1(200, "Restore node: %s\n", attr->ofname);
-          if (mknod(attr->ofname, attr->statp.st_mode, attr->statp.st_rdev) != 0 && errno != EEXIST) {
 - From: Arunav Mandal
   1. When jobs are running and bacula for some reason crashes or if I do a
      restart it remembers and jobs it was running before it crashed or restarted
@@ -101,14 +575,6 @@ For 1.39:
      backups of the same client and if we again try to start a full backup of
      client backup abc bacula won't complain. That should be fixed.
-- Fix bpipe.c so that it does not modify results pointer.
-  ***FIXME*** calling sequence should be changed.
-1.xx Major Projects:
-#3 Migration (Move, Copy, Archive Jobs)
-#7 Single Job Writing to Multiple Storage Devices
-- Reserve blocks other restore jobs when first cannot connect
-  to SD.
-- Add true/false to conf same as yes/no
 - For Windows disaster recovery see http://unattended.sf.net/
 - regardless of the retention period, Bacula will not prune the
   last Full, Diff, or Inc File data until a month after the
@@ -138,15 +604,8 @@ For 1.39:
 - In restore don't compare byte count on a raw device -- directory
   entry does not contain bytes.
 
-- To mark files as deleted, run essentially a Verify to disk, and
-  when a file is found missing (MarkId != JobId), then create
-  a new File record with FileIndex == -1. This could be done
-  by the FD at the same time as the backup.
-=== rate design
-  jcr->last_rate
-  jcr->last_runtime
-  MA = (last_MA * 3 + rate) / 4
-  rate = (bytes - last_bytes) / (runtime - last_runtime)
+
+
 - Max Vols limit in Pool off by one?
 - Implement Files/Bytes,... stats for restore job.
 - Implement Total Bytes Written, ... for restore job.
@@ -190,58 +649,50 @@ For 1.39:
 - Bug: if a job is manually scheduled to run later, it does not appear
   in any status report and cannot be cancelled.
 
-==== Keeping track of deleted files ====
-  My "trick" for keeping track of deletions is the following.
-  Assuming the user turns on this option, after all the files
-  have been backed up, but before the job has terminated, the
-  FD will make a pass through all the files and send their
-  names to the DIR (*exactly* the same as what a Verify job
-  currently does). This will probably be done at the same
-  time the files are being sent to the SD avoiding a second
-  pass. The DIR will then compare that to what is stored in
-  the catalog. Any files in the catalog but not in what the
-  FD sent will receive a catalog File entry that indicates
-  that at that point in time the file was deleted.
-
-  During a restore, any file initially picked up by some
-  backup (Full, ...) then subsequently having a File entry
-  marked "delete" will be removed from the tree, so will not
-  be restored. If a file with the same name is later OK it
-  will be inserted in the tree -- this already happens. All
-  will be consistent except for possible changes during the
-  running of the FD.
-
-  Since I'm on the subject, some of you may be wondering what
-  the utility of the in memory tree is if you are going to
-  restore everything (at least it comes up from time to time
-  on the list). Well, it is still *very* useful because it
-  allows only the last item found for a particular filename
-  (full path) to be entered into the tree, and thus if a file
-  is backed up 10 times, only the last copy will be restored.
-  I recently (last Friday) restored a complete directory, and
-  the Full and all the Differential and Incremental backups
-  spanned 3 Volumes. The first Volume was not even mounted
-  because all the files had been updated and hence backed up
-  since the Full backup was made. In this case, the tree
-  saved me a *lot* of time.
-
-  Make sure this information is stored on the tape too so
-  that it can be restored directly from the tape.
-
-  Comments from Martin Simmons (I think they are all covered):
-  Ok, that should cover the basics. There are few issues though:
-
-  - Restore will depend on the catalog. I think it is better to include the
-    extra data in the backup as well, so it can be seen by bscan and bextract.
-
-  - I'm not sure if it will preserve multiple hard links to the same inode. Or
-    maybe adding or removing links will cause the data to be dumped again?
-
-  - I'm not sure if it will handle renamed directories. Possibly it will work
-    by dumping the whole tree under a renamed directory?
-
-  - It remains to be seen how the backup performance of the DIR's will be
-    affected when comparing the catalog for a large filesystem.
+
+====
+From David:
+How about introducing a Type = MgmtPolicy job type? That job type would
+be responsible for scanning the Bacula environment looking for specific
+conditions, and submitting the appropriate jobs for implementing said
+policy, eg:
+
+Job {
+    Name = "Migration-Policy"
+    Type = MgmtPolicy
+    Policy Selection Job Type = Migrate
+    Scope = "<keyword> <operator> <regexp>"
+    Threshold = "<keyword> <operator> <regexp>"
+    Job Template =
+}
+
+Where <keyword> is any legal job keyword, <operator> is a comparison
+operator (=,<,>,!=, logical operators AND/OR/NOT) and <regexp> is a
+appropriate regexp. I could see an argument for Scope and Threshold
+being SQL queries if we want to support full flexibility. The
+Migration-Policy job would then get scheduled as frequently as a site
+felt necessary (suggested default: every 15 minutes).
+
+Example:
+
+Job {
+    Name = "Migration-Policy"
+    Type = MgmtPolicy
+    Policy Selection Job Type = Migration
+    Scope = "Pool=*"
+    Threshold = "Migration Selection Type = LowestUtil"
+    Job Template = "MigrationTemplate"
+}
+
+would select all pools for examination and generate a job based on
+MigrationTemplate to automatically select the volume with the lowest
+usage and migrate it's contents to the nextpool defined for that pool.
+
+This policy abstraction would be really handy for adjusting the behavior
+of Bacula according to site-selectable criteria (one thing that pops
+into mind is Amanda's ability to automatically adjust backup levels
+depending on various criteria).
+
 
 =====
 
@@ -357,23 +808,8 @@ Why:
   format string. Then I have the tape labeled automatically with weekday
   name in the correct language.
 ==========
-- Yes, that is surely the case. I probably should turn those into Warning
-  errors. In addition, you just made me think that it might not be bad to
-  add an option to check the file size after backing up the file and
-  report if it changes. This would be done as an option because it would
-  add extra overhead.
-
-  Kern, good idea. If you do do that, mention in the output: file
-  shrunk, or file expanded, just to make it obvious to the user
-  (without having to the refer to file size), just how the file size
-  changed.
-
-  Would this option be for all file, or just one file? Or a fileset?
 - Make output from status use html table tags for nicely presenting in
   a browser.
-- Can one write tapes faster with 8192 byte block sizes?
-- Document security problems with the same password for everyone in
-  rpm and Win32 releases.
 - Browse generations of files.
 - I've seen an error when my catalog's File table fills up. I
   then have to recreate the File table with a larger maximum row
@@ -453,16 +889,12 @@ Documentation to do: (any release a little bit at a time)
 - Use gather write() for network I/O.
 - Autorestart on crash.
 - Add bandwidth limiting.
-- Add acks every once and a while from the SD to keep
-  the line from timing out.
 - When an error in input occurs and conio beeps, you can back
   up through the prompt.
 - Detect fixed tape block mode during positioning by looking at
   block numbers in btape "test". Possibly adjust in Bacula.
 - Fix list volumes to output volume retention in some other
   units, perhaps via a directive.
-- If opening a tape in read/write mode fails attempt to open
-  it in read-only mode, and mark the tape for read only.
 - Allow Simultaneous Priorities = yes => run up to Max concurrent jobs even
   with multiple priorities.
 - If you use restore replace=never, the directory attributes for
@@ -470,15 +902,9 @@ Documentation to do: (any release a little bit at a time)
 
 - see lzma401.zip in others directory for new compression
   algorithm/library.
-- Minimal autochanger handling in Bacula and in btape.
-- Look into how tar does not save sockets and the possiblity of
-  not saving them in Bacula (Martin Simmons reported this).
-- Fix restore jobs so that multiple jobs can run if they
-  are not using the same tape(s).
 - Allow the user to select JobType for manual pruning/purging.
 - bscan does not put first of two volumes back with all info in
   bscan-test.
-- Implement the FreeBSD nodump flag in chflags.
 - Figure out how to make named console messages go only to that
   console and to the non-restricted console (new console class?).
 - Make restricted console prompt for password if *ask* is set or
@@ -495,10 +921,6 @@ Documentation to do: (any release a little bit at a time)
   -> maybe its more easy to maintain this, if the descriptions of
      that commands are outsourced to
      a ceratin-file
-- the cd-command should allow complete paths
-  i.e. cd /foo/bar/foo/bar
-  -> if a customer mails me the path to a certain file,
-     its faster to enter the specified directory
 - if the password is not configured in bconsole.conf
   you should be asked for it.
     -> sometimes you like to do restore on a customer-machine
@@ -524,8 +946,6 @@ Documentation to do: (any release a little bit at a time)
   are not restored. See bug 213. To fix this requires
   creating a list of newly restored directories so that those
   directory permissions *can* be restored.
-- Compaction of Disk space by "migrating" Volumes that have pruned
-  Jobs (what criteria? size, #jobs, time).
 - Add prune all command
 - Document fact that purge can destroy a part of a restore by purging
   one volume while others remain valid -- perhaps mark Jobs.
@@ -546,9 +966,6 @@ Documentation to do: (any release a little bit at a time)
 - Add tree pane to left of window.
 - Add progress meter.
 - Max wait time or max run time causes seg fault -- see runtime-bug.txt
-- Document writing to a CD/DVD with Bacula.
-- Add a "base" package to the window installer for pthreadsVCE.dll
-  which is needed by all packages.
 - Add message to user to check for fixed block size when the forward
   space test fails in btape.
 - When unmarking a directory check if all files below are unmarked and
@@ -557,14 +974,10 @@ Documentation to do: (any release a little bit at a time)
 - Setup lrrd graphs: (http://www.linpro.no/projects/lrrd/) Mike Acar.
 - Revisit the question of multiple Volumes (disk) on a single device.
 - Add a block copy option to bcopy.
-- Investigate adding Mac Resource Forks.
-- Finish work on Gnome restore GUI.
 - Fix "llist jobid=xx" where no fileset or client exists.
 - For each job type (Admin, Restore, ...) require only the really necessary
   fields.- Pass Director resource name as an option to the Console.
 - Add a "batch" mode to the Console (no unsolicited queries, ...).
-- Add a .list all files in the restore tree (probably also a list all files)
-  Do both a long and short form.
 - Allow browsing the catalog to see all versions of a file (with
   stat data on each file).
 - Restore attributes of directory if replace=never set but directory
@@ -586,32 +999,22 @@ Documentation to do: (any release a little bit at a time)
 - Check new HAVE_WIN32 open bits.
 - Check if the tape has moved before writing.
 - Handling removable disks -- see below:
-- Keep track of tape use time, and report when cleaning is necessary.
 - Add FromClient and ToClient keywords on restore command (or
   BackupClient RestoreClient).
 - Implement a JobSet, which groups any number of jobs. If the
   JobSet is started, all the jobs are started together.
   Allow Pool, Level, and Schedule overrides.
-- Enhance cancel to timeout BSOCK packets after a specific delay.
-- Do scheduling by UTC using gmtime_r() in run_conf, scheduler, and
-  ua_status.!!! Thanks to Alan Brown for this tip.
 - Look at updating Volume Jobs so that Max Volume Jobs = 1 will work
   correctly for multiple simultaneous jobs.
-- Correct code so that FileSet MD5 is calculated for < and | filename
-  generation.
 - Implement the Media record flag that indicates that the Volume does disk
   addressing.
 - Implement VolAddr, which is used when Volume is addressed like a disk,
   and form it from VolFile and VolBlock.
-- Make multiple restore jobs for multiple media types specifying
-  the proper storage type.
 - Fix fast block rejection (stored/read_record.c:118). It passes a null
   pointer (rec) to try_repositioning().
-- Look at extracting Win data from BackupRead.
 - Implement RestoreJobRetention? Maybe better "JobRetention" in a Job,
   which would take precidence over the Catalog "JobRetention".
 - Implement Label Format in Add and Label console commands.
-- Possibly up network buffers to 65K. Put on variable.
 - Put email tape request delays on one or more variables. User wants to
   cancel the job after a certain time interval. Maximum Mount Wait?
 - Job, Client, Device, Pool, or Volume?
@@ -675,8 +1078,6 @@ Documentation to do: (any release a little bit at a time)
   support for Oracle database ??
 ===
 - Look at adding SQL server and Exchange support for Windows.
-- Make dev->file and dev->block_num signed integers so that -1 can
-  be an invalid value which happens with BSR.
 - Create VolAddr for disk files in place of VolFile and VolBlock. This
   is needed to properly specify ranges.
 - Add progress of files/bytes to SD and FD.
@@ -706,14 +1107,6 @@ Documentation to do: (any release a little bit at a time)
 - Implement some way for the File daemon to contact the Director
   to start a job or pass its DHCP obtained IP number.
 - Implement a query tape prompt/replace feature for a console
-- Copy console @ code to gnome2-console
-- Make AES the only encryption algorithm see
-  http://csrc.nist.gov/CryptoToolkit/aes/). It's
-  an officially adopted standard, has survived peer
-  review, and provides keys up to 256 bits.
-- Take a careful look at SetACL http://setacl.sourceforge.net
-- Make tree walk routines like cd, ls, ... more user friendly
-  by handling spaces better.
 - Make sure that Bacula rechecks the tape after the 20 min wait.
 - Set IO_NOWAIT on Bacula TCP/IP packets.
 - Try doing a raw partition backup and restore by mounting a
@@ -728,15 +1121,11 @@ Documentation to do: (any release a little bit at a time)
   in the "short" pool to the "long" pool if this pool runs out of
   volume space?
 - What to do about "list files job=xxx".
-- Get and test MySQL 4.0
 - Look at how fuser works and /proc/PID/fd that is how Nic found the
   file descriptor leak in Bacula.
-- Implement WrapCounters in Counters.
-- Add heartbeat from FD to SD if hb interval expires.
 - Can we dynamically change FileSets?
 - If pool specified to label command and Label Format is specified,
   automatically generate the Volume name.
-- Why can't SQL do the filename sort for restore?
 - Add ExhautiveRestoreSearch
 - Look at the possibility of loading only the necessary
   data into the restore tree (i.e.
   do it one directory at a
@@ -751,16 +1140,8 @@ Documentation to do: (any release a little bit at a time)
   run the job but don't save the files.
 - Make things like list where a file is saved case independent for
   Windows.
-- Implement migrate
-- Use autochanger to handle multiple devices.
-- On Windows with very long path names, it may be impossible to create
-  a file (and thus restore it) because the total length is too long.
-  We must cd into the directory then create the file without the
-  full path name.
 - Implement a Recycle command
-- Test a second language e.g. french.
 - Start working on Base jobs.
-- Implement UnsavedFiles DB record.
 - From Phil Stracchino:
   It would probably be a per-client option, and would be called
   something like, say, "Automatically purge obsoleted jobs". What it
@@ -788,8 +1169,6 @@ Documentation to do: (any release a little bit at a time)
 - If SD cannot open a drive, make it periodically retry.
 - Add more of the config info to the tape label.
-- If tape is marked read-only, then try opening it read-only rather than
-  failing, and remember that it cannot be written.
 - Refine SD waiting output:
     Device is being positioned
     > Device is being positioned for append
@@ -801,7 +1180,6 @@ Documentation to do: (any release a little bit at a time)
 - Have SD compute MD5 or SHA1 and compare to what FD computes.
 - Make VolumeToCatalog calculate an MD5 or SHA1 from the
   actual data on the Volume and compare it.
-- Implement Bacula plugins -- design API
 - Make bcopy read through bad tape records.
 - Program files (i.e. execute a program to read/write files).
   Pass read date of last backup, size of file last time.
@@ -816,7 +1194,6 @@ Documentation to do: (any release a little bit at a time)
 - bscan without -v is too quiet -- perhaps show jobs.
 - Add code to reject whole blocks if not wanted on restore.
 - Check if we can increase Bacula FD priorty in Win2000
-- Make sure the MaxVolFiles is fully implemented in SD
 - Check if both CatalogFiles and UseCatalog are set to SD.
 - Possibly add email to Watchdog if drive is unmounted too
   long and a job is waiting on the drive.
@@ -828,7 +1205,6 @@ Documentation to do: (any release a little bit at a time)
 - Compare tape to Client files (attributes, or attributes and data)
 - Make all database Ids 64 bit.
 - Allow console commands to detach or run in background.
-- Fix status delay on storage daemon during rewind.
 - Add SD message variables to control operator wait time
   - Maximum Operator Wait
   - Minimum Message Interval
@@ -845,8 +1221,6 @@ Documentation to do: (any release a little bit at a time)
 - Implement script driven addition of File daemon to config files.
 - Think about how to make Bacula work better with File (non-tape) archives.
 - Write Unix emulator for Windows.
-- Put memory utilization in Status output of each daemon
-  if full status requested or if some level of debug on.
 - Make database type selectable by .conf files i.e. at runtime
 - Set flag for uname -a. Add to Volume label.
 - Restore files modified after date
@@ -897,19 +1271,13 @@ Documentation to do: (any release a little bit at a time)
   - MaxWarnings
   - MaxErrors (job?)
 =====
-- FD sends unsaved file list to Director at end of job (see
-  RFC below).
-- File daemon should build list of files skipped, and then
-  at end of save retry and report any errors.
 - Write a Storage daemon that uses pipes and
   standard Unix programs to write to the tape.
   See afbackup.
 - Need something that monitors the JCR queue and
   times out jobs by asking the deamons where they are.
 - Enhance Jmsg code to permit buffering and saving to disk.
-- device driver = "xxxx" for drives.
 - Verify from Volume
-- Ensure that /dev/null works
 - Need report class for messages.
   Perhaps report resource where report=group of messages
 - enhance scan_attrib and rename scan_jobtype, and
@@ -1004,31 +1372,12 @@ mounting.
 Nobody is dying for them, but when you see what it does, you will die
 without it.
-3. Restoring deleted files: Since I think my comments in (2) above
-have low probability of implementation, I'll also suggest that you
-could approach the issue of deleted files by a mechanism of having the
-fd report to the dir, a list of all files on the client for every
-backup job. The dir could note in the database entry for each file
-the date that the file was seen. Then if a restore as of date X takes
-place, only files that exist from before X until after X would be
-restored. Probably the major cost here is the extra date container in
-each row of the files table.
-
-Thanks for "listening". I hope some of this helps. If you want to
-contact me, please send me an email - I read some but not all of the
-mailing list traffic and might miss a reply there.
-
-Please accept my compliments for bacula. It is doing a great job for
-me!! I sympathize with you in the need to wrestle with excelence in
-execution vs. excelence in feature inclusion.
-
 Regards,
 Jerry Schieffer
 ==============================
 Longer term to do:
-- Design at hierarchial storage for Bacula. Migration and Clone.
 - Implement FSM (File System Modules).
 - Audit M_ error codes to ensure they are correct and consistent.
 - Add variable break characters to lex analyzer.
@@ -1039,30 +1388,8 @@ Longer term to do:
   continue a save if the Director goes down (this is
   NOT currently the case). Must detect socket error,
   buffer messages for later.
-- Enhance time/duration input to allow multiple qualifiers e.g. 3d2h
 - Add ability to backup to two Storage devices (two SD sessions) at
   the same time -- e.g. onsite, offsite.
-- Add the ability to consolidate old backup sets (basically do a restore
-  to tape and appropriately update the catalog). Compress Volume sets.
-  Might need to spool via file is only one drive is available.
-- Compress or consolidate Volumes of old possibly deleted files. Perhaps
-  someway to do so with every volume that has less than x% valid
-  files.
-
-
-Migration: Move a backup from one Volume to another
-Clone: Copy a backup -- two Volumes
-
-Bacula Migration is based on Jobs (apparently Networker is file by file).
-
-Migration triggered by:
-  Number of Jobs
-  Number of Volumes
-  Age of Jobs
-  Highwater mark (keep total size)
-  Lowwater mark
-
-
 ======================================================
   Base Jobs design
@@ -1116,127 +1443,6 @@ Need: VolSessionId and VolSessionTime.
 =========================================================
-
-==========================================================
-  Unsaved File design
-For each Incremental job that is run, there may be files that
-were found but not saved because they were locked (this applies
-only to Windows). Such a system could send back to the Director
-a list of Unsaved files.
-Need:
-- New UnSavedFiles table that contains:
-  JobId
-  PathId
-  FilenameId
-- Then in the next Incremental job, the list of Unsaved Files will be
-  feed to the FD, who will ensure that they are explicitly chosen even
-  if standard date/time check would not have selected them.
-=============================================================
-
-
-=====
-  Multiple drive autochanger data: see Alan Brown
-  mtx -f xxx unloadStorage Element 1 is Already Full(drive 0 was empty)
-  Unloading Data Transfer Element into Storage Element 1...source Element
-  Address 480 is Empty
-
-  (drive 0 was empty and so was slot 1)
-  > mtx -f xxx load 15 0
-  no response, just returns to the command prompt when complete.
-  > mtx -f xxx status Storage Changer /dev/changer:2 Drives, 60 Slots ( 2 Import/Export )
-  Data Transfer Element 0:Full (Storage Element 15 Loaded):VolumeTag = HX001
-  Data Transfer Element 1:Empty
-  Storage Element 1:Empty
-  Storage Element 2:Full :VolumeTag=HX002
-  Storage Element 3:Full :VolumeTag=HX003
-  Storage Element 4:Full :VolumeTag=HX004
-  Storage Element 5:Full :VolumeTag=HX005
-  Storage Element 6:Full :VolumeTag=HX006
-  Storage Element 7:Full :VolumeTag=HX007
-  Storage Element 8:Full :VolumeTag=HX008
-  Storage Element 9:Full :VolumeTag=HX009
-  Storage Element 10:Full :VolumeTag=HX010
-  Storage Element 11:Empty
-  Storage Element 12:Empty
-  Storage Element 13:Empty
-  Storage Element 14:Empty
-  Storage Element 15:Empty
-  Storage Element 16:Empty....
-  Storage Element 28:Empty
-  Storage Element 29:Full :VolumeTag=CLNU01L1
-  Storage Element 30:Empty....
-  Storage Element 57:Empty
-  Storage Element 58:Full :VolumeTag=NEX261L2
-  Storage Element 59 IMPORT/EXPORT:Empty
-  Storage Element 60 IMPORT/EXPORT:Empty
-  $ mtx -f xxx unload
-  Unloading Data Transfer Element into Storage Element 15...done
-
-  (just to verify it remembers where it came from, however it can be
-  overrriden with mtx unload {slotnumber} to go to any storage slot.)
-  Configuration wise:
-  There needs to be a table of drive # to devices somewhere - If there are
-  multiple changers or drives there may not be a 1:1 correspondance between
-  changer drive number and system device name - and depending on the way the
-  drives are hooked up to scsi busses, they may not be linearly numbered
-  from an offset point either.something like
-
-  Autochanger drives = 2
-  Autochanger drive 0 = /dev/nst1
-  Autochanger drive 1 = /dev/nst2
-  IMHO, it would be _safest_ to use explicit mtx unload commands at all
-  times, not just for multidrive changers. For a 1 drive changer, that's
-  just:
-
-  mtx load xx 0
-  mtx unload xx 0
-
-  MTX's manpage (1.2.15):
-  unload [] [ ]
-  Unloads media from drive into slot
-  .
-  If is omitted, defaults to
-  drive 0 (as do all commands). If is
-  omitted, defaults to the slot that the drive was
-  loaded from. Note that there's currently no way
-  to say 'unload drive 1's media to the slot it
-  came from', other than to explicitly use that
-  slot number as the destination.AB
-====
-
-====
-SCSI info:
-FreeBSD
-undef# camcontrol devlist
-  at scbus0 target 2 lun 0 (pass0,sa0)
-  at scbus0 target 4 lun 0 (pass1,sa1)
-  at scbus0 target 4 lun 1 (pass2)
-
-tapeinfo -f /dev/sg0 with a bad tape in drive 1:
-[kern@rufus mtx-1.2.17kes]$ ./tapeinfo -f /dev/sg0
-Product Type: Tape Drive
-Vendor ID: 'HP '
-Product ID: 'C5713A '
-Revision: 'H107'
-Attached Changer: No
-TapeAlert[3]: Hard Error: Uncorrectable read/write error.
-TapeAlert[20]: Clean Now: The tape drive neads cleaning NOW.
-MinBlock:1
-MaxBlock:16777215
-SCSI ID: 5
-SCSI LUN: 0
-Ready: yes
-BufferedMode: yes
-Medium Type: Not Loaded
-Density Code: 0x26
-BlockSize: 0
-DataCompEnabled: yes
-DataCompCapable: yes
-DataDeCompEnabled: yes
-CompType: 0x20
-DeCompType: 0x0
-Block Position: 0
-=====
-
 ====
   Handling removable disks
@@ -1262,22 +1468,7 @@ Block Position: 0
 ===
   Done
-- Make sure that all do_prompt() calls in Dir check for
-  -1 (error) and -2 (cancel) returns.
-- Fix foreach_jcr() to have free_jcr() inside next().
-  jcr=jcr_walk_start();
-  for ( ; jcr; (jcr=jcr_walk_next(jcr)) )
-     ...
-  jcr_walk_end(jcr);
-- A Volume taken from Scratch should take on the retention period
-  of the new pool.
-- Correct doc for Maximum Changer Wait (and others) accepting only
-  integers.
-- Implement status that shows why a job is being held in reserve, or
-  rather why none of the drives are suitable.
-- Implement a way to disable a drive (so you can use the second
-  drive of an autochanger, and the first one will not be used or
-  even defined).
-- Make sure Maximum Volumes is respected in Pools when adding
-  Volumes (e.g. when pulling a Scratch volume).
+===
+- Fix bpipe.c so that it does not modify results pointer.
+  ***FIXME*** calling sequence should be changed.