X-Git-Url: https://git.sur5r.net/?a=blobdiff_plain;f=bacula%2Fkernstodo;h=a03728d99454e7ec47cfcce48ada020fdea6f1ed;hb=ae514fa2ed82ddb7329bab1a7c20e107b23cf9d2;hp=235b1017ddbd5310dc2569fcdc91312f459710e3;hpb=d0c1f5f22cbea872106e4f54ea979419ee58fc03;p=bacula%2Fbacula
diff --git a/bacula/kernstodo b/bacula/kernstodo
index 235b1017dd..a03728d994 100644
--- a/bacula/kernstodo
+++ b/bacula/kernstodo
@@ -1,11 +1,13 @@
 Kern's ToDo List
- 12 November 2006
+ 27 April 2007
 Major development:
 Project Developer
 ======= =========
 Document:
+- !!! Cannot restore two jobs a the same time that were
+ written simultaneously unless they were totally spooled.
 - Document cleaning up the spool files: db, pid, state, bsr, mail, conmsg, spool
 - Document the multiple-drive-changer.txt script.
@@ -39,25 +41,112 @@ Document:
 - Document more precisely how to use master keys -- especially for disaster recovery.
+Professional Needs:
+- Migration from other vendors
+ - Date change
+ - Path change
+- Filesystem types
+- Backup conf/exe (all daemons)
+- Backup up system state
+- Detect state change of system (verify)
 Priority:
-- Fix prog copyright (SD) all other files.
-- Migration Volume span bug
-- Rescue release
-- Bug reports
-- Test FIFO backup/restore -- make regression
+- Please mount volume "xxx" on Storage device ... should also list
+ Pool and MediaType in case user needs to create a new volume.
+- Implement wait_for_sysop() message display in wait_for_device(), which
+now prints warnings too often.
+
+- the director seg faulted when I omitted the pool directive from a
+job resource. I was experimenting and thought it redundant that I had
+specified Pool, Full Backup Pool. and Differential Backup Pool. but
+apparently not. This happened when I removed the pool directive and
+started the director.
+
+- On restore add Restore Client, Original Client.
+01-Apr 00:42 rufus-dir: Start Backup JobId 55, Job=kernsave.2007-04-01_00.42.48
+01-Apr 00:42 rufus-sd: Python SD JobStart: JobId=55 Client=Rufus
+01-Apr 00:42 rufus-dir: Created new Volume "Full0001" in catalog.
+01-Apr 00:42 rufus-dir: Using Device "File"
+01-Apr 00:42 rufus-sd: kernsave.2007-04-01_00.42.48 Warning: Device "File" (/tmp) not configured to autolabel Volumes.
+01-Apr 00:42 rufus-sd: kernsave.2007-04-01_00.42.48 Warning: Device "File" (/tmp) not configured to autolabel Volumes.
+01-Apr 00:42 rufus-sd: Please mount Volume "Full0001" on Storage Device "File" (/tmp) for Job kernsave.2007-04-01_00.42.48
+01-Apr 00:44 rufus-sd: Wrote label to prelabeled Volume "Full0001" on device "File" (/tmp)
+
+- Add Where: client:/.... to restore job report.
+- Ensure that each device in an Autochanger has a different
+ Device Index.
+- Add Catalog = to Pool resource so that pools will exist
+ in only one catalog -- currently Pools are "global".
+- Look at sg_logs -a /dev/sg0 for getting soft errors.
+- btape "test" command with Offline on Unmount = yes
+
+ This test is essential to Bacula.
+
+ I'm going to write one record in file 0,
+ two records in file 1,
+ and three records in file 2
+
+ 02-Feb 11:00 btape: ABORTING due to ERROR in dev.c:715
+ dev.c:714 Bad call to rewind. Device "LTO" (/dev/nst0) not open
+ 02-Feb 11:00 btape: Fatal Error because: Bacula interrupted by signal 11: Segmentation violation
+ Kaboom! btape, btape got signal 11. Attempting traceback.
+
+- Ensure that moving a purged Volume in ua_purge.c to the RecyclePool
+ does the right thing.
+- Why doesn't @"xxx abc" work in a conf file?
+- Figure out some way to "automatically" backup conf changes.
+- Add the OS version back to the Win32 client info.
+- Restarted jobs have a NULL in the from field.
+- Modify SD status command to indicate when the SD is writing
+ to a DVD (the device is not open -- see bug #732).
+- Look at the possibility of adding "SET NAMES UTF8" for MySQL,
+ and possibly changing the blobs into varchar.
+- Check if gnome-console works with TLS.
+- Ensure that the SD re-reads the Media record if the JobFiles
+ does not match -- it may have been updated by another job.
+- Look at moving the Storage directive from the Job to the
+ Pool in the default conf files.
 - Doc items
-- Add encryption regression tests
 - Test Volume compatibility between machine architectures
 - Encryption documentation
 - Wrong jobbytes with query 12 (todo)
 - bacula-1.38.2-ssl.patch
 - Bare-metal recovery Windows (todo)
-- Document need for UTF-8 format
-
-
-
-For 1.39:
+
+
+Projects:
+- GUI
+ - Admin
+ - Management reports
+ - Add doc for bweb -- especially Installation
+ - Look at Webmin
+ http://www.orangecrate.com/modules.php?name=News&file=article&sid=501
+- Performance
+ - FD-SD quick disconnect
+ - Despool attributes in separate thread
+ - Database speedups
+ - Embedded MySQL
+ - Check why restore repeatedly sends Rechdrs between
+ each data chunk -- according to James Harper 9Jan07.
+ - Building the in memory restore tree is slow.
+- Features
+ - Better scheduling
+ - Full at least once a month, ...
+ - Cancel Inc if Diff/Full running
+ - More intelligent re-run
+ - New/deleted file backup
+ - FD plugins
+ - Incremental backup -- rsync, Stow
+
+
+For next release:
+- Look at mondo/mindi
+- Don't restore Solaris Door files:
+ #define S_IFDOOR in st_mode.
+ see: http://docs.sun.com/app/docs/doc/816-5173/6mbb8ae23?a=view#indexterm-360
+- Make Bacula by default not backup tmpfs, procfs, sysfs, ...
+- Fix hardlinked immutable files when linking a second file, the
+ immutable flag must be removed prior to trying to link it.
 - Implement Python event for backing up/restoring a file.
 - Change dbcheck to tell users to use native tools for fixing broken databases, and to ensure they have the proper indexes.
@@ -67,13 +156,10 @@ For 1.39:
 - Look at why SIGPIPE during connection can cause seg fault in writing the daemon message, when Dir dropped to bacula:bacula
 - Look at zlib 32 => 64 problems.
-- Try turning on disk seek code.
 - Possibly turn on St. Bernard code.
 - Fix bextract to restore ACLs, or better yet, use common routines.
 - Do we migrate appendable Volumes?
 - Remove queue.c code.
-- Some users claim that they must do two prune commands to get a
- Volume marked as purged.
 - Print warning message if LANG environment variable does not specify UTF-8.
 - New dot commands from Arno.
@@ -86,8 +172,26 @@ For 1.39:
 .move transfer device=xxx fromslot=yyy toslot=zzz
 Low priority:
-- Check to see if jcr->stime is lost during rescheduling of
- jobs in jobq.c
+- Article: http://www.heise.de/open/news/meldung/83231
+- Article: http://www.golem.de/0701/49756.html
+- Article: http://lwn.net/Articles/209809/
+- Article: http://www.onlamp.com/pub/a/onlamp/2004/01/09/bacula.html
+- Article: http://www.linuxdevcenter.com/pub/a/linux/2005/04/07/bacula.html
+- Article: http://www.osreviews.net/reviews/admin/bacula
+- Article: http://www.debianhelp.co.uk/baculaweb.htm
+- Article:
+- It appears to me that you have run into some sort of race
+ condition where two threads want to use the same Volume and they
+ were both given access. Normally that is no problem. However,
+ one thread wanted the particular Volume in drive 0, but it was
+ loaded into drive 1 so it decided to unload it from drive 1 and
+ then loaded it into drive 0, while the second thread went on
+ thinking that the Volume could be used in drive 1 not realizing
+ that in between time, it was loaded in drive 0.
+ I'll look at the code to see if there is some way we can avoid
+ this kind of problem. Probably the best solution is to make the
+ first thread simply start using the Volume in drive 1 rather than
+ transferring it to drive 0.
 - Fix re-read of last block to check if job has actually written a block, and check if block was written by a different job (i.e. multiple simultaneous jobs writing).
@@ -117,7 +221,6 @@ Low priority:
 The problem is that it requires m4, which is not present on all machines at ./configure time.
-- Get Perl replacement for bregex.c
 - Given all the problems with FIFOs, I think the solution is to do something a little different, though I will look at the code and see if there is not some simple solution (i.e. some bug that was introduced). What might be a better
@@ -333,7 +436,7 @@ minutes).
 3905 Device "LTO-Drive1" (/dev/nst0) open but no Bacula volume is mounted.
 If this is not a blank tape, try unmounting and remounting the Volume.
-- Add VolumeState (enable, disable, archive)
+- http://www.dwheeler.com/essays/commercial-floss.html
 - Add VolumeLock to prevent all but lock holder (SD) from updating the Volume data (with the exception of VolumeState).
 - The btape fill command does not seem to use the Autochanger
@@ -355,16 +458,9 @@ minutes).
 - What happens when you rename a Disk Volume?
 - Job retention period in a Pool (and hence Volume). The job would then be migrated.
-- Detect resource deadlock in Migrate when same job wants to read
- and write the same device.
-- Queue warning/error messages during restore so that they
- are reported at the end of the report rather than being
- hidden in the file listing ...
 - Look at -D_FORTIFY_SOURCE=2
 - Add Win32 FileSet definition somewhere
 - Look at fixing restore status stats in SD.
-- Make selection of Database used in restore correspond to
- client.
 - Look at using ioctl(FIMAP) and FIGETBSZ for sparse files.
 http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/fibmap.html
 - Implement a mode that says when a hard read error is
@@ -377,7 +473,6 @@ minutes).
 ("F","Full"),
 ("D","Diff"),
 ("I","Inc");
-- Add ACL to restore only to original location.
 - Show files/second in client status output.
 - Add a recursive mark command (rmark) to restore.
 - "Minimum Job Interval = nnn" sets minimum interval between Jobs
@@ -836,8 +931,6 @@ Documentation to do: (any release a little bit at a time)
 block numbers in btape "test". Possibly adjust in Bacula.
 - Fix list volumes to output volume retention in some other units, perhaps via a directive.
-- If opening a tape in read/write mode fails attempt to open
- it in read-only mode, and mark the tape for read only.
 - Allow Simultaneous Priorities = yes => run up to Max concurrent jobs even with multiple priorities.
 - If you use restore replace=never, the directory attributes for
@@ -845,11 +938,6 @@ Documentation to do: (any release a little bit at a time)
 - see lzma401.zip in others directory for new compression algorithm/library.
-- Minimal autochanger handling in Bacula and in btape.
-- Look into how tar does not save sockets and the possiblity of
- not saving them in Bacula (Martin Simmons reported this).
-- Fix restore jobs so that multiple jobs can run if they
- are not using the same tape(s).
 - Allow the user to select JobType for manual pruning/purging.
 - bscan does not put first of two volumes back with all info in bscan-test.
@@ -899,8 +987,6 @@ Documentation to do: (any release a little bit at a time)
 are not restored. See bug 213. To fix this requires creating a list of newly restored directories so that those directory permissions *can* be restored.
-- Compaction of Disk space by "migrating" Volumes that have pruned
- Jobs (what criteria? size, #jobs, time).
 - Add prune all command
 - Document fact that purge can destroy a part of a restore by purging one volume while others remain valid -- perhaps mark Jobs.
@@ -921,9 +1007,6 @@ Documentation to do: (any release a little bit at a time)
 - Add tree pane to left of window.
 - Add progress meter.
 - Max wait time or max run time causes seg fault -- see runtime-bug.txt
-- Document writing to a CD/DVD with Bacula.
-- Add a "base" package to the window installer for pthreadsVCE.dll
- which is needed by all packages.
 - Add message to user to check for fixed block size when the forward space test fails in btape.
 - When unmarking a directory check if all files below are unmarked and
@@ -932,7 +1015,6 @@ Documentation to do: (any release a little bit at a time)
 - Setup lrrd graphs: (http://www.linpro.no/projects/lrrd/) Mike Acar.
 - Revisit the question of multiple Volumes (disk) on a single device.
 - Add a block copy option to bcopy.
-- Investigate adding Mac Resource Forks.
 - Finish work on Gnome restore GUI.
 - Fix "llist jobid=xx" where no fileset or client exists.
 - For each job type (Admin, Restore, ...) require only the really necessary
@@ -1082,11 +1164,6 @@ Documentation to do: (any release a little bit at a time)
 to start a job or pass its DHCP obtained IP number.
 - Implement a query tape prompt/replace feature for a console
 - Copy console @ code to gnome2-console
-- Make AES the only encryption algorithm see
- http://csrc.nist.gov/CryptoToolkit/aes/). It's
- an officially adopted standard, has survived peer
- review, and provides keys up to 256 bits.
-- Take a careful look at SetACL http://setacl.sourceforge.net
 - Make tree walk routines like cd, ls, ... more user friendly by handling spaces better.
 - Make sure that Bacula rechecks the tape after the 20 min wait.
@@ -1103,7 +1180,6 @@ Documentation to do: (any release a little bit at a time)
 in the "short" pool to the "long" pool if this pool runs out of volume space?
 - What to do about "list files job=xxx".
-- Get and test MySQL 4.0
 - Look at how fuser works and /proc/PID/fd that is how Nic found the file descriptor leak in Bacula.
 - Implement WrapCounters in Counters.
@@ -1126,14 +1202,8 @@ Documentation to do: (any release a little bit at a time)
 run the job but don't save the files.
 - Make things like list where a file is saved case independent for Windows.
-- Implement migrate
 - Use autochanger to handle multiple devices.
-- On Windows with very long path names, it may be impossible to create
- a file (and thus restore it) because the total length is too long.
- We must cd into the directory then create the file without the
- full path name.
 - Implement a Recycle command
-- Test a second language e.g. french.
 - Start working on Base jobs.
 - Implement UnsavedFiles DB record.
 - From Phil Stracchino:
@@ -1163,8 +1233,6 @@ Documentation to do: (any release a little bit at a time)
 - If SD cannot open a drive, make it periodically retry.
 - Add more of the config info to the tape label.
-- If tape is marked read-only, then try opening it read-only rather than
- failing, and remember that it cannot be written.
 - Refine SD waiting output:
 Device is being positioned
 > Device is being positioned for append
@@ -1203,7 +1271,6 @@ Documentation to do: (any release a little bit at a time)
 - Compare tape to Client files (attributes, or attributes and data)
 - Make all database Ids 64 bit.
 - Allow console commands to detach or run in background.
-- Fix status delay on storage daemon during rewind.
 - Add SD message variables to control operator wait time
 - Maximum Operator Wait
 - Minimum Message Interval
@@ -1428,16 +1495,6 @@ Longer term to do:
 Migration: Move a backup from one Volume to another
 Clone: Copy a backup -- two Volumes
-Bacula Migration is based on Jobs (apparently Networker is file by file).
-
-Migration triggered by:
- Number of Jobs
- Number of Volumes
- Age of Jobs
- Highwater mark (keep total size)
- Lowwater mark
-
-
 ======================================================
 Base Jobs design
@@ -1637,95 +1694,25 @@ Block Position: 0
 === Done
-- Make sure that all do_prompt() calls in Dir check for
- -1 (error) and -2 (cancel) returns.
-- Fix foreach_jcr() to have free_jcr() inside next().
- jcr=jcr_walk_start();
- for ( ; jcr; (jcr=jcr_walk_next(jcr)) )
- ...
- jcr_walk_end(jcr);
-- A Volume taken from Scratch should take on the retention period
- of the new pool.
-- Correct doc for Maximum Changer Wait (and others) accepting only
- integers.
-- Implement status that shows why a job is being held in reserve, or
- rather why none of the drives are suitable.
-- Implement a way to disable a drive (so you can use the second
- drive of an autochanger, and the first one will not be used or
- even defined).
-- Make sure Maximum Volumes is respected in Pools when adding
- Volumes (e.g. when pulling a Scratch volume).
-- Keep same dcr when switching device ...
-- Implement code that makes the Dir aware that a drive is an
- autochanger (so the user doesn't need to use the Autochanger = yes
- directive).
-- Make catalog respect ACL.
-- Add recycle count to Media record.
-- Add initial write date to Media record.
-- Fix store_yesno to be store_bitmask.
---- create_file.c.orig Fri Jul 8 12:13:05 2005
-+++ create_file.c Fri Jul 8 12:13:07 2005
-@@ -195,6 +195,8 @@
- attr->ofname, be.strerror());
- return CF_ERROR;
- }
-+ } else if(S_ISSOCK(attr->statp.st_mode)) {
-+ Dmsg1(200, "Skipping socket: %s\n", attr->ofname);
- } else {
- Dmsg1(200, "Restore node: %s\n", attr->ofname);
- if (mknod(attr->ofname, attr->statp.st_mode, attr->statp.st_rdev) != 0 && errno != EEXIST) {
-- Add true/false to conf same as yes/no
-- Reserve blocks other restore jobs when first cannot connect to SD.
-- Fix Maximum Changer Wait, Maximum Open Wait, Maximum Rewind Wait to
- accept time qualifiers.
-- Does ClientRunAfterJob fail the job on a bad return code?
-- Make hardlink code at line 240 of find_one.c use binary search.
-- Add ACL error messages in src/filed/acl.c.
-- Make authentication failures single threaded.
-- Make Dir and SD authentication errors single threaded.
-- Fix catreq.c digestbuf at line 411 in src/dird/catreq.c
-- Make base64.c (bin_to_base64) take a buffer length
- argument to avoid overruns.
- and verify that other buffers cannot overrun.
-- Implement VolumeState as discussed with Arno.
-- Add LocationId to update volume
-- Add LocationLog
- LogId
- Date
- User text
- MediaId
- LocationId
- NewState???
-- Add Comment to Media record
-- Fix auth compatibility with 1.38
-- Update dbcheck to include Log table
-- Update llist to include new fields.
-- Make unmount unload autochanger. Make mount load slot.
-- Fix bscan to report the JobType when restoring a job.
-- Fix wx-console scanning problem with commas in names.
-- Add manpages to the list of directories for make install. Notify
- Scott
-- Add bconsole option to use stdin/out instead of conio.
-- Fix ClientRunBefore/AfterJob compatibility.
-- Ensure that connection to daemon failure always indicates what
- daemon it was trying to connect to.
-- Freespace on DVD requested over and over even with no intervening
- writes.
-- .update volume [enabled|disabled|*see below]
- > However, I could easily imagine an option to "update slots" that says > "enable=yes|no" that would automatically enable or disable all the Volumes > found in the autochanger. This will permit the user to optionally mark all > the Volumes in the magazine disabled prior to taking them offsite, and mark > them all enabled when bringing them back on site. Coupled with the options > to the slots keyword, you can apply the enable/disable to any or all volumes.
-- Restricted consoles start in the Default catalog even if it
- is not permitted.
-- When reading through parts on the DVD, the DVD is mounted and
- unmounted for each part.
-- Make sure that the restore options don't permit "seeing" other
- Client's job data.
-- Restore of a raw drive should not try to check the volume size.
-- Lock tape drive door when open()
-- Make release unload any autochanger.
-- Arno's reservation deadlock.
-- Eric's SD patch
+- Why the heck doesn't bacula drop root priviledges before connecting to
+ the DB?
+- Look at using posix_fadvise(2) for backups -- see bug #751.
+ Possibly add the code at findlib/bfile.c:795
+/* TCP socket options */
+#define TCP_NODELAY 1 /* Turn off Nagle's algorithm. */
+#define TCP_MAXSEG 2 /* Limit MSS */
+#define TCP_CORK 3 /* Never send partially complete segments */
+#define TCP_KEEPIDLE 4 /* Start keeplives after this period */
+#define TCP_KEEPINTVL 5 /* Interval between keepalives */
+#define TCP_KEEPCNT 6 /* Number of keepalives before death */
+#define TCP_SYNCNT 7 /* Number of SYN retransmits */
+#define TCP_LINGER2 8 /* Life time of orphaned FIN-WAIT-2 state */
+#define TCP_DEFER_ACCEPT 9 /* Wake up listener only when data arrive */
+#define TCP_WINDOW_CLAMP 10 /* Bound advertised window */
+#define TCP_INFO 11 /* Information about this connection. */
+#define TCP_QUICKACK 12 /* Block/reenable quick acks */
+#define TCP_CONGESTION 13 /* Congestion control algorithm */
+- Fix bnet_connect() code to set a timer and to use time to
+ measure the time.
+- Implement 4th argument to make_catalog_backup that passes hostname.
+- Test FIFO backup/restore -- make regression