Kern's ToDo List
- 26 August 2002
-
-Irix conversion notes:
-- no uuencode
-- no hostname
-To do:
-- Document passwords.
-- Document running multiple Jobs
-- Document that two Verifys at same time on same client do not work.
-- Document how to recycle a tape in 7 days even if the backup takes a long time.
-- Document default config file locations.
-- Document better includes (does it cross file systems ?).
-- Document specifically how to add new File daemon to config files.
-- Document forcing a new tape to be used.
-- Document "Error in message.c:500 Mail program terminated in error.
-
-From Chuck:
---bindir is wrong and does not reflect prefix= in the *_sqlite_* scripts
- (src/cats)
---top level configure options are not passed to the depkgs, particularly
- prefix=
---Also, it might be better to split the depkgs location from the --with-sqlite
- location.
---should be able to specify e.g. --with-sqlite=/opt/local and have it find
- lib, bin, sbin for itself
- I tried this and it didn't find sqlite.h
---sd.conf password does not match dir.conf storage password
-=======
-
-- Create all pools when Director starts
-- Implement autochanger for restore. ARRRGGG! I forgot!
-- Make BSR accept count (total files to be restored).
-- Make BSR return next_block when it knows record is not
- in block, done when count is reached, and possibly other
- optimizations. I.e. add a state word.
+ 25 January 2003
+
+Documentation to do: (a little bit at a time)
+- Document running a test version.
+- Document query file format.
+- Document static linking
+- Document how to automatically backup all local partitions
+- Document problems with Verify and pruning.
+- Document how to use multiple databases.
+
+
+Testing to do: (painful)
+- that console command line options work
+- blocksize recognition code.
+
+For 1.30 release:
+- Add Signature type to File DB record.
+- CD into subdirectory when open()ing files for backup to
+ speed up things. Test with testfind().
+- Add a prefixlinks option: whether or not the where prefix applies to absolute links in the FD.
+- Look at handling <> in smtp -- it doesn't work with exim.
+- Priority job to go to top of list.
+- Implement Bar code handling
+- Why is catreq.c:111 Find vol called twice for a job?
+- Find out why Full saves run slower and slower (hashing?)
+- Figure out how to allow multiple simultaneous file Volumes on a single device.
+- Why are save/restore of device different sizes (sparse?) Yup! Fix it.
+- Implement some way for the Console to dynamically create a job.
+- Restore to a particular time -- e.g. before date, after date.
+- Implement disk spooling
+- Implement finer multiprocessing options.
+- Solaris -I on tar for include list
+- Enable avoid backing up archive device (findlib/find_one.c:128)
+- Implement FileOptions (see end of this document)
+- Implement Bacula plugins -- design API
+- Make bcopy read through bad tape records.
+- Need a verbose mode in restore, perhaps to bsr.
+- bscan without -v is too quiet -- perhaps show jobs.
+- Add code to reject whole blocks if not wanted on restore.
+- Implement multiple simultaneous file Volumes on a single device.
+- Start working on Base jobs.
+- Make sure the MaxVolFiles is fully implemented in SD
+- Flush all the daemon messages at the end of every job.
+- Check if both CatalogFiles and UseCatalog are set to SD.
+- Check if we can increase Bacula FD priority in Win2000
+- Need return status on read_cb() from read_records(). Need multiple
+ records -- one per Job, maybe a JCR or some other structure with
+ a block and a record.
+- Figure out how to do a bare metal Windows restore
+- Fix read_record to handle multiple sessions.
+- Program files (i.e. execute a program to read/write files).
+ Pass read date of last backup, size of file last time.
+- Put system type returned by FD into catalog.
+- Possibly add email to Watchdog if drive is unmounted too
+ long and a job is waiting on the drive.
+- Strip trailing slashes from Include directory names in the FD.
+- Use read_record.c in SD code.
+- Why don't we get an error message from Win32 FD when bootstrap
+ file cannot be created for restore command?
+- Need to specify MaximumConcurrentJobs in the Job resource.
+- When Marking a file in Restore that is a hard link, also
+ mark the link so that the data will be reloaded.
+- A restore that errors in the SD due to no tape incorrectly
+ reports OK in its output.
- After unmount, if restore job started, ask to mount.
-- Fix db_get_fileset in cats/sql_get.c for multiple records.
-- Fix start/end blocks for File
-- Add new code to scheduler.c and run_conf.c
-- Volume Bytes shows bytes on last volume written in Job summary.
-- Fix catalog filename truncation in sql_get and sql_create. Use
- only a single filename split routine.
-- Add command to reset VolFiles to a larger value (don't allow
- a smaller number or print big warning).
-- Make SD disallow writing on Volume with fewer files than in
- the catalog.
- Make Restore report an error if FD or SD term codes are not OK.
- Convert all %x substitution variables, which are hard to remember
and read to %(variable-name). Idea from TMDA.
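The proposed %(variable-name) substitution could work roughly along these lines (a minimal sketch only; the variable names shown are hypothetical, not Bacula's actual substitution set):

```python
import re

# Sketch of the proposed %(variable-name) substitution that would
# replace the hard-to-remember single-letter %x forms. The variable
# names used here are invented for illustration.
def substitute(template, variables):
    """Replace each %(name) in template with variables[name]."""
    def repl(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError("unknown substitution variable: %s" % name)
        return str(variables[name])
    return re.sub(r"%\((\w+)\)", repl, template)

label = substitute("%(jobname)-%(level)",
                   {"jobname": "NightlySave", "level": "Full"})
# -> "NightlySave-Full"
```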
-- Report compression % and other compression statistics if turned on.
- Add JobLevel in FD status (but make sure it is defined).
- Make Pool resource handle Counter resources.
- Remove NextId for SQLite. Optimize.
-- Fix gethostbyname() to use gethostbyname_r()
-- Implement ./configure --with-client-only
- Strip trailing / from Include
- Move all SQL statements into a single location.
- Cleanup db_update_media and db_update_pool
- Add UA rc and history files.
- put termcap (used by console) in ./configure and
allow -with-termcap-dir.
-- Remove JobMediaId it is not used.
- Enhance time and size scanning routines.
- Fix Autoprune for Volumes to respect need for full save.
-- DateWritten field on tape may be wrong.
- Fix Win32 config file definition name on /install
- No READLINE_SRC if found in alternate directory.
- Add Client FS/OS id (Linux, Win95/98, ...).
-- Put Windows files in Windows stream?
-- Ensure that everyone uses btime routines.
+- Test a second language, e.g. French.
+- Compare tape to Client files (attributes, or attributes and data)
+- Make all database Ids 64 bit.
+- Write an applet for Linux.
+- Add estimate to Console commands
+- Find solution to blank filename (i.e. path only) problem.
+- Implement new daemon communications protocol.
+- Remove PoolId from Job table, it exists in Media.
+- Allow console commands to detach or run in background.
+- Fix status delay on storage daemon during rewind.
+- Add SD message variables to control operator wait time
+ - Maximum Operator Wait
+ - Minimum Message Interval
+ - Maximum Message Interval
+- Send Operator message when cannot read tape label.
+- Verify level=Volume (scan only), level=Data (compare of data to file).
+ Verify level=Catalog, level=InitCatalog
+- Events file
+- Add keyword search to show command in Console.
+- Fix Win2000 error with no messages during startup.
+- Events : tape has more than xxx bytes.
+- Restrict characters permitted in a Resource name.
+- Complete code in Bacula Resources -- this will permit
+ reading a new config file at any time.
+- Handle ctl-c in Console
+- Implement LabelTemplate (at least first cut).
+- Implement script driven addition of File daemon to config files.
+- Think about how to make Bacula work better with File (non-tape) archives.
+- Write Unix emulator for Windows.
+
+- Implement new serialize subroutines
+ send(socket, "string", &Vol, "uint32", &i, NULL)
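The tagged-argument serialize call sketched above could be modeled like this (transliterated to Python for illustration; the wire layout of NUL-terminated strings and big-endian uint32s is an assumption, not Bacula's actual serialization format):

```python
import struct

# Toy model of send(socket, "string", &Vol, "uint32", &i, NULL):
# alternating type tags and values are packed into one byte stream.
def serialize(*pairs):
    """Pack alternating (type, value) arguments into a byte stream."""
    buf = b""
    for kind, value in zip(pairs[0::2], pairs[1::2]):
        if kind == "string":
            buf += value.encode() + b"\0"   # assumed NUL-terminated strings
        elif kind == "uint32":
            buf += struct.pack(">I", value)  # assumed big-endian uint32
        else:
            raise ValueError("unknown type: %s" % kind)
    return buf

msg = serialize("string", "Vol001", "uint32", 42)
```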
+- Audit all UA commands to ensure that we always prompt where possible.
+- If ./btape is called without /dev, assume argument is a Storage resource name.
+- Put memory utilization in Status output of each daemon
+ if full status requested or if some level of debug on.
+- Make database type selectable by .conf files i.e. at runtime
+- gethostbyname failure in bnet_connect() continues
+ generating errors -- should stop.
+- Set flag for uname -a. Add to Volume label.
+- Implement throttled work queue.
+- Check for EOT at ENOSPC or EIO or ENXIO (Unix PC)
+- Allow multiple Storage specifications (or multiple names on
+ a single Storage specification) in the Job record. Thus a job
+ can be backed up to a number of storage devices.
+- Implement dump label to UA
+- Concept of VolumeSet during restore which is a list
+ of Volume names needed.
+- Restore files modified after date
+- Restore file modified before date
+- Emergency restore info:
+ - Backup Bacula
+ - Backup working directory
+ - Backup Catalog
+- Restore -- do nothing but show what would happen
+- SET LD_RUN_PATH=$HOME/mysql/lib/mysql
+- Implement Restore FileSet=
+- Create a protocol.h and protocol.c where all protocol messages
+ are concentrated.
+- If SD cannot open a drive, make it periodically retry.
+- Remove duplicate fields from jcr (e.g. jcr.level and jcr.jr.Level, ...).
+- Timeout a job or terminate if link goes down, or reopen link and query.
+- Find general solution for sscanf size problems (as well
+ as sprintf). Do at run time?
+- Concept of precious tapes (cannot be reused).
+- Make bcopy copy with a single tape drive.
+- Permit changing ownership during restore.
+
+- Autolabel should be specified by DIR instead of SD.
+- Find out how to get the system tape block limits, e.g.:
+ Apr 22 21:22:10 polymatou kernel: st1: Block limits 1 - 245760 bytes.
+ Apr 22 21:22:10 polymatou kernel: st0: Block limits 2 - 16777214 bytes.
+- Storage daemon
+ - Add media capacity
+ - AutoScan (check checksum of tape)
+ - Format command = "format /dev/nst0"
+ - MaxRewindTime
+ - MinRewindTime
+ - MaxBufferSize
+ - Seek resolution (usually corresponds to buffer size)
+ - EODErrorCode=ENOSPC or code
+ - Partial Read error code
+ - Partial write error code
+ - Nonformatted read error
+ - Nonformatted write error
+ - WriteProtected error
+ - IOTimeout
+ - OpenRetries
+ - OpenTimeout
+ - IgnoreCloseErrors=yes
+ - Tape=yes
+ - NoRewind=yes
+- Pool
+ - Maxwrites
+ - Recycle period
+- Job
+ - MaxWarnings
+ - MaxErrors (job?)
+=====
+- FD sends unsaved file list to Director at end of job (see
+ RFC below).
+- File daemon should build list of files skipped, and then
+ at end of save retry and report any errors.
+- Write a Storage daemon that uses pipes and
+ standard Unix programs to write to the tape.
+ See afbackup.
+- Need something that monitors the JCR queue and
+ times out jobs by asking the daemons where they are.
+- Enhance Jmsg code to permit buffering and saving to disk.
+- device driver = "xxxx" for drives.
+- restart modes:
+   paranoid: read label, fsf to EOM, read append block, and go
+   super-paranoid: read label, read all files
+     in between, read append block, and go
+   verify: backspace, read append block, and go
+   permissive: same as verify but frees the drive
+     if the tape is not valid.
+- Verify from Volume
+- Ensure that /dev/null works
+- Need report class for messages. Perhaps
+ report resource where report=group of messages
+- enhance scan_attrib and rename scan_jobtype, and
+ fill in code for "since" option
+- Director needs a time after which the report status is sent
+ anyway -- or better yet, a retry time for the job.
+ Don't reschedule a job if previous incarnation is still running.
+- Some way to automatically backup everything is needed????
+- Need a structure for pending actions:
+ - buffered messages
+ - termination status (part of buffered msgs?)
+- Concept of grouping Storage devices so that a job can use
+ any of a number of devices
+- Drive management
+ Read, Write, Clean, Delete
+- Login to Bacula; Bacula users with different permissions:
+ owner, group, user, quotas
+- Store info on each file system type (probably in the job header on tape).
+ This could be the output of df, or perhaps some sort of /etc/mtab record.
+
+Longer term to do:
+- Design a hierarchical storage scheme for Bacula.
+- Implement FSM (File System Modules).
+- Identify unchanged or "system" files and save them to a
+ special tape thus removing them from the standard
+ backup FileSet -- BASE backup.
+- Turn virtually all sprintfs into snprintfs.
+- Heartbeat between daemons.
+- Audit M_ error codes to ensure they are correct and consistent.
+- Add variable break characters to lex analyzer.
+ Either a bit mask or a string of chars so that
+ the caller can change the break characters.
+- Make a single T_BREAK to replace T_COMMA, etc.
+- Ensure that File daemon and Storage daemon can
+ continue a save if the Director goes down (this
+ is NOT currently the case). Must detect socket error,
+ buffer messages for later.
+- Enhance time/duration input to allow multiple qualifiers e.g. 3d2h
+- Add ability to backup to two Storage devices (two SD sessions) at
+ the same time -- e.g. onsite, offsite.
+- Add the ability to consolidate old backup sets (basically do a restore
+ to tape and appropriately update the catalog). Compress Volume sets.
+ Might need to spool via file if only one drive is available.
+- Compress or consolidate Volumes of old possibly deleted files. Perhaps
+ someway to do so with every volume that has less than x% valid
+ files.
Projects:
- Bacula Projects Roadmap
+ Bacula Projects Roadmap
17 August 2002
+ last update 5 January 2003
Item 1: Multiple simultaneous Jobs. (done)
+Done -- Restore part needs better implementation to work correctly
What: Permit multiple simultaneous jobs in Bacula.
Item 2: Make the Storage daemon use intermediate file storage to buffer data.
- (not necessary)
+Deferred -- not necessary yet.
What: If data is coming into the SD too fast, buffer it to
disk if the user has configured this option.
testing after item 1 is implemented.
-Item 3: Write the bscan program.
+Item 3: Write the bscan program -- also write a bcopy program.
+Done
What: Write a program that reads a Bacula tape and puts all the
appropriate data into the catalog. This allows recovery
Item 6: Write a regression script.
+Started
What: This is an automatic script that runs and tests as many features
of Bacula as possible. The output is compared to previous
Item 10: Define definitive tape format.
+Done (version 1.27)
What: Define that definitive tape format that will not change
for the next millennium.
-I haven't put these in any particular order.
-
-Small projects:
-- Rework Storage daemon with new rwl_lock routines.
-- Compare tape to Client files (attributes, or attributes and data)
-- Restore options (overwrite, overwrite if older,
- overwrite if newer, never overwrite, ...)
-- Restore to a particular time -- e.g. before date, after date.
-- On command write out a bootstrap file (at end of job).
-- Make all database Ids 64 bit.
-- Pass JCR to database routines permitting better error printing.
-- Make bls accept bootstrap record.
-- Write an applet for Linux.
-- Make SD reject writing on tape where Catalog and tape # files
- don't agree (possibly OK if tape > catalog).
-- Implement new daemon communications protocol.
-- Add DIR config directive to spool attributes.
-- Pass DIR config variable to SD for no attributes.
-- Create JobMedia record for all running Jobs when Media changes.
-- Send Volumes needed during restore to Console (just after
- create_volume_list) -- also in restore command?
-- Add estimate to Console commands
-- Find solution to blank filename (i.e. path only) problem.
-Dump:
- mysqldump -f --opt bacula >bacula
+======================================================
+ Base Jobs design
+It is somewhat like an incremental: a Full save becomes the changes
+since the Base job (or jobs), plus any non-base files.
+Need:
+- New BaseFile table that contains:
+ JobId, BaseJobId, FileId (from Base).
+ i.e. for each base file that exists but is not saved because
+ it has not changed, the File daemon sends the JobId, BaseId,
+ and FileId back to the Director who creates the DB entry.
+- To initiate a Base save, the Director sends the FD
+ the FileId, and full filename for each file in the Base.
+- When the FD finds a Base file, he requests the Director to
+ send him the full File entry (stat packet plus MD5), or
+ conversely, the FD sends it to the Director and the Director
+ says yes or no. This can be quite rapid if the FileId is kept
+ by the FD for each Base Filename.
+- It is probably better to have the comparison done by the FD
+ despite the fact that the File entry must be sent across the
+ network.
+- An alternative would be to send the FD the whole File entry
+ from the start. The disadvantage is that it requires a lot of
+ space. The advantage is that it requires less communications
+ during the save.
+- The Job record must be updated to indicate that one or more
+ Bases were used.
+- At end of Job, FD returns:
+ 1. Count of base files/bytes not written to tape (i.e. matches)
+ 2. Count of base files that were saved, i.e. had changed.
+- No tape record would be written for a Base file that matches, in the
+ same way that no tape record is written for Incremental jobs where
+ the file is not saved because it is unchanged.
+- On a restore, all the Base file records must explicitly be
+ found from the BaseFile table. I.e. for each Full save that is marked
+ to have one or more Base Jobs, search the BaseFile for all occurrences
+ of JobId.
+- An optimization might be to make the BaseFile have:
+ JobId
+ BaseId
+ FileId
+ plus
+ FileIndex
+ This would avoid the need to explicitly fetch each File record for
+ the Base job. The Base Job record will be fetched to get the
+ VolSessionId and VolSessionTime.
+=========================================================
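The BaseFile design above, including the optional FileIndex column, can be sketched as a toy schema (a minimal SQLite illustration; the column types and the sample JobId/FileId values are invented, only the column names come from the note):

```python
import sqlite3

# Illustrative sketch of the proposed BaseFile table with the
# FileIndex optimization described above.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE BaseFile (
    JobId     INTEGER,  -- the Full job that references the base file
    BaseJobId INTEGER,  -- the Base job that actually saved the file
    FileId    INTEGER,  -- File record of the unchanged file
    FileIndex INTEGER   -- avoids fetching each File record on restore
)""")
# FD reports an unchanged base file; Director creates the DB entry:
conn.execute("INSERT INTO BaseFile VALUES (?, ?, ?, ?)",
             (101, 77, 5001, 12))
# On restore, find all base files referenced by Full job 101:
rows = conn.execute(
    "SELECT BaseJobId, FileId, FileIndex FROM BaseFile WHERE JobId = ?",
    (101,)).fetchall()
```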
+
+==========================================================
+ Unsaved File design
+For each Incremental job that is run, there may be files that
+were found but not saved because they were locked (this applies
+only to Windows). Such a system could send back to the Director
+a list of Unsaved files.
+Need:
+- New UnSavedFiles table that contains:
+ JobId
+ PathId
+ FilenameId
+- Then in the next Incremental job, the list of Unsaved Files will be
+ fed to the FD, which will ensure that they are explicitly chosen even
+ if the standard date/time check would not have selected them.
+=============================================================
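The UnSavedFiles idea can be sketched similarly (the schema follows the note; the force-selection logic and sample ids are invented for illustration):

```python
import sqlite3

# Sketch of the proposed UnSavedFiles table and how the next
# Incremental could force-select files a prior job failed to save.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UnSavedFiles (JobId INTEGER, PathId INTEGER, FilenameId INTEGER)")
# A Windows file was locked during job 55 and could not be saved:
conn.execute("INSERT INTO UnSavedFiles VALUES (55, 7, 9001)")

def must_save(path_id, filename_id, changed_since_last):
    """Select a file if it changed, or if a prior job failed to save it."""
    unsaved = conn.execute(
        "SELECT 1 FROM UnSavedFiles WHERE PathId=? AND FilenameId=?",
        (path_id, filename_id)).fetchone()
    return changed_since_last or unsaved is not None

picked = must_save(7, 9001, changed_since_last=False)  # chosen despite no change
```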
+
-To be done:
-- Remove PoolId from Job table, it exists in Media.
-- Allow console commands to detach or run in background.
-- Fix status delay on storage daemon during rewind.
-- Add VerNo to each Session label record.
-- Add Job to Session records.
-- Add VOLUME_CAT_INFO to the EOS tape record (as
- well as to the EOD record).
-- Add SD message variables to control operator wait time
- - Maximum Operator Wait
- - Minimum Message Interval
- - Maximum Message Interval
-- Add EOM handling variables
- - Write EOD records
- - Require EOD records
-- Send Operator message when cannot read tape label.
-- Think about how to handle I/O error on MTEOM.
-- If Storage daemon aborts a job, ensure that this
- is printed in the error message.
-- Verify level=Volume (scan only), level=Data (compare of data to file).
- Verify level=Catalog, level=InitCatalog
-- Dump of Catalog
-- Cold start full restore (restore catalog then
- user selects what to restore). Write summary file containing only
- Job, Media, and Catalog information. Store on another machine.
-- Dump/Restore database
-- File system type
-- Events file
-- Add keyword search to show command in Console.
-- Fix Win2000 error with no messages during startup.
-- Events : tape has more than xxx bytes.
-- In Storage daemon, status should include job cancelled.
-- Write general list maintenance subroutines.
-- Implement immortal format with EDOs.
-- Restrict characters permitted in a Resource name.
-- Provide definitive identification of type in backup.
-- Complete code in Bacula Resources -- this will permit
- reading a new config file at any time.
-- Document new Console
-- Handle ctl-c in Console
-- Test restore of Windows backup
-- Implement LabelTemplate (at least first cut).
-- Implement script driven addition of File daemon to
- config files.
+=============================================================
+
+ Request For Comments For File Backup Options
+ 10 November 2002
+
+Subject: File Backup Options
+
+Problem:
+ A few days ago, a Bacula user who is backing up to file volumes and
+ using compression asked if it was possible to suppress compressing
+ all .gz files since it was a waste of CPU time. Although Bacula
+ currently permits using different options (compression, ...) on
+ a directory by directory basis, it cannot do it on a file by
+ file basis, which is clearly what was desired.
+
+Proposed Implementation:
+ To solve this problem, I propose the following:
+
+ - Add a new Director resource type called FileOptions.
+
+ - The FileOptions resource will have records for all
+ options that can currently be specified on the Include record
+ (in a FileSet). Examples below.
+
+ - The FileOptions resource will permit an exclude option as well
+ as a number of additional options.
+
+ - The heart of the FileOptions resource is the ability to
+ supply any number of ApplyTo records which specify POSIX
+ regular expressions. These ApplyTo regular expressions are
+ applied to the fully qualified filename (path and all). If
+ one matches, then the FileOptions will be used.
+
+ - When an ApplyTo specification matches an included file, the
+ options specified in the FileOptions resource will override
+ the default options specified on the Include record.
+
+ - Include records will be modified to permit referencing one or
+ more FileOptions resources. The FileOptions will be used
+ in the order listed on the Include record and the first
+ one that matches will be applied.
+
+ - Options (or specifications) currently supplied on the Include
+ record will be deprecated (i.e. removed in a later version a
+ year or so from now).
+
+ - The Exclude record will be deprecated as the same functionality
+ can be obtained by using an Exclude = yes in the FileOptions.
+
+FileOptions records:
+ The following records can appear in the FileOptions resource. An
+ asterisk preceding the name indicates a feature not currently
+ implemented.
+
+ For Backup Jobs:
+ - Compression= (GZIP, ...)
+ - Signature= (MD5, SHA1, ...)
+ - *Encryption=
+ - OneFs= (yes/no) - remain on one filesystem
+ - Recurse= (yes/no) - recurse into subdirectories
+ - Sparse= (yes/no) - do sparse file backup
+ - *Exclude= (yes/no) - exclude file from being saved
+ - *Reader= (filename) - external read (backup) program
+ - *Plugin= (filename) - read/write plugin module
+
+ For Verify Jobs:
+ - verify= (ipnougsamc5) - verify options
+
+ For Restore Jobs:
+ - replace= (always/ifnewer/ifolder/never) - replace options currently
+ implemented in 1.27
+ - *Writer= (filename) - external write (restore) program
+
+
+Implementation:
+ Currently options specifying compression, MD5 signatures, recursion,
+ ... of a FileSet are supplied on the Include record. These will now
+ all be collected into a FileOptions resource, which will be
+ specified on the Include in place of the options. Multiple FileOptions
+ may be specified. Since the FileOptions contain regular expressions
+ that are applied to the full filename, this will give the ability
+ to specify backup options on a file by file basis to whatever level
+ of detail you wish.
+
+Example:
+
+ Today:
+
+ FileSet {
+ Name = "FullSet"
+ Include = compression=GZIP signature=MD5 {
+ /
+ }
+ }
+
+ Proposal:
+
+ FileSet {
+ Name = "FullSet"
+ Include = FileOptions=Opts {
+ /
+ }
+ }
+ FileOptions {
+ Name = Opts
+ Compression = GZIP
+ Signature = MD5
+ ApplyTo = /*.?*/
+ }
+
+ That's a lot more to do the same thing, but it gives the ability to
+ apply options on a file by file basis. For example, suppose you
+ want to compress all files but not any file with extensions .gz or .Z.
+ You could do so as follows:
+
+ FileSet {
+ Name = "FullSet"
+ Include = FileOptions=NoCompress FileOptions=Opts {
+ /
+ }
+ }
+ FileOptions {
+ Name = Opts
+ Compression = GZIP
+ Signature = MD5
+ ApplyTo = /*.?*/ # matches all files
+ }
+ FileOptions {
+ Name = NoCompress
+ Signature = MD5
+ # Note multiple ApplyTos are ORed
+ ApplyTo = /*.gz/ # matches .gz files */
+ ApplyTo = /*.Z/ # matches .Z files */
+ }
+
+ Now, since the NoCompress FileOptions is specified first on the
+ Include line, any *.gz or *.Z file will have an MD5 signature computed,
+ but will not be compressed. For all other files, the NoCompress will not
+ match, so the Opts options will be used which will include GZIP
+ compression.
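The first-match rule just described can be modeled as follows (a toy sketch only; the ApplyTo patterns are written as Python regular expressions rather than the /.../ notation above, and the option dictionaries are invented stand-ins for FileOptions resources):

```python
import re

# FileOptions are tried in the order listed on the Include record;
# the first resource whose ApplyTo pattern matches the full filename
# wins. Multiple ApplyTos within one resource are ORed.
no_compress = {"name": "NoCompress", "signature": "MD5", "compression": None,
               "apply_to": [r".*\.gz$", r".*\.Z$"]}
opts = {"name": "Opts", "signature": "MD5", "compression": "GZIP",
        "apply_to": [r".*"]}  # matches all files

def options_for(filename, file_options):
    for fo in file_options:                 # order from the Include record
        if any(re.match(p, filename) for p in fo["apply_to"]):
            return fo                       # first match wins
    return None

chosen = options_for("/data/big.gz", [no_compress, opts])
# NoCompress matches first: MD5 signature, no compression.
```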
+
+Questions:
+ - Is it necessary to provide some means of ANDing regular expressions
+ and negation? (not currently planned)
+
+ e.g. ApplyTo = /*.gz/ && !/big.gz/
+
+ - I see that Networker has a "null" module which, if specified, does not
+ backup the file, but does make a record of the file in the catalog
+ so that the catalog will reflect an exact picture of the filesystem.
+ The result is that the file can be "seen" when "browsing" the save
+ sets, but it cannot be restored.
+
+ Is this really useful? Should it be implemented in Bacula?
+
+Results:
+ After implementing the above, the user will be able to specify
+ on a file by file basis (using regular expressions) what options are
+ applied for the backup.
+====================================
+
+=========================================
+Proposal by Bill Sellers
+
+Date: Tue, 14 Jan 2003 16:30:18 -0500
+To: Kern Sibbald <kern@sibbald.com>
+From: Bill Sellers <w.a.sellers@larc.nasa.gov>
+Subject: Re: [Bacula-users] Bacula remote storage?
+At 06:29 PM 1/14/2003 +0100, you wrote:
+>Hello Bill,
+>
+>Well, if you cannot put a Bacula client on the machine,
+>then it is a big problem. If you know of some software
+>that can do what you want, let me know, because I
+>really just don't know how to do it -- at least not
+>directly.
+
+
+Hi Kern,
+
+We have been able to get Amanda to use the HSM as a storage
+device. Someone here wrote a driver for Amanda. BUT, Amanda doesn't
+handle Windows systems very well (or at all without Samba). So I am
+looking for a backup system that has a Windows client. I really like the
+Windows integration of Bacula.
+
+From the command line, it's rather trivial to move the data around. We use
+something like-
+
+tar cf - ./files | gzip -c | rsh hsm dd of=path/file.tgz
+
+or if you use GNU tar:
+
+tar czf hsm:path/file.tgz ./files
+
+One idea for you to consider; Sendmail offers pipes in the aliases file;
+(mailpipe: "|/usr/bin/vacation root") and Perl supports pipes in the
+"open" statement (open FILE, "|/bin/nroff -man";). Could you make a
+pipe available as a storage device? Then we could use any command that
+handles stdin as a storage destination.
+
+Something like-
+
+Storage {
+ Name = HSM-RSH
+ Address = hsm
+ #Password is not used in rsh, but might be used in ftp.
+ Device = "| gzip -c | rsh hsm dd of=path/file.tgz"
+ MediaType = Pipe
+}
+
+Storage {
+ Name = HSM-FTP
+ Address = hsm
+ Password = "foobar&-"
+ Device = "| ncftpput -c hsm /path/file.bacula"
+ MediaType = Pipe
+}
+
+>If you have some local storage available, you could
+>use Bacula to backup to disk volumes, then use some
+>other software (ftp, scp) to move them to the HSM
+>machine. However, this is a bit kludgy.
+
+
+It is, but maybe worth a try. Is there some function in Bacula to put
+variables in filenames? i.e. backup.2003-01-15.root
+
+Thanks!
+Bill
+
+---
+Bill Sellers
+w.a.sellers@larc.nasa.gov
+
+==============================================
+ The Project for the above
+
+I finally realized that this is not at all
+the same as reader/writer programs or plugins,
+which are alternate ways of accessing the
+files to be backed up. Rather, it is an alternate
+form of storage device, and I have always planned
+that Bacula should be able to handle all sorts
+of storage devices.
+
+So, I propose the following phases:
+
+1. OK from you to invest some time in testing
+ this as I implement it (requires that you
+ know how to download from the SourceForge
+ cvs -- which I imagine is a piece of cake
+ for you).
+
+2. Dumb implementation by allowing a device to
+ be a fifo for write only.
+ Reason: easy to implement, proof of concept.
+
+3. Try reading from fifo but with fixed block
+ sizes.
+ Reason: proof of concept, easy to implement.
+
+4. Extend reading from fifo (restores) to handle
+ variable blocks.
+ Reason: requires some delicate low level coding
+ which could destabilize all of Bacula.
+
+5. Implementation of above but to a program. E.g.
+ Device = "|program" (not full pipeline).
+ Reason: routines already exist, and program can
+ be a shell script which contains anything.
-- Bug: anonymous Volumes requires mount in some cases.
-- see setgroup and user for Bacula p4-5 of stunnel.c
-- Implement new serialize subroutines
- send(socket, "string", &Vol, "uint32", &i, NULL)
-- Add save type to Session label.
-- Correct date on Session label.
-- On I/O error, write EOF, then try to write again.
-- Audit all UA commands to ensure that we always prompt where
- possible.
-- If ./btape is called without /dev, assume argument is
- a Storage resource name.
-- Put memory utilization in Status output of each daemon
- if full status requested or if some level of debug on.
-- Make database type selectable by .conf files i.e. at runtime
-- gethostbyname failure in bnet_connect() continues
- generating errors -- should stop.
-- Don't create a volume that is already written. I.e. create only once.
-- If error at end of tape, implement some way to kill waiting processes.
-- Add HOST to Volume label.
-- Set flag for uname -a. Add to Volume label.
-- Implement throttled work queue.
-- Write bscan program that will syncronize the DB Media record with
- the contents of the Volume -- for use after a crash.
-- Check for EOT at ENOSPC or EIO or ENXIO (unix Pc)
-- Allow multiple Storage specifications (or multiple names on
- a single Storage specification) in the Job record. Thus a job
- can be backed up to a number of storage devices.
-- Implement full MediaLabel code.
-- Implement dump label to UA
-- Copy volume using single drive.
-- Copy volume with multiple driven (same or different block size).
-- Add block size (min, max) to Vol label.
-- Concept of VolumeSet during restore which is a list
- of Volume names needed.
-- Restore files modified after date
-- Restore file modified before date
-- Emergency restore info:
- - Backup Bacula
- - Backup working directory
- - Backup Catalog
-- Restore options (do not overwrite)
-- Restore -- do nothing but show what would happen
-- SET LD_RUN_PATH=$HOME/mysql/lib/mysql
-- Put Job statistics in End Session Label (files saved,
- total bytes, start time, ...).
-- Put FileSet name in the SOS label.
-- Implement Restore FileSet=
-- Write a scanner for the UA (keyword, scan-routine, result, prompt).
-- Create a protocol.h and protocol.c where all protocol messages
- are concentrated.
-- If SD cannot open a drive, make it periodically retry.
-- Put Bacula version somewhere in Job stream, probably Start Session
- Labels.
-- Remove duplicate fields from jcr (e.g. jcr.level and
- jcr.jr.Level, ...).
-- Timeout a job or terminate if link goes down, or reopen link and query.
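One low-cost way to notice a dead link is to enable TCP keepalives on the daemon sockets, so a vanished peer eventually produces a socket error instead of hanging forever. A sketch (the helper name is hypothetical):

```c
#include <sys/socket.h>

/* Sketch: turn on SO_KEEPALIVE so a dead peer is eventually
 * detected by the kernel and reported as a socket error. */
static int enable_keepalive(int sockfd)
{
   int on = 1;
   return setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE,
                     (char *)&on, sizeof(on));
}
```

Default keepalive timers are long (hours), so an application-level retry timer would still be needed for prompt job timeout.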
-- Define how we handle times to avoid problem with Unix dates (2049 ?).
-- Fill all fields in Vol/Job Header -- ensure that everything
- needed is written to tape. Think about restore to Catalog
- from tape. Client record needs improving.
-- Find general solution for sscanf size problems (as well
-  as sprintf). Do at run time?
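The usual compile-time fix is an explicit field width in the format string; a sketch (names and sizes are illustrative only):

```c
#include <stdio.h>

/* Sketch: bound sscanf with a field width so a long token
 * cannot overrun the destination buffer. */
#define NAME_LEN 16

static void scan_name(const char *in, char name[NAME_LEN])
{
   name[0] = '\0';
   sscanf(in, "%15s", name);   /* width must be NAME_LEN - 1 */
}
```

The catch, and the reason for "Do at run time?", is that the width cannot be a variable; a run-time solution would have to build the format string itself (e.g. with snprintf).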
+6. Add full pipeline as a possibility. E.g.
+ Device = "| gzip -c | rsh hsm dd of=path/file.tgz"
+ Reason: needs additional coding to implement full
+ pipeline (must fire off either a shell or all
+ programs and connect their pipes).
-- Concept of precious tapes (cannot be reused).
-- Allow FD to run from inetd ???
-- Preprocessing command per file.
-- Postprocessing command per file (when restoring).
-
-- Restore should get Device and Pool information from
- job record rather than from config.
-- Autolabel should be specified by DR instead of SD.
-- Ability to recreate the catalog from a tape.
-- Find out how to get the system tape block limits, e.g.:
- Apr 22 21:22:10 polymatou kernel: st1: Block limits 1 - 245760 bytes.
- Apr 22 21:22:10 polymatou kernel: st0: Block limits 2 - 16777214 bytes.
-- Storage daemon
- - Add media capacity
- - AutoScan (check checksum of tape)
- - Format command = "format /dev/nst0"
- - MaxRewindTime
- - MinRewindTime
- - MaxBufferSize
- - Seek resolution (usually corresponds to buffer size)
- - EODErrorCode=ENOSPC or code
- - Partial Read error code
- - Partial write error code
- - Nonformatted read error
- - Nonformatted write error
- - WriteProtected error
- - IOTimeout
- - OpenRetries
- - OpenTimeout
- - IgnoreCloseErrors=yes
- - Tape=yes
- - NoRewind=yes
-- Pool
- - Maxwrites
- - Recycle period
-- Job
- - MaxWarnings
- - MaxErrors (job?)
-=====
-- Eliminate duplicate File records to shrink database.
-- FD sends unsaved file list to Director at end of job.
-- Write a Storage daemon that uses pipes and
- standard Unix programs to write to the tape.
- See afbackup.
-- Need something that monitors the JCR queue and
-  times out jobs by asking the daemons where they are.
-- Add daemon JCR JobId=0 to have a daemon context
-- Pool resource
- - Auto label
- - Auto media verify
- - Client (list of clients to force client)
- - Devices (list of devices to force device)
- - enable/disable
- - Groups
- - Levels
- - Type: Backup, ...
- - Recycle from other pools: Yes, No
- - Recycle to other pools: Yes, no
- - FileSets
- - MaxBytes?
- - Optional MediaType to force media?
- - Maintain Catalog
- - Label Template
- - Retention Period
- ============
- - Name
- - NumVols
-   - MaxVols
- - CurrentVol
+There are a good number of details in each step
+that I have left out, but I will specify them at
+every stage, and there may be a few changes as things
+evolve. I expect that to get to stage 5 will take a
+few weeks, and at that point, you will have
+everything you need (just inside a script).
+Stage 6 will probably take longer, but if this
+project pleases you, what we do for 5 should
+be adequate for some time.
-=====
-     if (connect(sockfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
-        close(sockfd);
-        return -6;
-     }
-
-     /* Linger up to 60 seconds on close so queued data gets sent */
-     linger.l_onoff = 1;
-     linger.l_linger = 60;
-     i = setsockopt(sockfd, SOL_SOCKET, SO_LINGER, (char *)&linger,
-            sizeof(linger));
-
-     /* Clear non-blocking flags to restore blocking I/O */
-     fl = fcntl(sockfd, F_GETFL);
-     fcntl(sockfd, F_SETFL, fl & ~O_NONBLOCK & ~O_NDELAY);
-====
-- Enhance Jmsg code to permit buffering and saving to disk.
-- device driver = "xxxx" for drives.
-- restart modes:
-    paranoid: read label, fsf to EOM, read append block, and go
-    super-paranoid: read label, read all files in between,
-      read append block, and go
-    verify: backspace, read append block, and go
-    permissive: same as above but frees drive
-      if tape is not valid.
-- Verify from Volume
-- Ensure that /dev/null works
-- File daemon should build a list of files skipped, and then
-  at end of save retry them and report any errors.
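The skipped-file list could be as simple as a singly linked list pushed during the save and walked at end of job; a sketch (names are hypothetical, and a real version would use Bacula's pool memory):

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: remember skipped files during the save, then walk
 * the list at end of job to retry and report. */
struct skipped {
   struct skipped *next;
   char name[256];
};

static struct skipped *skipped_head = NULL;

static void remember_skipped(const char *fname)
{
   struct skipped *s = malloc(sizeof(struct skipped));
   if (!s) return;
   strncpy(s->name, fname, sizeof(s->name) - 1);
   s->name[sizeof(s->name) - 1] = '\0';
   s->next = skipped_head;   /* push: most recent first */
   skipped_head = s;
}
```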
-- Need report class for messages. Perhaps
- report resource where report=group of messages
-- Verify from Tape
-- enhance scan_attrib and rename scan_jobtype, and
- fill in code for "since" option
-- To buffer messages, we need associated jobid and Director name.
-- Need to save contents of FileSet to tape?
-- Director needs a time after which the report status is sent
- anyway -- or better yet, a retry time for the job.
- Don't reschedule a job if previous incarnation is still running.
-- Figure out how to save the catalog (possibly a special FileSet).
-- Figure out how to restore the catalog.
-- Figure out how to put a Volume into the catalog (from the tape)
-- Figure out how to do a restore from a Volume
-- Some way to automatically backup everything is needed????
-- Need a structure for pending actions:
- - buffered messages
- - termination status (part of buffered msgs?)
-- Concept of grouping Storage devices so that a job can use
-  any of a number of devices
-- Drive management
- Read, Write, Clean, Delete
-- Login to Bacula; Bacula users with different permissions:
- owner, group, user
-- Tape recycle destination
-- Job Schedule Status
- - Automatic
- - Manual
- - Running
-- File daemon should pass Director the operating system info
- to be stored in the Client Record (or verified that it has
- not changed).
-- Store info on each file system type (probably in the job header on tape).
-  This could be the output of df; or perhaps some sort of /etc/mtab record.
-Longer term to do:
-- Limit how much a Volume may be used:
-    use media 1 time, so that we can do 6 days of incremental
-      backups before switching to another tape (already)
-    specify # times (jobs)
-    specify bytes (already)
-    specify time (seconds, hours, days)
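In the Pool resource this might eventually look like the sketch below; every directive name here is hypothetical and only illustrates the three limits:

```
Pool {
  Name = WeeklyPool
  # Hypothetical directives sketching the use limits above:
  Maximum Volume Jobs = 6        # use media N times (jobs)
  Maximum Volume Bytes = 10g     # byte limit (already)
  Volume Use Duration = 6 days   # time limit (seconds, hours, days)
}
```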
-- Implement FSM (File System Modules).
-- Identify unchanged or "system" files and save them to a
- special tape thus removing them from the standard
- backup FileSet -- BASE backup.
-- Turn virtually all sprintfs into snprintfs.
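The sprintf-to-snprintf conversion is mechanical: pass the buffer size so output is bounded and always NUL-terminated. A sketch (function and buffer names are illustrative):

```c
#include <stdio.h>

/* Sketch: snprintf bounds the output and NUL-terminates, so an
 * oversized Volume name cannot overflow the buffer. */
static void format_label(char *buf, size_t buflen, const char *vol)
{
   snprintf(buf, buflen, "Volume=%s", vol);
}
```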
-- Heartbeat between daemons.
-- Audit M_ error codes to ensure they are correct and consistent.
-- Add variable break characters to lex analyzer.
- Either a bit mask or a string of chars so that
- the caller can change the break characters.
-- Make a single T_BREAK to replace T_COMMA, etc.
-- Ensure that File daemon and Storage daemon can
- continue a save if the Director goes down (this
- is NOT currently the case). Must detect socket error,
- buffer messages for later.
+=============================================
Done: (see kernsdone for more)
---the console script is broken as installed and has to be hand-massaged with
- paths, config files etc.
-- Termination status in FD for Verify = C -- incorrect.
-- Implement alter_sqlite_tables
-- Fix scheduler -- see "Hourly cycle". It doesn't do both each
- hour, rather it alternates between 0:05 and 0:35.
-- Create Counter DB records.
-- Fix db_get_job_volume_names() to return array of strings (now works
-  with pool memory).
-- Eliminate MySQL shared libraries from smtp and daemons not using MySQL.
-- Compare tape File attributes to Catalog.
- (File attributes are size, dates, MD5, but not
- data).
-- Report bad status from smtp or mail program.
-- Ensure that Start/End File/Block are correct.
-- If MySQL database is not running, job terminates with
-  weird type and weird error code.
-- Probably create a jcr with JobId=0 as a master
- catchall if jcr not found or if operation involves
- global operation.
-- The daemons should know when one is already
- running and refuse to run a second copy.
-- Figure out how to do a "full" restore from catalog
-- Make SD send attribute stream to DR but first
- buffering to file, then sending only when the
- files are written to tape.
-- Restore file xx or files xx, yy to their most recent values.
-- Get correct block/file information in Catalog, pay attention to change of media.
-- Write better dump of Messages resource.
-- Authentication between SD and FD
-- In restore job, print some summary information at end, such
- as rate, ... job status, ...
-- Problem with len at 362 in tree.c
-- Report volume write rate.
-- Pass "Catalog Files = no" to storage daemon to eliminate network traffic.
-- When we are at EOM, we must ask each job to write JobMedia
- record (update_volume_info).
+- Look into Pruning/purging problems or why there seem to
+ be so many files listed each night.
+- Fix cancel in find_one -- need jcr.
+- Cancel does not work for restore in FD.
+- Write SetJobStatus() function so cancel status not lost.
+- Add include list to end of chain in findlib
+- Zap sd_auth_key after use
+- Add Bar code reading capabilities (new mtx-changer)
+- Figure out some way to automatically backup all local partitions
+- Make hash table for linked files in findlib/find_one.c:161
+ (not necessary)
+- Rewrite find_one.c to use only pool_memory instead of
+ alloca and malloc (probably not necessary).
+- Make sure btraceback goes into /sbin not sysconf directory.
+- InitVerify is getting pruned and it shouldn't (document it)
+- Make 1.28c release ??? NO do 1.29 directly
+- Set timeout on opening fifo for save or restore (findlib)
+- Document FIFO storage device.
+- Document fifo and | and <
+====== 1.30 =======
+- Implement SHA1
+- Get correct error status from run_program or open_bpipe().
+