-==== Keeping track of deleted/new files ====
-- To mark files as deleted, run essentially a Verify to disk, and
- when a file is found missing (MarkId != JobId), then create
- a new File record with FileIndex == -1. This could be done
- by the FD at the same time as the backup.
-
- My "trick" for keeping track of deletions is the following.
- Assuming the user turns on this option, after all the files
- have been backed up, but before the job has terminated, the
- FD will make a pass through all the files and send their
- names to the DIR (*exactly* the same as what a Verify job
- currently does). This will probably be done at the same
- time the files are being sent to the SD avoiding a second
- pass. The DIR will then compare that to what is stored in
- the catalog. Any files in the catalog but not in what the
- FD sent will receive a catalog File entry that indicates
- that at that point in time the file was deleted. This
- information is either transmitted to the FD or simultaneously
- computed in the FD, so that the FD can put a record on the tape
- that indicates that the file has been deleted at this point.
- A delete file entry could potentially be one with a FileIndex
- of 0 or perhaps -1 (need to check if FileIndex is used for
- some other thing as many of the Bacula fields are "overloaded"
- in the SD).
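The comparison described above can be sketched as a set difference. This is a minimal illustration, not Bacula code: `FileRecord`, `find_deleted`, `make_delete_records`, and the choice of 0 for the delete marker are all assumptions made for the example (the notes leave the choice between 0 and -1 open).

```python
from dataclasses import dataclass

# Assumption: the notes suggest 0 or perhaps -1 for a delete entry.
DELETED_FILE_INDEX = 0


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical, simplified stand-in for a catalog File row."""
    job_id: int
    path: str
    file_index: int


def find_deleted(catalog_paths, fd_paths):
    """Return the paths that are in the catalog but were not sent by the FD."""
    return sorted(set(catalog_paths) - set(fd_paths))


def make_delete_records(job_id, catalog_paths, fd_paths):
    """Build one delete-marker record per file missing from the FD's list."""
    return [
        FileRecord(job_id, path, DELETED_FILE_INDEX)
        for path in find_deleted(catalog_paths, fd_paths)
    ]
```

The same records could then be written both to the catalog and to the Volume, which is what the notes ask for.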
-
- During a restore, any file initially picked up by some
- backup (Full, ...) then subsequently having a File entry
- marked "delete" will be removed from the tree, so will not
- be restored. If a file with the same name is backed up again
- later, it will be inserted in the tree -- this already happens. All
- will be consistent except for possible changes during the
- running of the FD.
-
- Since I'm on the subject, some of you may be wondering what
- the utility of the in memory tree is if you are going to
- restore everything (at least it comes up from time to time
- on the list). Well, it is still *very* useful because it
- allows only the last item found for a particular filename
- (full path) to be entered into the tree, and thus if a file
- is backed up 10 times, only the last copy will be restored.
- I recently (last Friday) restored a complete directory, and
- the Full and all the Differential and Incremental backups
- spanned 3 Volumes. The first Volume was not even mounted
- because all the files had been updated and hence backed up
- since the Full backup was made. In this case, the tree
- saved me a *lot* of time.
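The restore behaviour described in the two paragraphs above (last entry per full path wins, and a delete entry removes the path) can be sketched with a plain dictionary standing in for the in-memory tree. The function name, the tuple layout, and the delete marker value of 0 are assumptions for illustration only.

```python
def build_restore_tree(entries, deleted_file_index=0):
    """Return a mapping of path -> JobId of the copy that would be restored.

    entries: iterable of (job_id, path, file_index) tuples, oldest backup
    first. A file_index equal to deleted_file_index is treated as a delete
    marker (an assumption; the notes mention 0 or -1).
    """
    tree = {}
    for job_id, path, file_index in entries:
        if file_index == deleted_file_index:
            # The file was deleted at this point: drop it from the tree.
            tree.pop(path, None)
        else:
            # A later copy replaces any earlier one for the same path.
            tree[path] = job_id
    return tree
```

Only the JobIds that remain as values in the result need their Volumes mounted, which is the saving described above.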
-
- Make sure this information is stored on the tape too so
- that it can be restored directly from the tape.
-
- All the code (with the exception of formally generating and
- saving the delete file entries) already exists in the Verify
- Catalog command. It explicitly recognizes added/deleted files since
- the last InitCatalog. It is more or less a "simple" matter of
- taking that code and adapting it slightly to work for backups.
-
- Comments from Martin Simmons (I think they are all covered):
- Ok, that should cover the basics. There are a few issues though:
-
- - Restore will depend on the catalog. I think it is better to include the
- extra data in the backup as well, so it can be seen by bscan and bextract.
-
- - I'm not sure if it will preserve multiple hard links to the same inode. Or
- maybe adding or removing links will cause the data to be dumped again?
-
- - I'm not sure if it will handle renamed directories. Possibly it will work
- by dumping the whole tree under a renamed directory?
-
- - It remains to be seen how the DIR's backup performance will be
- affected when comparing the catalog for a large filesystem.
-
- 1. Use the current Director in-memory tree code (very fast), but it is
- currently held entirely in memory. It probably could be paged.
-
- 2. Use some DB such as Berkeley DB or SQLite. SQLite is already compiled and
- built for Win32, and it is something we could compile into the program.
-
- 3. Implement our own custom DB code.
-
- Note, by appropriate use of Directives in the Director, we can dynamically
- decide if the work is done in the Director or in the FD, and we can even
- allow the user to choose.
-
-=== most recent accurate file backup/restore ===
- Here is a sketch (i.e. more details must be filled in later) that I recently
- made of an algorithm for doing Accurate Backup.
-
- 1. The Dir informs the FD that it is doing an Accurate backup and that the
- lookup is done by the Director.
-
- 2. FD passes through the file system doing a normal backup based on normal
- conditions, recording the names of all files and their attributes, and
- indicating which files were backed up. This is very similar to what Verify
- does.
-
- 3. The Director receives the two lists of files at the end of the FD backup:
- one of files backed up, and one of files not backed up. It then looks up all
- the files not backed up (using Verify style code).
-
- 4. The Dir sends the FD a list of:
- a. Additional files to back up (based on user specified criteria: name, size,
- inode date, hash, ...).
- b. Files to delete.
-
- 5. The Dir deletes its list of files not backed up.
-
- 6. The FD backs up the additional files, generates a list of those backed up,
- and sends it to the Director, which adds it to the list of files backed up.
- The list is now complete and current.
-
- 7. The FD generates delete records for all the files that were deleted and
- sends them to the SD.
-
- 8. The Dir deletes the previous CurrentBackup list, and then does a
- transaction insert of the new list that it has.
-
- 9. The rest works as before ...
-
- That is it.
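The Director-side work in steps 3 and 4 can be sketched as follows. This is a rough illustration under stated assumptions: the function name, the dictionaries of path to attributes, and the default "attributes differ" test are all hypothetical, standing in for the user specified criteria mentioned in step 4a.

```python
def plan_accurate_backup(current_backup, backed_up, not_backed_up, changed=None):
    """Return (additional_files, files_to_delete) for the FD.

    current_backup: dict path -> attributes held by the Director (catalog).
    backed_up:      dict path -> attributes of files the FD already saved.
    not_backed_up:  dict path -> attributes of files the FD saw but skipped.
    changed:        optional predicate (old_attrs, new_attrs) -> bool; the
                    default (an assumption) is a plain inequality test.
    """
    if changed is None:
        changed = lambda old, new: old != new

    # Step 4a: skipped files that are unknown to the catalog or that differ.
    additional = sorted(
        path
        for path, attrs in not_backed_up.items()
        if path not in current_backup or changed(current_backup[path], attrs)
    )

    # Step 4b: files the catalog knows about that the FD no longer sees.
    seen = set(backed_up) | set(not_backed_up)
    to_delete = sorted(path for path in current_backup if path not in seen)

    return additional, to_delete
```

Passing a different `changed` predicate is one way the comparison criteria (size, date, hash, ...) could be made selectable.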
-
- Two new tables needed.
- 1. CurrentBackupId table that contains Client, JobName, FileSet, and a unique
- BackupId. This is created during a Full save, and the BackupId can be set to
- the JobId of the Full save. It will remain the same until another Full
- backup is done. That is, when new records are added during a Differential or
- Incremental, they must use the same BackupId.
-
- 2. CurrentBackup table that contains essentially a File record (less a number
- of fields, but with a few extra fields) -- e.g. a flag that the File was
- backed up by a Full save (this permits doing a Differential). The unique
- BackupId allows us to look up the CurrentBackup for a particular Client,
- JobName, FileSet using that unique BackupId as the key, so this table must be
- indexed by the BackupId.
-
- Note any time a file is saved by the FD other than during a Full save, the
- Full save flag is cleared. When doing a Differential backup, if a file has
- the Full save flag set, it is skipped, otherwise it is backed up. For an
- Incremental backup, we check to see if the file has changed since the last
- time we backed it up.
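The Full save flag rule above can be written out literally as two small functions. The names and the string values for the backup level are assumptions for the sketch, and how "changed since the last time we backed it up" is detected is left to the caller.

```python
def needs_backup(level, full_save_flag, changed_since_last_backup):
    """Decide whether a file is backed up, following the rule in the notes."""
    if level == "Full":
        return True
    if level == "Differential":
        # Files still carrying the Full save flag are skipped.
        return not full_save_flag
    if level == "Incremental":
        return changed_since_last_backup
    raise ValueError(f"unknown backup level: {level!r}")


def flag_after_save(level):
    """The flag is set by a Full save and cleared by any other save."""
    return level == "Full"
```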
-
- Deleted files should have FileIndex == 0