From: "Dan Langille" <dan@langille.org>
To: bacula-users@lists.sourceforge.net
Subject: [Bacula-users] FreeBSD - large backups to tape
Date: Mon, 20 Oct 2003 15:29:18 -0400

Kern and I have been working on a FreeBSD/Bacula problem.
He's asked me to post this to the list. The problem was within the
FreeBSD pthreads library. A solution has been found.

The FreeBSD pthreads library does not properly handle End Of Tape.
This problem will be fixed in FreeBSD 4.9.

UPDATE 2004/02/24: Note, the problem was apparently not fixed in
4.9-RELEASE. 4.9-RELEASE contained a partial patch that did not
prevent data loss. To date, the latest FreeBSD -RELEASE versions
(4.9-RELEASE and 5.2.1-RELEASE) are *broken* as shipped. If
you are running one of these systems, please either patch
your system as described below or upgrade to -STABLE or
-CURRENT.

We expect 4.10-RELEASE to be available within a few weeks
(written 24 Feb 2004) and 5.3-RELEASE to be available in a
few months. 4.10 and 5.3 *should* contain the fix, but we
can't know for sure until we've had a chance to test them.

The bug results in more data being written than the tape will
hold because of a lost status code. Any backup involving
more than one tape would lose data.
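
As a rough illustration of what "a lost status code" means for a program writing to tape, here is a minimal C sketch of a write loop that keeps the possible outcomes of write() apart. Going by the tapetest output quoted in this message, a write that returns 0 appears to be the end-of-tape signal, and -1 with "No space left on device" is what shows up once that signal has been swallowed. The names classify_write and write_blocks are hypothetical; they are not taken from tapetest.c or from Bacula.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration only. */
enum write_result {
    WRITE_OK,          /* the full block was accepted */
    WRITE_END_OF_TAPE, /* stat == 0: the status that must not get lost */
    WRITE_ERROR,       /* stat == -1, e.g. "No space left on device" */
    WRITE_SHORT        /* only part of the block was accepted */
};

/* Map the return value of write() onto one of the outcomes above. */
static enum write_result classify_write(ssize_t stat, size_t block_size)
{
    if (stat < 0)
        return WRITE_ERROR;
    if (stat == 0)
        return WRITE_END_OF_TAPE;
    if ((size_t)stat == block_size)
        return WRITE_OK;
    return WRITE_SHORT;
}

/*
 * Write up to max_blocks blocks and stop at the first write that is
 * not a full block. Returns the number of full blocks written; *last
 * receives the classification of the final write() call.
 */
static long write_blocks(int fd, const void *block, size_t block_size,
                         long max_blocks, enum write_result *last)
{
    long written = 0;

    *last = WRITE_OK;
    while (written < max_blocks) {
        ssize_t stat = write(fd, block, block_size);

        *last = classify_write(stat, block_size);
        if (*last != WRITE_OK)
            break;
        written++;
    }
    return written;
}
```

A caller that sees WRITE_END_OF_TAPE knows every block counted so far is on the tape and can ask for the next volume. If the library hides that return value, the caller keeps writing until it gets WRITE_ERROR and can no longer tell how many blocks actually made it.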

To demonstrate the problem, tapetest.c can be obtained from
http://www.freebsd.org/cgi/query-pr.cgi?pr=56274

tapetest.c can also be found in the Bacula source distribution
in <bacula-source>/platforms/freebsd/tapetest.c

This tests without pthreads:

* If you build this program with:
*
*   c++ -g -O2 -Wall -c tapetest.c
*   c++ -g -O2 -Wall tapetest.o -o tapetest
*
* Procedure for testing tape
*   ./tapetest /dev/your-tape-device
*
* The output will be something like:
*
* *Begin writing blocks of 64512 bytes.
* ++++++++++++++++++++ ...
* Write failed. Last block written=17294. stat=0 ERR=Unknown error: 0
* Wrote EOF to /dev/nsa0
* *Starting scan at file 0
* 17294 blocks of 64512 bytes in file 0
* Total files=1, blocks=17294, bytes = 1115670528
*
* which is correct. Notice that the return status is
* 0, while in the example below, which fails, the return
* status is -1.

This tests with pthreads:

* If you build this program with:
*
*   c++ -g -O2 -Wall -pthread -c tapetest.c
*   c++ -g -O2 -Wall -pthread tapetest.o -o tapetest
*
* Note, we simply added -pthread compared to the
* previous example.
*
* Procedure for testing tape
*   ./tapetest /dev/your-tape-device
*
* The output will be something like:
*
* *Begin writing blocks of 64512 bytes.
* +++++++++++++++++++++++++++++ ...
* Write failed. Last block written=17926. stat=-1 ERR=No space left on device
* Wrote EOF to /dev/nsa0
* *Starting scan at file 0
* 17913 blocks of 64512 bytes in file 0
* Total files=1, blocks=17913, bytes = 1155603456
*
* which is incorrect: it reports 17,926 blocks written, the
* status on the last write is stat=-1 rather than 0, and only
* 17,913 blocks were read back.
*
* Similarly, if you ran this test on 4.9-RELEASE or 5.2.1-RELEASE
* (these versions contain an incomplete patch) then you would
* probably see something like this:
*
* *Begin writing blocks of 64512 bytes.
* +++++++++++++++ [...]
* Wrote EOF to /dev/nsa0
* Write failed. Last block written=271163. stat=-1 ERR=No space left on device
* *Starting scan at file 0
* Bad status from read -1. ERR=Input/output error
* 271163 blocks of 64512 bytes in file 0
*
* The above output is also incorrect. The block counts match,
* but note the -1 error code on the read and write. This is
* just as dangerous as the first example. If you see this
* output then you should patch or upgrade to -STABLE or -CURRENT
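
To put a number on the failing pthreads run quoted above: 17,926 blocks were reported written but only 17,913 were read back, so 13 blocks of 64,512 bytes (838,656 bytes) never reached the tape. The short C sketch below redoes that arithmetic and also checks the byte totals printed by tapetest. The function names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Total bytes represented by a number of fixed-size tape blocks. */
static int64_t bytes_for_blocks(int64_t blocks, int64_t block_size)
{
    return blocks * block_size;
}

/* Blocks the writer believed it wrote but the scan did not find. */
static int64_t blocks_missing(int64_t written, int64_t read_back)
{
    return written - read_back;
}
```

With the figures from the output above, bytes_for_blocks(17913, 64512) gives the 1155603456 bytes that tapetest reports, and bytes_for_blocks(blocks_missing(17926, 17913), 64512) gives the 838,656 bytes that were silently dropped.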

If you get the same number of blocks written and read WHEN using
pthreads, AND the test with pthreads enabled returns stat=0
on the last write, AND the scan operation returns no error
code, then you've been correctly patched. It is important that
stat=0 rather than -1 even if the correct number of blocks
are read back. If the status is -1 on the pthreads test, you
still need to patch or upgrade.
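
The three conditions above can be written down as a small check. This is only a sketch: the struct and function names are hypothetical, and the values have to be read off the tapetest output by hand, since tapetest does not produce them in this form.

```c
#include <stdbool.h>

/* Hypothetical summary of one tapetest run built with -pthread. */
struct tapetest_result {
    long blocks_written;  /* "Last block written=" */
    long blocks_read;     /* "blocks=" from the scan */
    int  last_write_stat; /* "stat=" on the failed write */
    bool scan_error;      /* true if the scan printed "Bad status from read" */
};

/* All three conditions must hold; matching block counts alone are not enough. */
static bool looks_patched(const struct tapetest_result *r)
{
    return r->blocks_written == r->blocks_read
        && r->last_write_stat == 0
        && !r->scan_error;
}
```

For example, the 4.9-RELEASE style output (271163 blocks written and read, stat=-1, read error during the scan) fails this check even though the block counts agree.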

For FreeBSD versions prior to 4.10-RELEASE and 5.3-RELEASE you
have two choices to ensure proper backups. These instructions
assume you are familiar with patching FreeBSD and already have
the FreeBSD source code installed on your machine.

For FreeBSD 4.x, do one of the following:

- cvsup and build your system to FreeBSD 4.x-STABLE after the
  date Mon Dec 29 15:18:01 2003 UTC

- apply the patch at
  http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc_r/uthread/uthread_write.c.diff?r1=1.16.2.6&r2=1.16.2.8

To apply the patch, follow these instructions as root.

    cd /usr/src/lib/libc_r/uthread/
    fetch -o pthread.diff 'http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc_r/uthread/uthread_write.c.diff?r1=1.16.2.6&r2=1.16.2.8'

For FreeBSD 5.x, do one of the following:

- cvsup and build your system to FreeBSD -CURRENT after the
  date Wed Dec 17 16:44:03 2003 UTC

- apply the patch at
  http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc_r/uthread/uthread_write.c.diff?r1=1.22&r2=1.23
  Wed Dec 17 16:44:03 2003 UTC

To apply the patch, follow these instructions as root.

    cd /usr/src/lib/libc_r/uthread/
    fetch -o pthread.diff 'http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc_r/uthread/uthread_write.c.diff?r1=1.22&r2=1.23'

After patching your system as shown above,
you should then recompile Bacula to get the new library
code included by doing:

I suggest running tapetest on your patched system and then
conducting a backup which spans two tapes. Restore the data
and compare it to the original. If they are not identical,
please let us know.
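
For the comparison step, a tool such as cmp or diff is normally all you need. Purely as an illustration of what "identical" means here, the following C sketch compares two files byte by byte. The function name files_identical is hypothetical, and a read error is treated the same as end of file, which a real verification tool should not do.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if both files have identical
 * contents, 0 if they differ, and -1 if either cannot be opened.
 */
static int files_identical(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int result = 1;

    if (fa == NULL || fb == NULL) {
        result = -1;
    } else {
        int ca, cb;

        /* Read both files in step; any mismatch, including one file
         * ending before the other, means they differ. */
        do {
            ca = getc(fa);
            cb = getc(fb);
            if (ca != cb) {
                result = 0;
                break;
            }
        } while (ca != EOF);
    }

    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return result;
}
```

You would call this once per restored file, with the original path and the restored path, and treat anything other than 1 as a failed restore.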