Commit | Line | Data |
---|---|---|
d967e3ec | 1 | diff -Nru bzip2-1.0.1/AUTHORS bzip2-1.0.1.new/AUTHORS |
2 | --- bzip2-1.0.1/AUTHORS Thu Jan 1 01:00:00 1970 | |
3 | +++ bzip2-1.0.1.new/AUTHORS Sat Jun 24 20:13:05 2000 | |
4 | @@ -0,0 +1 @@ | |
5 | +Julian Seward <jseward@acm.org> | |
6 | diff -Nru bzip2-1.0.1/CHANGES bzip2-1.0.1.new/CHANGES | |
7 | --- bzip2-1.0.1/CHANGES Sat Jun 24 20:13:27 2000 | |
8 | +++ bzip2-1.0.1.new/CHANGES Thu Jan 1 01:00:00 1970 | |
9 | @@ -1,167 +0,0 @@ | |
10 | - | |
11 | - | |
12 | -0.9.0 | |
13 | -~~~~~ | |
14 | -First version. | |
15 | - | |
16 | - | |
17 | -0.9.0a | |
18 | -~~~~~~ | |
19 | -Removed 'ranlib' from Makefile, since most modern Unix-es | |
20 | -don't need it, or even know about it. | |
21 | - | |
22 | - | |
23 | -0.9.0b | |
24 | -~~~~~~ | |
25 | -Fixed a problem with error reporting in bzip2.c. This does not effect | |
26 | -the library in any way. Problem is: versions 0.9.0 and 0.9.0a (of the | |
27 | -program proper) compress and decompress correctly, but give misleading | |
28 | -error messages (internal panics) when an I/O error occurs, instead of | |
29 | -reporting the problem correctly. This shouldn't give any data loss | |
30 | -(as far as I can see), but is confusing. | |
31 | - | |
32 | -Made the inline declarations disappear for non-GCC compilers. | |
33 | - | |
34 | - | |
35 | -0.9.0c | |
36 | -~~~~~~ | |
37 | -Fixed some problems in the library pertaining to some boundary cases. | |
38 | -This makes the library behave more correctly in those situations. The | |
39 | -fixes apply only to features (calls and parameters) not used by | |
40 | -bzip2.c, so the non-fixedness of them in previous versions has no | |
41 | -effect on reliability of bzip2.c. | |
42 | - | |
43 | -In bzlib.c: | |
44 | - * made zero-length BZ_FLUSH work correctly in bzCompress(). | |
45 | - * fixed bzWrite/bzRead to ignore zero-length requests. | |
46 | - * fixed bzread to correctly handle read requests after EOF. | |
47 | - * wrong parameter order in call to bzDecompressInit in | |
48 | - bzBuffToBuffDecompress. Fixed. | |
49 | - | |
50 | -In compress.c: | |
51 | - * changed setting of nGroups in sendMTFValues() so as to | |
52 | - do a bit better on small files. This _does_ effect | |
53 | - bzip2.c. | |
54 | - | |
55 | - | |
56 | -0.9.5a | |
57 | -~~~~~~ | |
58 | -Major change: add a fallback sorting algorithm (blocksort.c) | |
59 | -to give reasonable behaviour even for very repetitive inputs. | |
60 | -Nuked --repetitive-best and --repetitive-fast since they are | |
61 | -no longer useful. | |
62 | - | |
63 | -Minor changes: mostly a whole bunch of small changes/ | |
64 | -bugfixes in the driver (bzip2.c). Changes pertaining to the | |
65 | -user interface are: | |
66 | - | |
67 | - allow decompression of symlink'd files to stdout | |
68 | - decompress/test files even without .bz2 extension | |
69 | - give more accurate error messages for I/O errors | |
70 | - when compressing/decompressing to stdout, don't catch control-C | |
71 | - read flags from BZIP2 and BZIP environment variables | |
72 | - decline to break hard links to a file unless forced with -f | |
73 | - allow -c flag even with no filenames | |
74 | - preserve file ownerships as far as possible | |
75 | - make -s -1 give the expected block size (100k) | |
76 | - add a flag -q --quiet to suppress nonessential warnings | |
77 | - stop decoding flags after --, so files beginning in - can be handled | |
78 | - resolved inconsistent naming: bzcat or bz2cat ? | |
79 | - bzip2 --help now returns 0 | |
80 | - | |
81 | -Programming-level changes are: | |
82 | - | |
83 | - fixed syntax error in GET_LL4 for Borland C++ 5.02 | |
84 | - let bzBuffToBuffDecompress return BZ_DATA_ERROR{_MAGIC} | |
85 | - fix overshoot of mode-string end in bzopen_or_bzdopen | |
86 | - wrapped bzlib.h in #ifdef __cplusplus ... extern "C" { ... } | |
87 | - close file handles under all error conditions | |
88 | - added minor mods so it compiles with DJGPP out of the box | |
89 | - fixed Makefile so it doesn't give problems with BSD make | |
90 | - fix uninitialised memory reads in dlltest.c | |
91 | - | |
92 | -0.9.5b | |
93 | -~~~~~~ | |
94 | -Open stdin/stdout in binary mode for DJGPP. | |
95 | - | |
96 | -0.9.5c | |
97 | -~~~~~~ | |
98 | -Changed BZ_N_OVERSHOOT to be ... + 2 instead of ... + 1. The + 1 | |
99 | -version could cause the sorted order to be wrong in some extremely | |
100 | -obscure cases. Also changed setting of quadrant in blocksort.c. | |
101 | - | |
102 | -0.9.5d | |
103 | -~~~~~~ | |
104 | -The only functional change is to make bzlibVersion() in the library | |
105 | -return the correct string. This has no effect whatsoever on the | |
106 | -functioning of the bzip2 program or library. Added a couple of casts | |
107 | -so the library compiles without warnings at level 3 in MS Visual | |
108 | -Studio 6.0. Included a Y2K statement in the file Y2K_INFO. All other | |
109 | -changes are minor documentation changes. | |
110 | - | |
111 | -1.0 | |
112 | -~~~ | |
113 | -Several minor bugfixes and enhancements: | |
114 | - | |
115 | -* Large file support. The library uses 64-bit counters to | |
116 | - count the volume of data passing through it. bzip2.c | |
117 | - is now compiled with -D_FILE_OFFSET_BITS=64 to get large | |
118 | - file support from the C library. -v correctly prints out | |
119 | - file sizes greater than 4 gigabytes. All these changes have | |
120 | - been made without assuming a 64-bit platform or a C compiler | |
121 | - which supports 64-bit ints, so, except for the C library | |
122 | - aspect, they are fully portable. | |
123 | - | |
124 | -* Decompression robustness. The library/program should be | |
125 | - robust to any corruption of compressed data, detecting and | |
126 | - handling _all_ corruption, instead of merely relying on | |
127 | - the CRCs. What this means is that the program should | |
128 | - never crash, given corrupted data, and the library should | |
129 | - always return BZ_DATA_ERROR. | |
130 | - | |
131 | -* Fixed an obscure race-condition bug only ever observed on | |
132 | - Solaris, in which, if you were very unlucky and issued | |
133 | - control-C at exactly the wrong time, both input and output | |
134 | - files would be deleted. | |
135 | - | |
136 | -* Don't run out of file handles on test/decompression when | |
137 | - large numbers of files have invalid magic numbers. | |
138 | - | |
139 | -* Avoid library namespace pollution. Prefix all exported | |
140 | - symbols with BZ2_. | |
141 | - | |
142 | -* Minor sorting enhancements from my DCC2000 paper. | |
143 | - | |
144 | -* Advance the version number to 1.0, so as to counteract the | |
145 | - (false-in-this-case) impression some people have that programs | |
146 | - with version numbers less than 1.0 are in someway, experimental, | |
147 | - pre-release versions. | |
148 | - | |
149 | -* Create an initial Makefile-libbz2_so to build a shared library. | |
150 | - Yes, I know I should really use libtool et al ... | |
151 | - | |
152 | -* Make the program exit with 2 instead of 0 when decompression | |
153 | - fails due to a bad magic number (ie, an invalid bzip2 header). | |
154 | - Also exit with 1 (as the manual claims :-) whenever a diagnostic | |
155 | - message would have been printed AND the corresponding operation | |
156 | - is aborted, for example | |
157 | - bzip2: Output file xx already exists. | |
158 | - When a diagnostic message is printed but the operation is not | |
159 | - aborted, for example | |
160 | - bzip2: Can't guess original name for wurble -- using wurble.out | |
161 | - then the exit value 0 is returned, unless some other problem is | |
162 | - also detected. | |
163 | - | |
164 | - I think it corresponds more closely to what the manual claims now. | |
165 | - | |
166 | - | |
167 | -1.0.1 | |
168 | -~~~~~ | |
169 | -* Modified dlltest.c so it uses the new BZ2_ naming scheme. | |
170 | -* Modified makefile-msc to fix minor build probs on Win2k. | |
171 | -* Updated README.COMPILATION.PROBLEMS. | |
172 | - | |
173 | -There are no functionality changes or bug fixes relative to version | |
174 | -1.0.0. This is just a documentation update + a fix for minor Win32 | |
175 | -build problems. For almost everyone, upgrading from 1.0.0 to 1.0.1 is | |
176 | -utterly pointless. Don't bother. | |
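The 1.0 entry above says the library counts the data passing through it with 64-bit counters, without assuming a 64-bit platform or a compiler with 64-bit ints. The sketch below shows one way to do that: the count is kept as two 32-bit halves with a manual carry, in the spirit of the `total_in_lo32`/`total_in_hi32` fields of `bz_stream`. The `counter64` type and its helpers are hypothetical names, and the code assumes `unsigned int` is exactly 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 64-bit counter held as two 32-bit halves. Assumes unsigned
   int is exactly 32 bits, so no 64-bit integer type is required. */
typedef struct {
    unsigned int lo32;
    unsigned int hi32;
} counter64;

/* Add n to the counter, carrying into the high half when the low half wraps. */
static void counter64_add(counter64 *c, unsigned int n)
{
    assert(c != NULL);
    c->lo32 += n;          /* unsigned addition wraps modulo 2^32 */
    if (c->lo32 < n)       /* a wrapped sum is smaller than the addend */
        c->hi32++;
}

/* Value as a double, for display only; exact while the count is below 2^53. */
static double counter64_to_double(const counter64 *c)
{
    return (double)c->hi32 * 4294967296.0 + (double)c->lo32;
}
```

Converting to `double` only when printing (for example, for `-v` sizes or ratios) keeps the counting itself exact while avoiding any dependence on `long long`.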
177 | diff -Nru bzip2-1.0.1/COPYING bzip2-1.0.1.new/COPYING | |
178 | --- bzip2-1.0.1/COPYING Thu Jan 1 01:00:00 1970 | |
179 | +++ bzip2-1.0.1.new/COPYING Sat Jun 24 20:13:05 2000 | |
180 | @@ -0,0 +1,39 @@ | |
181 | + | |
182 | +This program, "bzip2" and associated library "libbzip2", are | |
183 | +copyright (C) 1996-2000 Julian R Seward. All rights reserved. | |
184 | + | |
185 | +Redistribution and use in source and binary forms, with or without | |
186 | +modification, are permitted provided that the following conditions | |
187 | +are met: | |
188 | + | |
189 | +1. Redistributions of source code must retain the above copyright | |
190 | + notice, this list of conditions and the following disclaimer. | |
191 | + | |
192 | +2. The origin of this software must not be misrepresented; you must | |
193 | + not claim that you wrote the original software. If you use this | |
194 | + software in a product, an acknowledgment in the product | |
195 | + documentation would be appreciated but is not required. | |
196 | + | |
197 | +3. Altered source versions must be plainly marked as such, and must | |
198 | + not be misrepresented as being the original software. | |
199 | + | |
200 | +4. The name of the author may not be used to endorse or promote | |
201 | + products derived from this software without specific prior written | |
202 | + permission. | |
203 | + | |
204 | +THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | |
205 | +OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
206 | +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
207 | +ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | |
208 | +DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
209 | +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | |
210 | +GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | |
211 | +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | |
212 | +WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | |
213 | +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | |
214 | +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
215 | + | |
216 | +Julian Seward, Cambridge, UK. | |
217 | +jseward@acm.org | |
218 | +bzip2/libbzip2 version 1.0 of 21 March 2000 | |
219 | + | |
220 | diff -Nru bzip2-1.0.1/ChangeLog bzip2-1.0.1.new/ChangeLog | |
221 | --- bzip2-1.0.1/ChangeLog Thu Jan 1 01:00:00 1970 | |
222 | +++ bzip2-1.0.1.new/ChangeLog Sat Jun 24 20:13:05 2000 | |
223 | @@ -0,0 +1 @@ | |
224 | + | |
225 | diff -Nru bzip2-1.0.1/INSTALL bzip2-1.0.1.new/INSTALL | |
226 | --- bzip2-1.0.1/INSTALL Thu Jan 1 01:00:00 1970 | |
227 | +++ bzip2-1.0.1.new/INSTALL Sat Jun 24 20:13:06 2000 | |
228 | @@ -0,0 +1,182 @@ | |
229 | +Basic Installation | |
230 | +================== | |
231 | + | |
232 | + These are generic installation instructions. | |
233 | + | |
234 | + The `configure' shell script attempts to guess correct values for | |
235 | +various system-dependent variables used during compilation. It uses | |
236 | +those values to create a `Makefile' in each directory of the package. | |
237 | +It may also create one or more `.h' files containing system-dependent | |
238 | +definitions. Finally, it creates a shell script `config.status' that | |
239 | +you can run in the future to recreate the current configuration, a file | |
240 | +`config.cache' that saves the results of its tests to speed up | |
241 | +reconfiguring, and a file `config.log' containing compiler output | |
242 | +(useful mainly for debugging `configure'). | |
243 | + | |
244 | + If you need to do unusual things to compile the package, please try | |
245 | +to figure out how `configure' could check whether to do them, and mail | |
246 | +diffs or instructions to the address given in the `README' so they can | |
247 | +be considered for the next release. If at some point `config.cache' | |
248 | +contains results you don't want to keep, you may remove or edit it. | |
249 | + | |
250 | + The file `configure.in' is used to create `configure' by a program | |
251 | +called `autoconf'. You only need `configure.in' if you want to change | |
252 | +it or regenerate `configure' using a newer version of `autoconf'. | |
253 | + | |
254 | +The simplest way to compile this package is: | |
255 | + | |
256 | + 1. `cd' to the directory containing the package's source code and type | |
257 | + `./configure' to configure the package for your system. If you're | |
258 | + using `csh' on an old version of System V, you might need to type | |
259 | + `sh ./configure' instead to prevent `csh' from trying to execute | |
260 | + `configure' itself. | |
261 | + | |
262 | + Running `configure' takes awhile. While running, it prints some | |
263 | + messages telling which features it is checking for. | |
264 | + | |
265 | + 2. Type `make' to compile the package. | |
266 | + | |
267 | + 3. Optionally, type `make check' to run any self-tests that come with | |
268 | + the package. | |
269 | + | |
270 | + 4. Type `make install' to install the programs and any data files and | |
271 | + documentation. | |
272 | + | |
273 | + 5. You can remove the program binaries and object files from the | |
274 | + source code directory by typing `make clean'. To also remove the | |
275 | + files that `configure' created (so you can compile the package for | |
276 | + a different kind of computer), type `make distclean'. There is | |
277 | + also a `make maintainer-clean' target, but that is intended mainly | |
278 | + for the package's developers. If you use it, you may have to get | |
279 | + all sorts of other programs in order to regenerate files that came | |
280 | + with the distribution. | |
281 | + | |
282 | +Compilers and Options | |
283 | +===================== | |
284 | + | |
285 | + Some systems require unusual options for compilation or linking that | |
286 | +the `configure' script does not know about. You can give `configure' | |
287 | +initial values for variables by setting them in the environment. Using | |
288 | +a Bourne-compatible shell, you can do that on the command line like | |
289 | +this: | |
290 | + CC=c89 CFLAGS=-O2 LIBS=-lposix ./configure | |
291 | + | |
292 | +Or on systems that have the `env' program, you can do it like this: | |
293 | + env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure | |
294 | + | |
295 | +Compiling For Multiple Architectures | |
296 | +==================================== | |
297 | + | |
298 | + You can compile the package for more than one kind of computer at the | |
299 | +same time, by placing the object files for each architecture in their | |
300 | +own directory. To do this, you must use a version of `make' that | |
301 | +supports the `VPATH' variable, such as GNU `make'. `cd' to the | |
302 | +directory where you want the object files and executables to go and run | |
303 | +the `configure' script. `configure' automatically checks for the | |
304 | +source code in the directory that `configure' is in and in `..'. | |
305 | + | |
306 | + If you have to use a `make' that does not supports the `VPATH' | |
307 | +variable, you have to compile the package for one architecture at a time | |
308 | +in the source code directory. After you have installed the package for | |
309 | +one architecture, use `make distclean' before reconfiguring for another | |
310 | +architecture. | |
311 | + | |
312 | +Installation Names | |
313 | +================== | |
314 | + | |
315 | + By default, `make install' will install the package's files in | |
316 | +`/usr/local/bin', `/usr/local/man', etc. You can specify an | |
317 | +installation prefix other than `/usr/local' by giving `configure' the | |
318 | +option `--prefix=PATH'. | |
319 | + | |
320 | + You can specify separate installation prefixes for | |
321 | +architecture-specific files and architecture-independent files. If you | |
322 | +give `configure' the option `--exec-prefix=PATH', the package will use | |
323 | +PATH as the prefix for installing programs and libraries. | |
324 | +Documentation and other data files will still use the regular prefix. | |
325 | + | |
326 | + In addition, if you use an unusual directory layout you can give | |
327 | +options like `--bindir=PATH' to specify different values for particular | |
328 | +kinds of files. Run `configure --help' for a list of the directories | |
329 | +you can set and what kinds of files go in them. | |
330 | + | |
331 | + If the package supports it, you can cause programs to be installed | |
332 | +with an extra prefix or suffix on their names by giving `configure' the | |
333 | +option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'. | |
334 | + | |
335 | +Optional Features | |
336 | +================= | |
337 | + | |
338 | + Some packages pay attention to `--enable-FEATURE' options to | |
339 | +`configure', where FEATURE indicates an optional part of the package. | |
340 | +They may also pay attention to `--with-PACKAGE' options, where PACKAGE | |
341 | +is something like `gnu-as' or `x' (for the X Window System). The | |
342 | +`README' should mention any `--enable-' and `--with-' options that the | |
343 | +package recognizes. | |
344 | + | |
345 | + For packages that use the X Window System, `configure' can usually | |
346 | +find the X include and library files automatically, but if it doesn't, | |
347 | +you can use the `configure' options `--x-includes=DIR' and | |
348 | +`--x-libraries=DIR' to specify their locations. | |
349 | + | |
350 | +Specifying the System Type | |
351 | +========================== | |
352 | + | |
353 | + There may be some features `configure' can not figure out | |
354 | +automatically, but needs to determine by the type of host the package | |
355 | +will run on. Usually `configure' can figure that out, but if it prints | |
356 | +a message saying it can not guess the host type, give it the | |
357 | +`--host=TYPE' option. TYPE can either be a short name for the system | |
358 | +type, such as `sun4', or a canonical name with three fields: | |
359 | + CPU-COMPANY-SYSTEM | |
360 | + | |
361 | +See the file `config.sub' for the possible values of each field. If | |
362 | +`config.sub' isn't included in this package, then this package doesn't | |
363 | +need to know the host type. | |
364 | + | |
365 | + If you are building compiler tools for cross-compiling, you can also | |
366 | +use the `--target=TYPE' option to select the type of system they will | |
367 | +produce code for and the `--build=TYPE' option to select the type of | |
368 | +system on which you are compiling the package. | |
369 | + | |
370 | +Sharing Defaults | |
371 | +================ | |
372 | + | |
373 | + If you want to set default values for `configure' scripts to share, | |
374 | +you can create a site shell script called `config.site' that gives | |
375 | +default values for variables like `CC', `cache_file', and `prefix'. | |
376 | +`configure' looks for `PREFIX/share/config.site' if it exists, then | |
377 | +`PREFIX/etc/config.site' if it exists. Or, you can set the | |
378 | +`CONFIG_SITE' environment variable to the location of the site script. | |
379 | +A warning: not all `configure' scripts look for a site script. | |
380 | + | |
381 | +Operation Controls | |
382 | +================== | |
383 | + | |
384 | + `configure' recognizes the following options to control how it | |
385 | +operates. | |
386 | + | |
387 | +`--cache-file=FILE' | |
388 | + Use and save the results of the tests in FILE instead of | |
389 | + `./config.cache'. Set FILE to `/dev/null' to disable caching, for | |
390 | + debugging `configure'. | |
391 | + | |
392 | +`--help' | |
393 | + Print a summary of the options to `configure', and exit. | |
394 | + | |
395 | +`--quiet' | |
396 | +`--silent' | |
397 | +`-q' | |
398 | + Do not print messages saying which checks are being made. To | |
399 | + suppress all normal output, redirect it to `/dev/null' (any error | |
400 | + messages will still be shown). | |
401 | + | |
402 | +`--srcdir=DIR' | |
403 | + Look for the package's source code in directory DIR. Usually | |
404 | + `configure' can determine that directory automatically. | |
405 | + | |
406 | +`--version' | |
407 | + Print the version of Autoconf used to generate the `configure' | |
408 | + script, and exit. | |
409 | + | |
410 | +`configure' also accepts some other, not widely useful, options. | |
411 | diff -Nru bzip2-1.0.1/LICENSE bzip2-1.0.1.new/LICENSE | |
412 | --- bzip2-1.0.1/LICENSE Sat Jun 24 20:13:27 2000 | |
413 | +++ bzip2-1.0.1.new/LICENSE Thu Jan 1 01:00:00 1970 | |
414 | @@ -1,39 +0,0 @@ | |
415 | - | |
416 | -This program, "bzip2" and associated library "libbzip2", are | |
417 | -copyright (C) 1996-2000 Julian R Seward. All rights reserved. | |
418 | - | |
419 | -Redistribution and use in source and binary forms, with or without | |
420 | -modification, are permitted provided that the following conditions | |
421 | -are met: | |
422 | - | |
423 | -1. Redistributions of source code must retain the above copyright | |
424 | - notice, this list of conditions and the following disclaimer. | |
425 | - | |
426 | -2. The origin of this software must not be misrepresented; you must | |
427 | - not claim that you wrote the original software. If you use this | |
428 | - software in a product, an acknowledgment in the product | |
429 | - documentation would be appreciated but is not required. | |
430 | - | |
431 | -3. Altered source versions must be plainly marked as such, and must | |
432 | - not be misrepresented as being the original software. | |
433 | - | |
434 | -4. The name of the author may not be used to endorse or promote | |
435 | - products derived from this software without specific prior written | |
436 | - permission. | |
437 | - | |
438 | -THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | |
439 | -OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
440 | -WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
441 | -ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | |
442 | -DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
443 | -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | |
444 | -GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | |
445 | -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | |
446 | -WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | |
447 | -NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | |
448 | -SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
449 | - | |
450 | -Julian Seward, Cambridge, UK. | |
451 | -jseward@acm.org | |
452 | -bzip2/libbzip2 version 1.0 of 21 March 2000 | |
453 | - | |
454 | diff -Nru bzip2-1.0.1/Makefile-libbz2_so bzip2-1.0.1.new/Makefile-libbz2_so | |
455 | --- bzip2-1.0.1/Makefile-libbz2_so Sat Jun 24 20:13:27 2000 | |
456 | +++ bzip2-1.0.1.new/Makefile-libbz2_so Thu Jan 1 01:00:00 1970 | |
457 | @@ -1,43 +0,0 @@ | |
458 | - | |
459 | -# This Makefile builds a shared version of the library, | |
460 | -# libbz2.so.1.0.1, with soname libbz2.so.1.0, | |
461 | -# at least on x86-Linux (RedHat 5.2), | |
462 | -# with gcc-2.7.2.3. Please see the README file for some | |
463 | -# important info about building the library like this. | |
464 | - | |
465 | -SHELL=/bin/sh | |
466 | -CC=gcc | |
467 | -BIGFILES=-D_FILE_OFFSET_BITS=64 | |
468 | -CFLAGS=-fpic -fPIC -Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce $(BIGFILES) | |
469 | - | |
470 | -OBJS= blocksort.o \ | |
471 | - huffman.o \ | |
472 | - crctable.o \ | |
473 | - randtable.o \ | |
474 | - compress.o \ | |
475 | - decompress.o \ | |
476 | - bzlib.o | |
477 | - | |
478 | -all: $(OBJS) | |
479 | - $(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.1 $(OBJS) | |
480 | - $(CC) $(CFLAGS) -o bzip2-shared bzip2.c libbz2.so.1.0.1 | |
481 | - rm -f libbz2.so.1.0 | |
482 | - ln -s libbz2.so.1.0.1 libbz2.so.1.0 | |
483 | - | |
484 | -clean: | |
485 | - rm -f $(OBJS) bzip2.o libbz2.so.1.0.1 libbz2.so.1.0 bzip2-shared | |
486 | - | |
487 | -blocksort.o: blocksort.c | |
488 | - $(CC) $(CFLAGS) -c blocksort.c | |
489 | -huffman.o: huffman.c | |
490 | - $(CC) $(CFLAGS) -c huffman.c | |
491 | -crctable.o: crctable.c | |
492 | - $(CC) $(CFLAGS) -c crctable.c | |
493 | -randtable.o: randtable.c | |
494 | - $(CC) $(CFLAGS) -c randtable.c | |
495 | -compress.o: compress.c | |
496 | - $(CC) $(CFLAGS) -c compress.c | |
497 | -decompress.o: decompress.c | |
498 | - $(CC) $(CFLAGS) -c decompress.c | |
499 | -bzlib.o: bzlib.c | |
500 | - $(CC) $(CFLAGS) -c bzlib.c | |
501 | diff -Nru bzip2-1.0.1/Makefile.am bzip2-1.0.1.new/Makefile.am | |
502 | --- bzip2-1.0.1/Makefile.am Thu Jan 1 01:00:00 1970 | |
503 | +++ bzip2-1.0.1.new/Makefile.am Sat Jun 24 20:17:47 2000 | |
504 | @@ -0,0 +1,31 @@ | |
505 | +SUBDIRS = doc | |
506 | + | |
507 | +bin_PROGRAMS = bzip2 bzip2recover | |
508 | +bzip2_SOURCES = bzip2.c | |
509 | + | |
510 | +bzip2_LDADD = libbz2.la | |
511 | +bzip2recover_SOURCES = bzip2recover.c | |
512 | +lib_LTLIBRARIES = libbz2.la | |
513 | +libbz2_la_SOURCES = \ | |
514 | + blocksort.c \ | |
515 | + huffman.c \ | |
516 | + crctable.c \ | |
517 | + randtable.c \ | |
518 | + compress.c \ | |
519 | + decompress.c \ | |
520 | + bzlib.c \ | |
521 | + bzlib.h \ | |
522 | + bzlib_private.h | |
523 | + | |
524 | +libbz2_la_LDFLAGS = -version-info 1:0:0 | |
525 | +include_HEADERS = bzlib.h bzlib_private.h | |
526 | + | |
ff248cb7 | 527 | +bin_SCRIPTS = bzless bzgrep |
d967e3ec | 528 | + |
529 | +EXTRA_DIST = README README.COMPILATION.PROBLEMS \ | |
530 | + Y2K_INFO libbz2.def libbz2.dsp \ | |
531 | + sample1.bz2 sample1.ref sample2.bz2 sample2.ref sample3.bz2 sample3.ref | |
532 | + | |
533 | +install-exec-hook: | |
534 | + $(LN_S) -f bzip2 $(DESTDIR)$(bindir)/bunzip2 | |
535 | + $(LN_S) -f bzip2 $(DESTDIR)$(bindir)/bzcat | |
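`Makefile.am` above leaves shared-library naming to libtool via `-version-info 1:0:0` (CURRENT:REVISION:AGE), whereas the removed `Makefile-libbz2_so` hard-coded `libbz2.so.1.0.1` with soname `libbz2.so.1.0`. The sketch below computes the names libtool would be expected to produce on Linux, assuming its `linux` versioning scheme (soname major = CURRENT - AGE, file suffix MAJOR.AGE.REVISION). The helper names are hypothetical, and other platforms use different schemes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of libtool's "linux" version_type. */
typedef struct {
    int major;     /* CURRENT - AGE: the number that appears in the soname */
    int age;
    int revision;
} lt_linux_version;

static lt_linux_version lt_version_from_info(int current, int revision, int age)
{
    lt_linux_version v;
    /* libtool rejects negative fields and AGE greater than CURRENT. */
    assert(current >= 0 && revision >= 0 && age >= 0 && age <= current);
    v.major = current - age;
    v.age = age;
    v.revision = revision;
    return v;
}

/* Fill the soname ("libX.so.MAJOR") and the real file name
   ("libX.so.MAJOR.AGE.REVISION") for a library called libname. */
static void lt_linux_names(const char *libname, lt_linux_version v,
                           char *soname, size_t soname_len,
                           char *realname, size_t realname_len)
{
    snprintf(soname, soname_len, "%s.so.%d", libname, v.major);
    snprintf(realname, realname_len, "%s.so.%d.%d.%d",
             libname, v.major, v.age, v.revision);
}
```

For `1:0:0` this gives soname `libbz2.so.1` and file `libbz2.so.1.0.0`. Binaries linked against the old hand-built `libbz2.so.1.0` would therefore likely not find the libtool-built library under that name.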
536 | diff -Nru bzip2-1.0.1/NEWS bzip2-1.0.1.new/NEWS | |
537 | --- bzip2-1.0.1/NEWS Thu Jan 1 01:00:00 1970 | |
538 | +++ bzip2-1.0.1.new/NEWS Sat Jun 24 20:13:06 2000 | |
539 | @@ -0,0 +1,12 @@ | |
540 | + | |
541 | + | |
542 | +1.0.1 | |
543 | +~~~~~ | |
544 | +* Modified dlltest.c so it uses the new BZ2_ naming scheme. | |
545 | +* Modified makefile-msc to fix minor build probs on Win2k. | |
546 | +* Updated README.COMPILATION.PROBLEMS. | |
547 | + | |
548 | +There are no functionality changes or bug fixes relative to version | |
549 | +1.0.0. This is just a documentation update + a fix for minor Win32 | |
550 | +build problems. For almost everyone, upgrading from 1.0.0 to 1.0.1 is | |
551 | +utterly pointless. Don't bother. | |
552 | diff -Nru bzip2-1.0.1/acinclude.m4 bzip2-1.0.1.new/acinclude.m4 | |
553 | --- bzip2-1.0.1/acinclude.m4 Thu Jan 1 01:00:00 1970 | |
554 | +++ bzip2-1.0.1.new/acinclude.m4 Sat Jun 24 20:13:06 2000 | |
555 | @@ -0,0 +1,129 @@ | |
556 | +#serial 7 | |
557 | + | |
558 | +dnl By default, many hosts won't let programs access large files; | |
559 | +dnl one must use special compiler options to get large-file access to work. | |
560 | +dnl For more details about this brain damage please see: | |
561 | +dnl http://www.sas.com/standards/large.file/x_open.20Mar96.html | |
562 | + | |
563 | +dnl Written by Paul Eggert <eggert@twinsun.com>. | |
564 | + | |
565 | +dnl Internal subroutine of AC_SYS_LARGEFILE. | |
566 | +dnl AC_SYS_LARGEFILE_FLAGS(FLAGSNAME) | |
567 | +AC_DEFUN(AC_SYS_LARGEFILE_FLAGS, | |
568 | + [AC_CACHE_CHECK([for $1 value to request large file support], | |
569 | + ac_cv_sys_largefile_$1, | |
570 | + [if ($GETCONF LFS_$1) >conftest.1 2>conftest.2 && test ! -s conftest.2 | |
571 | + then | |
572 | + ac_cv_sys_largefile_$1=`cat conftest.1` | |
573 | + else | |
574 | + ac_cv_sys_largefile_$1=no | |
575 | + ifelse($1, CFLAGS, | |
576 | + [case "$host_os" in | |
577 | + # HP-UX 10.20 requires -D__STDC_EXT__ with gcc 2.95.1. | |
578 | +changequote(, )dnl | |
579 | + hpux10.[2-9][0-9]* | hpux1[1-9]* | hpux[2-9][0-9]*) | |
580 | +changequote([, ])dnl | |
581 | + if test "$GCC" = yes; then | |
582 | + ac_cv_sys_largefile_CFLAGS=-D__STDC_EXT__ | |
583 | + fi | |
584 | + ;; | |
585 | + # IRIX 6.2 and later require cc -n32. | |
586 | +changequote(, )dnl | |
587 | + irix6.[2-9]* | irix6.1[0-9]* | irix[7-9].* | irix[1-9][0-9]*) | |
588 | +changequote([, ])dnl | |
589 | + if test "$GCC" != yes; then | |
590 | + ac_cv_sys_largefile_CFLAGS=-n32 | |
591 | + fi | |
592 | + esac | |
593 | + if test "$ac_cv_sys_largefile_CFLAGS" != no; then | |
594 | + ac_save_CC="$CC" | |
595 | + CC="$CC $ac_cv_sys_largefile_CFLAGS" | |
596 | + AC_TRY_LINK(, , , ac_cv_sys_largefile_CFLAGS=no) | |
597 | + CC="$ac_save_CC" | |
598 | + fi]) | |
599 | + fi | |
600 | + rm -f conftest*])]) | |
601 | + | |
602 | +dnl Internal subroutine of AC_SYS_LARGEFILE. | |
603 | +dnl AC_SYS_LARGEFILE_SPACE_APPEND(VAR, VAL) | |
604 | +AC_DEFUN(AC_SYS_LARGEFILE_SPACE_APPEND, | |
605 | + [case $2 in | |
606 | + no) ;; | |
607 | + ?*) | |
608 | + case "[$]$1" in | |
609 | + '') $1=$2 ;; | |
610 | + *) $1=[$]$1' '$2 ;; | |
611 | + esac ;; | |
612 | + esac]) | |
613 | + | |
614 | +dnl Internal subroutine of AC_SYS_LARGEFILE. | |
615 | +dnl AC_SYS_LARGEFILE_MACRO_VALUE(C-MACRO, CACHE-VAR, COMMENT, CODE-TO-SET-DEFAULT) | |
616 | +AC_DEFUN(AC_SYS_LARGEFILE_MACRO_VALUE, | |
617 | + [AC_CACHE_CHECK([for $1], $2, | |
618 | + [$2=no | |
619 | +changequote(, )dnl | |
620 | + $4 | |
621 | + for ac_flag in $ac_cv_sys_largefile_CFLAGS no; do | |
622 | + case "$ac_flag" in | |
623 | + -D$1) | |
624 | + $2=1 ;; | |
625 | + -D$1=*) | |
626 | + $2=`expr " $ac_flag" : '[^=]*=\(.*\)'` ;; | |
627 | + esac | |
628 | + done | |
629 | +changequote([, ])dnl | |
630 | + ]) | |
631 | + if test "[$]$2" != no; then | |
632 | + AC_DEFINE_UNQUOTED([$1], [$]$2, [$3]) | |
633 | + fi]) | |
634 | + | |
635 | +AC_DEFUN(AC_SYS_LARGEFILE, | |
636 | + [AC_REQUIRE([AC_CANONICAL_HOST]) | |
637 | + AC_ARG_ENABLE(largefile, | |
638 | + [ --disable-largefile omit support for large files]) | |
639 | + if test "$enable_largefile" != no; then | |
640 | + AC_CHECK_TOOL(GETCONF, getconf) | |
641 | + AC_SYS_LARGEFILE_FLAGS(CFLAGS) | |
642 | + AC_SYS_LARGEFILE_FLAGS(LDFLAGS) | |
643 | + AC_SYS_LARGEFILE_FLAGS(LIBS) | |
644 | + | |
645 | + for ac_flag in $ac_cv_sys_largefile_CFLAGS no; do | |
646 | + case "$ac_flag" in | |
647 | + no) ;; | |
648 | + -D_FILE_OFFSET_BITS=*) ;; | |
649 | + -D_LARGEFILE_SOURCE | -D_LARGEFILE_SOURCE=*) ;; | |
650 | + -D_LARGE_FILES | -D_LARGE_FILES=*) ;; | |
651 | + -D?* | -I?*) | |
652 | + AC_SYS_LARGEFILE_SPACE_APPEND(CPPFLAGS, "$ac_flag") ;; | |
653 | + *) | |
654 | + AC_SYS_LARGEFILE_SPACE_APPEND(CFLAGS, "$ac_flag") ;; | |
655 | + esac | |
656 | + done | |
657 | + AC_SYS_LARGEFILE_SPACE_APPEND(LDFLAGS, "$ac_cv_sys_largefile_LDFLAGS") | |
658 | + AC_SYS_LARGEFILE_SPACE_APPEND(LIBS, "$ac_cv_sys_largefile_LIBS") | |
659 | + AC_SYS_LARGEFILE_MACRO_VALUE(_FILE_OFFSET_BITS, | |
660 | + ac_cv_sys_file_offset_bits, | |
661 | + [Number of bits in a file offset, on hosts where this is settable.], | |
662 | + [case "$host_os" in | |
663 | + # HP-UX 10.20 and later | |
664 | + hpux10.[2-9][0-9]* | hpux1[1-9]* | hpux[2-9][0-9]*) | |
665 | + ac_cv_sys_file_offset_bits=64 ;; | |
666 | + esac]) | |
667 | + AC_SYS_LARGEFILE_MACRO_VALUE(_LARGEFILE_SOURCE, | |
668 | + ac_cv_sys_largefile_source, | |
669 | + [Define to make fseeko etc. visible, on some hosts.], | |
670 | + [case "$host_os" in | |
671 | + # HP-UX 10.20 and later | |
672 | + hpux10.[2-9][0-9]* | hpux1[1-9]* | hpux[2-9][0-9]*) | |
673 | + ac_cv_sys_largefile_source=1 ;; | |
674 | + esac]) | |
675 | + AC_SYS_LARGEFILE_MACRO_VALUE(_LARGE_FILES, | |
676 | + ac_cv_sys_large_files, | |
677 | + [Define for large files, on AIX-style hosts.], | |
678 | + [case "$host_os" in | |
679 | + # AIX 4.2 and later | |
680 | + aix4.[2-9]* | aix4.1[0-9]* | aix[5-9].* | aix[1-9][0-9]*) | |
681 | + ac_cv_sys_large_files=1 ;; | |
682 | + esac]) | |
683 | + fi | |
684 | + ]) | |
685 | diff -Nru bzip2-1.0.1/bzip2.1 bzip2-1.0.1.new/bzip2.1 | |
686 | --- bzip2-1.0.1/bzip2.1 Sat Jun 24 20:13:27 2000 | |
687 | +++ bzip2-1.0.1.new/bzip2.1 Thu Jan 1 01:00:00 1970 | |
688 | @@ -1,439 +0,0 @@ | |
689 | -.PU | |
690 | -.TH bzip2 1 | |
691 | -.SH NAME | |
692 | -bzip2, bunzip2 \- a block-sorting file compressor, v1.0 | |
693 | -.br | |
694 | -bzcat \- decompresses files to stdout | |
695 | -.br | |
696 | -bzip2recover \- recovers data from damaged bzip2 files | |
697 | - | |
698 | -.SH SYNOPSIS | |
699 | -.ll +8 | |
700 | -.B bzip2 | |
701 | -.RB [ " \-cdfkqstvzVL123456789 " ] | |
702 | -[ | |
703 | -.I "filenames \&..." | |
704 | -] | |
705 | -.ll -8 | |
706 | -.br | |
707 | -.B bunzip2 | |
708 | -.RB [ " \-fkvsVL " ] | |
709 | -[ | |
710 | -.I "filenames \&..." | |
711 | -] | |
712 | -.br | |
713 | -.B bzcat | |
714 | -.RB [ " \-s " ] | |
715 | -[ | |
716 | -.I "filenames \&..." | |
717 | -] | |
718 | -.br | |
719 | -.B bzip2recover | |
720 | -.I "filename" | |
721 | - | |
722 | -.SH DESCRIPTION | |
723 | -.I bzip2 | |
724 | -compresses files using the Burrows-Wheeler block sorting | |
725 | -text compression algorithm, and Huffman coding. Compression is | |
726 | -generally considerably better than that achieved by more conventional | |
727 | -LZ77/LZ78-based compressors, and approaches the performance of the PPM | |
728 | -family of statistical compressors. | |
729 | - | |
730 | -The command-line options are deliberately very similar to | |
731 | -those of | |
732 | -.I GNU gzip, | |
733 | -but they are not identical. | |
734 | - | |
735 | -.I bzip2 | |
736 | -expects a list of file names to accompany the | |
737 | -command-line flags. Each file is replaced by a compressed version of | |
738 | -itself, with the name "original_name.bz2". | |
739 | -Each compressed file | |
740 | -has the same modification date, permissions, and, when possible, | |
741 | -ownership as the corresponding original, so that these properties can | |
742 | -be correctly restored at decompression time. File name handling is | |
743 | -naive in the sense that there is no mechanism for preserving original | |
744 | -file names, permissions, ownerships or dates in filesystems which lack | |
745 | -these concepts, or have serious file name length restrictions, such as | |
746 | -MS-DOS. | |
747 | - | |
748 | -.I bzip2 | |
749 | -and | |
750 | -.I bunzip2 | |
751 | -will by default not overwrite existing | |
752 | -files. If you want this to happen, specify the \-f flag. | |
753 | - | |
754 | -If no file names are specified, | |
755 | -.I bzip2 | |
756 | -compresses from standard | |
757 | -input to standard output. In this case, | |
758 | -.I bzip2 | |
759 | -will decline to | |
760 | -write compressed output to a terminal, as this would be entirely | |
761 | -incomprehensible and therefore pointless. | |
762 | - | |
763 | -.I bunzip2 | |
764 | -(or | |
765 | -.I bzip2 \-d) | |
766 | -decompresses all | |
767 | -specified files. Files which were not created by | |
768 | -.I bzip2 | |
769 | -will be detected and ignored, and a warning issued. | |
770 | -.I bzip2 | |
771 | -attempts to guess the filename for the decompressed file | |
772 | -from that of the compressed file as follows: | |
773 | - | |
774 | - filename.bz2 becomes filename | |
775 | - filename.bz becomes filename | |
776 | - filename.tbz2 becomes filename.tar | |
777 | - filename.tbz becomes filename.tar | |
778 | - anyothername becomes anyothername.out | |
779 | - | |
780 | -If the file does not end in one of the recognised endings, | |
781 | -.I .bz2, | |
782 | -.I .bz, | |
783 | -.I .tbz2 | |
784 | -or | |
785 | -.I .tbz, | |
786 | -.I bzip2 | |
787 | -complains that it cannot | |
788 | -guess the name of the original file, and uses the original name | |
789 | -with | |
790 | -.I .out | |
791 | -appended. | |
792 | - | |
793 | -As with compression, supplying no | |
794 | -filenames causes decompression from | |
795 | -standard input to standard output. | |
796 | - | |
797 | -.I bunzip2 | |
798 | -will correctly decompress a file which is the | |
799 | -concatenation of two or more compressed files. The result is the | |
800 | -concatenation of the corresponding uncompressed files. Integrity | |
801 | -testing (\-t) | |
802 | -of concatenated | |
803 | -compressed files is also supported. | |
804 | - | |
805 | -You can also compress or decompress files to the standard output by | |
806 | -giving the \-c flag. Multiple files may be compressed and | |
807 | -decompressed like this. The resulting outputs are fed sequentially to | |
808 | -stdout. Compression of multiple files | |
809 | -in this manner generates a stream | |
810 | -containing multiple compressed file representations. Such a stream | |
811 | -can be decompressed correctly only by | |
812 | -.I bzip2 | |
813 | -version 0.9.0 or | |
814 | -later. Earlier versions of | |
815 | -.I bzip2 | |
816 | -will stop after decompressing | |
817 | -the first file in the stream. | |
818 | - | |
819 | -.I bzcat | |
820 | -(or | |
821 | -.I bzip2 -dc) | |
822 | -decompresses all specified files to | |
823 | -the standard output. | |
824 | - | |
825 | -.I bzip2 | |
826 | -will read arguments from the environment variables | |
827 | -.I BZIP2 | |
828 | -and | |
829 | -.I BZIP, | |
830 | -in that order, and will process them | |
831 | -before any arguments read from the command line. This gives a | |
832 | -convenient way to supply default arguments. | |
833 | - | |
834 | -Compression is always performed, even if the compressed | |
835 | -file is slightly | |
836 | -larger than the original. Files of less than about one hundred bytes | |
837 | -tend to get larger, since the compression mechanism has a constant | |
838 | -overhead in the region of 50 bytes. Random data (including the output | |
839 | -of most file compressors) is coded at about 8.05 bits per byte, giving | |
840 | -an expansion of around 0.5%. | |
841 | - | |
842 | -As a self-check for your protection, | |
843 | -.I | |
844 | -bzip2 | |
845 | -uses 32-bit CRCs to | |
846 | -make sure that the decompressed version of a file is identical to the | |
847 | -original. This guards against corruption of the compressed data, and | |
848 | -against undetected bugs in | |
849 | -.I bzip2 | |
850 | -(hopefully very unlikely). The | |
851 | -chances of data corruption going undetected is microscopic, about one | |
852 | -chance in four billion for each file processed. Be aware, though, that | |
853 | -the check occurs upon decompression, so it can only tell you that | |
854 | -something is wrong. It can't help you | |
855 | -recover the original uncompressed | |
856 | -data. You can use | |
857 | -.I bzip2recover | |
858 | -to try to recover data from | |
859 | -damaged files. | |
860 | - | |
861 | -Return values: 0 for a normal exit, 1 for environmental problems (file | |
862 | -not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt | |
863 | -compressed file, 3 for an internal consistency error (eg, bug) which | |
864 | -caused | |
865 | -.I bzip2 | |
866 | -to panic. | |
867 | - | |
868 | -.SH OPTIONS | |
869 | -.TP | |
870 | -.B \-c --stdout | |
871 | -Compress or decompress to standard output. | |
872 | -.TP | |
873 | -.B \-d --decompress | |
874 | -Force decompression. | |
875 | -.I bzip2, | |
876 | -.I bunzip2 | |
877 | -and | |
878 | -.I bzcat | |
879 | -are | |
880 | -really the same program, and the decision about what actions to take is | |
881 | -done on the basis of which name is used. This flag overrides that | |
882 | -mechanism, and forces | |
883 | -.I bzip2 | |
884 | -to decompress. | |
885 | -.TP | |
886 | -.B \-z --compress | |
887 | -The complement to \-d: forces compression, regardless of the | |
888 | -invokation name. | |
889 | -.TP | |
890 | -.B \-t --test | |
891 | -Check integrity of the specified file(s), but don't decompress them. | |
892 | -This really performs a trial decompression and throws away the result. | |
893 | -.TP | |
894 | -.B \-f --force | |
895 | -Force overwrite of output files. Normally, | |
896 | -.I bzip2 | |
897 | -will not overwrite | |
898 | -existing output files. Also forces | |
899 | -.I bzip2 | |
900 | -to break hard links | |
901 | -to files, which it otherwise wouldn't do. | |
902 | -.TP | |
903 | -.B \-k --keep | |
904 | -Keep (don't delete) input files during compression | |
905 | -or decompression. | |
906 | -.TP | |
907 | -.B \-s --small | |
908 | -Reduce memory usage, for compression, decompression and testing. Files | |
909 | -are decompressed and tested using a modified algorithm which only | |
910 | -requires 2.5 bytes per block byte. This means any file can be | |
911 | -decompressed in 2300k of memory, albeit at about half the normal speed. | |
912 | - | |
913 | -During compression, \-s selects a block size of 200k, which limits | |
914 | -memory use to around the same figure, at the expense of your compression | |
915 | -ratio. In short, if your machine is low on memory (8 megabytes or | |
916 | -less), use \-s for everything. See MEMORY MANAGEMENT below. | |
917 | -.TP | |
918 | -.B \-q --quiet | |
919 | -Suppress non-essential warning messages. Messages pertaining to | |
920 | -I/O errors and other critical events will not be suppressed. | |
921 | -.TP | |
922 | -.B \-v --verbose | |
923 | -Verbose mode -- show the compression ratio for each file processed. | |
924 | -Further \-v's increase the verbosity level, spewing out lots of | |
925 | -information which is primarily of interest for diagnostic purposes. | |
926 | -.TP | |
927 | -.B \-L --license -V --version | |
928 | -Display the software version, license terms and conditions. | |
929 | -.TP | |
930 | -.B \-1 to \-9 | |
931 | -Set the block size to 100 k, 200 k .. 900 k when compressing. Has no | |
932 | -effect when decompressing. See MEMORY MANAGEMENT below. | |
933 | -.TP | |
934 | -.B \-- | |
935 | -Treats all subsequent arguments as file names, even if they start | |
936 | -with a dash. This is so you can handle files with names beginning | |
937 | -with a dash, for example: bzip2 \-- \-myfilename. | |
938 | -.TP | |
939 | -.B \--repetitive-fast --repetitive-best | |
940 | -These flags are redundant in versions 0.9.5 and above. They provided | |
941 | -some coarse control over the behaviour of the sorting algorithm in | |
942 | -earlier versions, which was sometimes useful. 0.9.5 and above have an | |
943 | -improved algorithm which renders these flags irrelevant. | |
944 | - | |
945 | -.SH MEMORY MANAGEMENT | |
946 | -.I bzip2 | |
947 | -compresses large files in blocks. The block size affects | |
948 | -both the compression ratio achieved, and the amount of memory needed for | |
949 | -compression and decompression. The flags \-1 through \-9 | |
950 | -specify the block size to be 100,000 bytes through 900,000 bytes (the | |
951 | -default) respectively. At decompression time, the block size used for | |
952 | -compression is read from the header of the compressed file, and | |
953 | -.I bunzip2 | |
954 | -then allocates itself just enough memory to decompress | |
955 | -the file. Since block sizes are stored in compressed files, it follows | |
956 | -that the flags \-1 to \-9 are irrelevant to and so ignored | |
957 | -during decompression. | |
958 | - | |
959 | -Compression and decompression requirements, | |
960 | -in bytes, can be estimated as: | |
961 | - | |
962 | - Compression: 400k + ( 8 x block size ) | |
963 | - | |
964 | - Decompression: 100k + ( 4 x block size ), or | |
965 | - 100k + ( 2.5 x block size ) | |
966 | - | |
967 | -Larger block sizes give rapidly diminishing marginal returns. Most of | |
968 | -the compression comes from the first two or three hundred k of block | |
969 | -size, a fact worth bearing in mind when using | |
970 | -.I bzip2 | |
971 | -on small machines. | |
972 | -It is also important to appreciate that the decompression memory | |
973 | -requirement is set at compression time by the choice of block size. | |
974 | - | |
975 | -For files compressed with the default 900k block size, | |
976 | -.I bunzip2 | |
977 | -will require about 3700 kbytes to decompress. To support decompression | |
978 | -of any file on a 4 megabyte machine, | |
979 | -.I bunzip2 | |
980 | -has an option to | |
981 | -decompress using approximately half this amount of memory, about 2300 | |
982 | -kbytes. Decompression speed is also halved, so you should use this | |
983 | -option only where necessary. The relevant flag is -s. | |
984 | - | |
985 | -In general, try and use the largest block size memory constraints allow, | |
986 | -since that maximises the compression achieved. Compression and | |
987 | -decompression speed are virtually unaffected by block size. | |
988 | - | |
989 | -Another significant point applies to files which fit in a single block | |
990 | --- that means most files you'd encounter using a large block size. The | |
991 | -amount of real memory touched is proportional to the size of the file, | |
992 | -since the file is smaller than a block. For example, compressing a file | |
993 | -20,000 bytes long with the flag -9 will cause the compressor to | |
994 | -allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560 | |
995 | -kbytes of it. Similarly, the decompressor will allocate 3700k but only | |
996 | -touch 100k + 20000 * 4 = 180 kbytes. | |
997 | - | |
998 | -Here is a table which summarises the maximum memory usage for different | |
999 | -block sizes. Also recorded is the total compressed size for 14 files of | |
1000 | -the Calgary Text Compression Corpus totalling 3,141,622 bytes. This | |
1001 | -column gives some feel for how compression varies with block size. | |
1002 | -These figures tend to understate the advantage of larger block sizes for | |
1003 | -larger files, since the Corpus is dominated by smaller files. | |
1004 | - | |
1005 | - Compress Decompress Decompress Corpus | |
1006 | - Flag usage usage -s usage Size | |
1007 | - | |
1008 | - -1 1200k 500k 350k 914704 | |
1009 | - -2 2000k 900k 600k 877703 | |
1010 | - -3 2800k 1300k 850k 860338 | |
1011 | - -4 3600k 1700k 1100k 846899 | |
1012 | - -5 4400k 2100k 1350k 845160 | |
1013 | - -6 5200k 2500k 1600k 838626 | |
1014 | - -7 6100k 2900k 1850k 834096 | |
1015 | - -8 6800k 3300k 2100k 828642 | |
1016 | - -9 7600k 3700k 2350k 828642 | |
1017 | - | |
1018 | -.SH RECOVERING DATA FROM DAMAGED FILES | |
1019 | -.I bzip2 | |
1020 | -compresses files in blocks, usually 900kbytes long. Each | |
1021 | -block is handled independently. If a media or transmission error causes | |
1022 | -a multi-block .bz2 | |
1023 | -file to become damaged, it may be possible to | |
1024 | -recover data from the undamaged blocks in the file. | |
1025 | - | |
1026 | -The compressed representation of each block is delimited by a 48-bit | |
1027 | -pattern, which makes it possible to find the block boundaries with | |
1028 | -reasonable certainty. Each block also carries its own 32-bit CRC, so | |
1029 | -damaged blocks can be distinguished from undamaged ones. | |
1030 | - | |
1031 | -.I bzip2recover | |
1032 | -is a simple program whose purpose is to search for | |
1033 | -blocks in .bz2 files, and write each block out into its own .bz2 | |
1034 | -file. You can then use | |
1035 | -.I bzip2 | |
1036 | -\-t | |
1037 | -to test the | |
1038 | -integrity of the resulting files, and decompress those which are | |
1039 | -undamaged. | |
1040 | - | |
1041 | -.I bzip2recover | |
1042 | -takes a single argument, the name of the damaged file, | |
1043 | -and writes a number of files "rec0001file.bz2", | |
1044 | -"rec0002file.bz2", etc, containing the extracted blocks. | |
1045 | -The output filenames are designed so that the use of | |
1046 | -wildcards in subsequent processing -- for example, | |
1047 | -"bzip2 -dc rec*file.bz2 > recovered_data" -- lists the files in | |
1048 | -the correct order. | |
1049 | - | |
1050 | -.I bzip2recover | |
1051 | -should be of most use dealing with large .bz2 | |
1052 | -files, as these will contain many blocks. It is clearly | |
1053 | -futile to use it on damaged single-block files, since a | |
1054 | -damaged block cannot be recovered. If you wish to minimise | |
1055 | -any potential data loss through media or transmission errors, | |
1056 | -you might consider compressing with a smaller | |
1057 | -block size. | |
1058 | - | |
1059 | -.SH PERFORMANCE NOTES | |
1060 | -The sorting phase of compression gathers together similar strings in the | |
1061 | -file. Because of this, files containing very long runs of repeated | |
1062 | -symbols, like "aabaabaabaab ..." (repeated several hundred times) may | |
1063 | -compress more slowly than normal. Versions 0.9.5 and above fare much | |
1064 | -better than previous versions in this respect. The ratio between | |
1065 | -worst-case and average-case compression time is in the region of 10:1. | |
1066 | -For previous versions, this figure was more like 100:1. You can use the | |
1067 | -\-vvvv option to monitor progress in great detail, if you want. | |
1068 | - | |
1069 | -Decompression speed is unaffected by these phenomena. | |
1070 | - | |
1071 | -.I bzip2 | |
1072 | -usually allocates several megabytes of memory to operate | |
1073 | -in, and then charges all over it in a fairly random fashion. This means | |
1074 | -that performance, both for compressing and decompressing, is largely | |
1075 | -determined by the speed at which your machine can service cache misses. | |
1076 | -Because of this, small changes to the code to reduce the miss rate have | |
1077 | -been observed to give disproportionately large performance improvements. | |
1078 | -I imagine | |
1079 | -.I bzip2 | |
1080 | -will perform best on machines with very large caches. | |
1081 | - | |
1082 | -.SH CAVEATS | |
1083 | -I/O error messages are not as helpful as they could be. | |
1084 | -.I bzip2 | |
1085 | -tries hard to detect I/O errors and exit cleanly, but the details of | |
1086 | -what the problem is sometimes seem rather misleading. | |
1087 | - | |
1088 | -This manual page pertains to version 1.0 of | |
1089 | -.I bzip2. | |
1090 | -Compressed | |
1091 | -data created by this version is entirely forwards and backwards | |
1092 | -compatible with the previous public releases, versions 0.1pl2, 0.9.0 | |
1093 | -and 0.9.5, | |
1094 | -but with the following exception: 0.9.0 and above can correctly | |
1095 | -decompress multiple concatenated compressed files. 0.1pl2 cannot do | |
1096 | -this; it will stop after decompressing just the first file in the | |
1097 | -stream. | |
1098 | - | |
1099 | -.I bzip2recover | |
1100 | -uses 32-bit integers to represent bit positions in | |
1101 | -compressed files, so it cannot handle compressed files more than 512 | |
1102 | -megabytes long. This could easily be fixed. | |
1103 | - | |
1104 | -.SH AUTHOR | |
1105 | -Julian Seward, jseward@acm.org. | |
1106 | - | |
1107 | -http://sourceware.cygnus.com/bzip2 | |
1108 | -http://www.muraroa.demon.co.uk | |
1109 | - | |
1110 | -The ideas embodied in | |
1111 | -.I bzip2 | |
1112 | -are due to (at least) the following | |
1113 | -people: Michael Burrows and David Wheeler (for the block sorting | |
1114 | -transformation), David Wheeler (again, for the Huffman coder), Peter | |
1115 | -Fenwick (for the structured coding model in the original | |
1116 | -.I bzip, | |
1117 | -and many refinements), and Alistair Moffat, Radford Neal and Ian Witten | |
1118 | -(for the arithmetic coder in the original | |
1119 | -.I bzip). | |
1120 | -I am much | |
1121 | -indebted for their help, support and advice. See the manual in the | |
1122 | -source distribution for pointers to sources of documentation. Christian | |
1123 | -von Roques encouraged me to look for faster sorting algorithms, so as to | |
1124 | -speed up compression. Bela Lubkin encouraged me to improve the | |
1125 | -worst-case compression performance. Many people sent patches, helped | |
1126 | -with portability problems, lent machines, gave advice and were generally | |
1127 | -helpful. | |
1128 | diff -Nru bzip2-1.0.1/bzip2.1.preformatted bzip2-1.0.1.new/bzip2.1.preformatted | |
1129 | --- bzip2-1.0.1/bzip2.1.preformatted Sat Jun 24 20:13:27 2000 | |
1130 | +++ bzip2-1.0.1.new/bzip2.1.preformatted Thu Jan 1 01:00:00 1970 | |
1131 | @@ -1,462 +0,0 @@ | |
1132 | - | |
1133 | - | |
1134 | - | |
1135 | -bzip2(1) bzip2(1) | |
1136 | - | |
1137 | - | |
1138 | -N\bNA\bAM\bME\bE | |
1139 | - bzip2, bunzip2 - a block-sorting file compressor, v1.0 | |
1140 | - bzcat - decompresses files to stdout | |
1141 | - bzip2recover - recovers data from damaged bzip2 files | |
1142 | - | |
1143 | - | |
1144 | -S\bSY\bYN\bNO\bOP\bPS\bSI\bIS\bS | |
1145 | - b\bbz\bzi\bip\bp2\b2 [ -\b-c\bcd\bdf\bfk\bkq\bqs\bst\btv\bvz\bzV\bVL\bL1\b12\b23\b34\b45\b56\b67\b78\b89\b9 ] [ _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be_\bs _\b._\b._\b. ] | |
1146 | - b\bbu\bun\bnz\bzi\bip\bp2\b2 [ -\b-f\bfk\bkv\bvs\bsV\bVL\bL ] [ _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be_\bs _\b._\b._\b. ] | |
1147 | - b\bbz\bzc\bca\bat\bt [ -\b-s\bs ] [ _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be_\bs _\b._\b._\b. ] | |
1148 | - b\bbz\bzi\bip\bp2\b2r\bre\bec\bco\bov\bve\ber\br _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be | |
1149 | - | |
1150 | - | |
1151 | -D\bDE\bES\bSC\bCR\bRI\bIP\bPT\bTI\bIO\bON\bN | |
1152 | - _\bb_\bz_\bi_\bp_\b2 compresses files using the Burrows-Wheeler block | |
1153 | - sorting text compression algorithm, and Huffman coding. | |
1154 | - Compression is generally considerably better than that | |
1155 | - achieved by more conventional LZ77/LZ78-based compressors, | |
1156 | - and approaches the performance of the PPM family of sta- | |
1157 | - tistical compressors. | |
1158 | - | |
1159 | - The command-line options are deliberately very similar to | |
1160 | - those of _\bG_\bN_\bU _\bg_\bz_\bi_\bp_\b, but they are not identical. | |
1161 | - | |
1162 | - _\bb_\bz_\bi_\bp_\b2 expects a list of file names to accompany the com- | |
1163 | - mand-line flags. Each file is replaced by a compressed | |
1164 | - version of itself, with the name "original_name.bz2". | |
1165 | - Each compressed file has the same modification date, per- | |
1166 | - missions, and, when possible, ownership as the correspond- | |
1167 | - ing original, so that these properties can be correctly | |
1168 | - restored at decompression time. File name handling is | |
1169 | - naive in the sense that there is no mechanism for preserv- | |
1170 | - ing original file names, permissions, ownerships or dates | |
1171 | - in filesystems which lack these concepts, or have serious | |
1172 | - file name length restrictions, such as MS-DOS. | |
1173 | - | |
1174 | - _\bb_\bz_\bi_\bp_\b2 and _\bb_\bu_\bn_\bz_\bi_\bp_\b2 will by default not overwrite existing | |
1175 | - files. If you want this to happen, specify the -f flag. | |
1176 | - | |
1177 | - If no file names are specified, _\bb_\bz_\bi_\bp_\b2 compresses from | |
1178 | - standard input to standard output. In this case, _\bb_\bz_\bi_\bp_\b2 | |
1179 | - will decline to write compressed output to a terminal, as | |
1180 | - this would be entirely incomprehensible and therefore | |
1181 | - pointless. | |
1182 | - | |
1183 | - _\bb_\bu_\bn_\bz_\bi_\bp_\b2 (or _\bb_\bz_\bi_\bp_\b2 _\b-_\bd_\b) decompresses all specified files. | |
1184 | - Files which were not created by _\bb_\bz_\bi_\bp_\b2 will be detected and | |
1185 | - ignored, and a warning issued. _\bb_\bz_\bi_\bp_\b2 attempts to guess | |
1186 | - the filename for the decompressed file from that of the | |
1187 | - compressed file as follows: | |
1188 | - | |
1189 | - filename.bz2 becomes filename | |
1190 | - filename.bz becomes filename | |
1191 | - filename.tbz2 becomes filename.tar | |
1192 | - | |
1193 | - | |
1194 | - | |
1195 | - 1 | |
1196 | - | |
1197 | - | |
1198 | - | |
1199 | - | |
1200 | - | |
1201 | -bzip2(1) bzip2(1) | |
1202 | - | |
1203 | - | |
1204 | - filename.tbz becomes filename.tar | |
1205 | - anyothername becomes anyothername.out | |
1206 | - | |
1207 | - If the file does not end in one of the recognised endings, | |
1208 | - _\b._\bb_\bz_\b2_\b, _\b._\bb_\bz_\b, _\b._\bt_\bb_\bz_\b2 or _\b._\bt_\bb_\bz_\b, _\bb_\bz_\bi_\bp_\b2 complains that it cannot | |
1209 | - guess the name of the original file, and uses the original | |
1210 | - name with _\b._\bo_\bu_\bt appended. | |
1211 | - | |
1212 | - As with compression, supplying no filenames causes decom- | |
1213 | - pression from standard input to standard output. | |
1214 | - | |
1215 | - _\bb_\bu_\bn_\bz_\bi_\bp_\b2 will correctly decompress a file which is the con- | |
1216 | - catenation of two or more compressed files. The result is | |
1217 | - the concatenation of the corresponding uncompressed files. | |
1218 | - Integrity testing (-t) of concatenated compressed files is | |
1219 | - also supported. | |
1220 | - | |
1221 | - You can also compress or decompress files to the standard | |
1222 | - output by giving the -c flag. Multiple files may be com- | |
1223 | - pressed and decompressed like this. The resulting outputs | |
1224 | - are fed sequentially to stdout. Compression of multiple | |
1225 | - files in this manner generates a stream containing multi- | |
1226 | - ple compressed file representations. Such a stream can be | |
1227 | - decompressed correctly only by _\bb_\bz_\bi_\bp_\b2 version 0.9.0 or | |
1228 | - later. Earlier versions of _\bb_\bz_\bi_\bp_\b2 will stop after decom- | |
1229 | - pressing the first file in the stream. | |
1230 | - | |
1231 | - _\bb_\bz_\bc_\ba_\bt (or _\bb_\bz_\bi_\bp_\b2 _\b-_\bd_\bc_\b) decompresses all specified files to | |
1232 | - the standard output. | |
1233 | - | |
1234 | - _\bb_\bz_\bi_\bp_\b2 will read arguments from the environment variables | |
1235 | - _\bB_\bZ_\bI_\bP_\b2 and _\bB_\bZ_\bI_\bP_\b, in that order, and will process them | |
1236 | - before any arguments read from the command line. This | |
1237 | - gives a convenient way to supply default arguments. | |
1238 | - | |
1239 | - Compression is always performed, even if the compressed | |
1240 | - file is slightly larger than the original. Files of less | |
1241 | - than about one hundred bytes tend to get larger, since the | |
1242 | - compression mechanism has a constant overhead in the | |
1243 | - region of 50 bytes. Random data (including the output of | |
1244 | - most file compressors) is coded at about 8.05 bits per | |
1245 | - byte, giving an expansion of around 0.5%. | |
1246 | - | |
1247 | - As a self-check for your protection, _\bb_\bz_\bi_\bp_\b2 uses 32-bit | |
1248 | - CRCs to make sure that the decompressed version of a file | |
1249 | - is identical to the original. This guards against corrup- | |
1250 | - tion of the compressed data, and against undetected bugs | |
1251 | - in _\bb_\bz_\bi_\bp_\b2 (hopefully very unlikely). The chances of data | |
1252 | - corruption going undetected is microscopic, about one | |
1253 | - chance in four billion for each file processed. Be aware, | |
1254 | - though, that the check occurs upon decompression, so it | |
1255 | - can only tell you that something is wrong. It can't help | |
1256 | - you recover the original uncompressed data. You can use | |
1257 | - _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br to try to recover data from damaged files. | |
1258 | - | |
1259 | - | |
1260 | - | |
1261 | - 2 | |
1262 | - | |
1263 | - | |
1264 | - | |
1265 | - | |
1266 | - | |
1267 | -bzip2(1) bzip2(1) | |
1268 | - | |
1269 | - | |
1270 | - Return values: 0 for a normal exit, 1 for environmental | |
1271 | - problems (file not found, invalid flags, I/O errors, &c), | |
1272 | - 2 to indicate a corrupt compressed file, 3 for an internal | |
1273 | - consistency error (eg, bug) which caused _\bb_\bz_\bi_\bp_\b2 to panic. | |
1274 | - | |
1275 | - | |
1276 | -O\bOP\bPT\bTI\bIO\bON\bNS\bS | |
1277 | - -\b-c\bc -\b--\b-s\bst\btd\bdo\bou\but\bt | |
1278 | - Compress or decompress to standard output. | |
1279 | - | |
1280 | - -\b-d\bd -\b--\b-d\bde\bec\bco\bom\bmp\bpr\bre\bes\bss\bs | |
1281 | - Force decompression. _\bb_\bz_\bi_\bp_\b2_\b, _\bb_\bu_\bn_\bz_\bi_\bp_\b2 and _\bb_\bz_\bc_\ba_\bt are | |
1282 | - really the same program, and the decision about | |
1283 | - what actions to take is done on the basis of which | |
1284 | - name is used. This flag overrides that mechanism, | |
1285 | - and forces _\bb_\bz_\bi_\bp_\b2 to decompress. | |
1286 | - | |
1287 | - -\b-z\bz -\b--\b-c\bco\bom\bmp\bpr\bre\bes\bss\bs | |
1288 | - The complement to -d: forces compression, regard- | |
1289 | - less of the invokation name. | |
1290 | - | |
1291 | - -\b-t\bt -\b--\b-t\bte\bes\bst\bt | |
1292 | - Check integrity of the specified file(s), but don't | |
1293 | - decompress them. This really performs a trial | |
1294 | - decompression and throws away the result. | |
1295 | - | |
1296 | - -\b-f\bf -\b--\b-f\bfo\bor\brc\bce\be | |
1297 | - Force overwrite of output files. Normally, _\bb_\bz_\bi_\bp_\b2 | |
1298 | - will not overwrite existing output files. Also | |
1299 | - forces _\bb_\bz_\bi_\bp_\b2 to break hard links to files, which it | |
1300 | - otherwise wouldn't do. | |
1301 | - | |
1302 | - -\b-k\bk -\b--\b-k\bke\bee\bep\bp | |
1303 | - Keep (don't delete) input files during compression | |
1304 | - or decompression. | |
1305 | - | |
1306 | - -\b-s\bs -\b--\b-s\bsm\bma\bal\bll\bl | |
1307 | - Reduce memory usage, for compression, decompression | |
1308 | - and testing. Files are decompressed and tested | |
1309 | - using a modified algorithm which only requires 2.5 | |
1310 | - bytes per block byte. This means any file can be | |
1311 | - decompressed in 2300k of memory, albeit at about | |
1312 | - half the normal speed. | |
1313 | - | |
1314 | - During compression, -s selects a block size of | |
1315 | - 200k, which limits memory use to around the same | |
1316 | - figure, at the expense of your compression ratio. | |
1317 | - In short, if your machine is low on memory (8 | |
1318 | - megabytes or less), use -s for everything. See | |
1319 | - MEMORY MANAGEMENT below. | |
1320 | - | |
1321 | - -\b-q\bq -\b--\b-q\bqu\bui\bie\bet\bt | |
1322 | - Suppress non-essential warning messages. Messages | |
1323 | - pertaining to I/O errors and other critical events | |
1324 | - | |
1325 | - | |
1326 | - | |
1327 | - 3 | |
1328 | - | |
1329 | - | |
1330 | - | |
1331 | - | |
1332 | - | |
1333 | -bzip2(1) bzip2(1) | |
1334 | - | |
1335 | - | |
1336 | - will not be suppressed. | |
1337 | - | |
1338 | - -\b-v\bv -\b--\b-v\bve\ber\brb\bbo\bos\bse\be | |
1339 | - Verbose mode -- show the compression ratio for each | |
1340 | - file processed. Further -v's increase the ver- | |
1341 | - bosity level, spewing out lots of information which | |
1342 | - is primarily of interest for diagnostic purposes. | |
1343 | - | |
1344 | - -\b-L\bL -\b--\b-l\bli\bic\bce\ben\bns\bse\be -\b-V\bV -\b--\b-v\bve\ber\brs\bsi\bio\bon\bn | |
1345 | - Display the software version, license terms and | |
1346 | - conditions. | |
1347 | - | |
1348 | - -\b-1\b1 t\bto\bo -\b-9\b9 | |
1349 | - Set the block size to 100 k, 200 k .. 900 k when | |
1350 | - compressing. Has no effect when decompressing. | |
1351 | - See MEMORY MANAGEMENT below. | |
1352 | - | |
1353 | - -\b--\b- Treats all subsequent arguments as file names, even | |
1354 | - if they start with a dash. This is so you can han- | |
1355 | - dle files with names beginning with a dash, for | |
1356 | - example: bzip2 -- -myfilename. | |
1357 | - | |
1358 | - -\b--\b-r\bre\bep\bpe\bet\bti\bit\bti\biv\bve\be-\b-f\bfa\bas\bst\bt -\b--\b-r\bre\bep\bpe\bet\bti\bit\bti\biv\bve\be-\b-b\bbe\bes\bst\bt | |
1359 | - These flags are redundant in versions 0.9.5 and | |
1360 | - above. They provided some coarse control over the | |
1361 | - behaviour of the sorting algorithm in earlier ver- | |
1362 | - sions, which was sometimes useful. 0.9.5 and above | |
1363 | - have an improved algorithm which renders these | |
1364 | - flags irrelevant. | |
1365 | - | |
1366 | - | |
1367 | -M\bME\bEM\bMO\bOR\bRY\bY M\bMA\bAN\bNA\bAG\bGE\bEM\bME\bEN\bNT\bT | |
1368 | - _\bb_\bz_\bi_\bp_\b2 compresses large files in blocks. The block size | |
1369 | - affects both the compression ratio achieved, and the | |
1370 | - amount of memory needed for compression and decompression. | |
1371 | - The flags -1 through -9 specify the block size to be | |
1372 | - 100,000 bytes through 900,000 bytes (the default) respec- | |
1373 | - tively. At decompression time, the block size used for | |
1374 | - compression is read from the header of the compressed | |
1375 | - file, and _\bb_\bu_\bn_\bz_\bi_\bp_\b2 then allocates itself just enough memory | |
1376 | - to decompress the file. Since block sizes are stored in | |
1377 | - compressed files, it follows that the flags -1 to -9 are | |
1378 | - irrelevant to and so ignored during decompression. | |
1379 | - | |
1380 | - Compression and decompression requirements, in bytes, can | |
1381 | - be estimated as: | |
1382 | - | |
1383 | - Compression: 400k + ( 8 x block size ) | |
1384 | - | |
1385 | - Decompression: 100k + ( 4 x block size ), or | |
1386 | - 100k + ( 2.5 x block size ) | |
1387 | - | |
1388 | - Larger block sizes give rapidly diminishing marginal | |
1389 | - returns. Most of the compression comes from the first two | |
1390 | - | |
1391 | - | |
1392 | - | |
1393 | - 4 | |
1394 | - | |
1395 | - | |
1396 | - | |
1397 | - | |
1398 | - | |
1399 | -bzip2(1) bzip2(1) | |
1400 | - | |
1401 | - | |
1402 | - or three hundred k of block size, a fact worth bearing in | |
1403 | - mind when using _\bb_\bz_\bi_\bp_\b2 on small machines. It is also | |
1404 | - important to appreciate that the decompression memory | |
1405 | - requirement is set at compression time by the choice of | |
1406 | - block size. | |
1407 | - | |
1408 | - For files compressed with the default 900k block size, | |
1409 | - _\bb_\bu_\bn_\bz_\bi_\bp_\b2 will require about 3700 kbytes to decompress. To | |
1410 | - support decompression of any file on a 4 megabyte machine, | |
1411 | - _\bb_\bu_\bn_\bz_\bi_\bp_\b2 has an option to decompress using approximately | |
1412 | - half this amount of memory, about 2300 kbytes. Decompres- | |
1413 | - sion speed is also halved, so you should use this option | |
1414 | - only where necessary. The relevant flag is -s. | |
1415 | - | |
1416 | - In general, try and use the largest block size memory con- | |
1417 | - straints allow, since that maximises the compression | |
1418 | - achieved. Compression and decompression speed are virtu- | |
1419 | - ally unaffected by block size. | |
1420 | - | |
1421 | - Another significant point applies to files which fit in a | |
1422 | - single block -- that means most files you'd encounter | |
1423 | - using a large block size. The amount of real memory | |
1424 | - touched is proportional to the size of the file, since the | |
1425 | - file is smaller than a block. For example, compressing a | |
1426 | - file 20,000 bytes long with the flag -9 will cause the | |
1427 | - compressor to allocate around 7600k of memory, but only | |
1428 | - touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the | |
1429 | - decompressor will allocate 3700k but only touch 100k + | |
1430 | - 20000 * 4 = 180 kbytes. | |
1431 | - | |
1432 | - Here is a table which summarises the maximum memory usage | |
1433 | - for different block sizes. Also recorded is the total | |
1434 | - compressed size for 14 files of the Calgary Text Compres- | |
1435 | - sion Corpus totalling 3,141,622 bytes. This column gives | |
1436 | - some feel for how compression varies with block size. | |
1437 | - These figures tend to understate the advantage of larger | |
1438 | - block sizes for larger files, since the Corpus is domi- | |
1439 | - nated by smaller files. | |
1440 | - | |
1441 | - Compress Decompress Decompress Corpus | |
1442 | - Flag usage usage -s usage Size | |
1443 | - | |
1444 | - -1 1200k 500k 350k 914704 | |
1445 | - -2 2000k 900k 600k 877703 | |
1446 | - -3 2800k 1300k 850k 860338 | |
1447 | - -4 3600k 1700k 1100k 846899 | |
1448 | - -5 4400k 2100k 1350k 845160 | |
1449 | - -6 5200k 2500k 1600k 838626 | |
1450 | - -7 6100k 2900k 1850k 834096 | |
1451 | - -8 6800k 3300k 2100k 828642 | |
1452 | - -9 7600k 3700k 2350k 828642 | |
1453 | - | |
1454 | - | |
1455 | - | |
1456 | - | |
1457 | - | |
1458 | - | |
1459 | - 5 | |
1460 | - | |
1461 | - | |
1462 | - | |
1463 | - | |
1464 | - | |
1465 | -bzip2(1) bzip2(1) | |
1466 | - | |
1467 | - | |
1468 | -R\bRE\bEC\bCO\bOV\bVE\bER\bRI\bIN\bNG\bG D\bDA\bAT\bTA\bA F\bFR\bRO\bOM\bM D\bDA\bAM\bMA\bAG\bGE\bED\bD F\bFI\bIL\bLE\bES\bS | |
1469 | - _\bb_\bz_\bi_\bp_\b2 compresses files in blocks, usually 900kbytes long. | |
1470 | - Each block is handled independently. If a media or trans- | |
1471 | - mission error causes a multi-block .bz2 file to become | |
1472 | - damaged, it may be possible to recover data from the | |
1473 | - undamaged blocks in the file. | |
1474 | - | |
1475 | - The compressed representation of each block is delimited | |
1476 | - by a 48-bit pattern, which makes it possible to find the | |
1477 | - block boundaries with reasonable certainty. Each block | |
1478 | - also carries its own 32-bit CRC, so damaged blocks can be | |
1479 | - distinguished from undamaged ones. | |
1480 | - | |
1481 | - _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br is a simple program whose purpose is to | |
1482 | - search for blocks in .bz2 files, and write each block out | |
1483 | - into its own .bz2 file. You can then use _\bb_\bz_\bi_\bp_\b2 -t to test | |
1484 | - the integrity of the resulting files, and decompress those | |
1485 | - which are undamaged. | |
1486 | - | |
1487 | - _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br takes a single argument, the name of the dam- | |
1488 | - aged file, and writes a number of files "rec0001file.bz2", | |
1489 | - "rec0002file.bz2", etc, containing the extracted blocks. | |
1490 | - The output filenames are designed so that the use of | |
1491 | - wildcards in subsequent processing -- for example, "bzip2 | |
1492 | - -dc rec*file.bz2 > recovered_data" -- lists the files in | |
1493 | - the correct order. | |
1494 | - | |
1495 | - _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br should be of most use dealing with large .bz2 | |
1496 | - files, as these will contain many blocks. It is clearly | |
1497 | - futile to use it on damaged single-block files, since a | |
1498 | - damaged block cannot be recovered. If you wish to min- | |
1499 | - imise any potential data loss through media or transmis- | |
1500 | - sion errors, you might consider compressing with a smaller | |
1501 | - block size. | |
1502 | - | |
1503 | - | |
1504 | -P\bPE\bER\bRF\bFO\bOR\bRM\bMA\bAN\bNC\bCE\bE N\bNO\bOT\bTE\bES\bS | |
1505 | - The sorting phase of compression gathers together similar | |
1506 | - strings in the file. Because of this, files containing | |
1507 | - very long runs of repeated symbols, like "aabaabaabaab | |
1508 | - ..." (repeated several hundred times) may compress more | |
1509 | - slowly than normal. Versions 0.9.5 and above fare much | |
1510 | - better than previous versions in this respect. The ratio | |
1511 | - between worst-case and average-case compression time is in | |
1512 | - the region of 10:1. For previous versions, this figure | |
1513 | - was more like 100:1. You can use the -vvvv option to mon- | |
1514 | - itor progress in great detail, if you want. | |
1515 | - | |
1516 | - Decompression speed is unaffected by these phenomena. | |
1517 | - | |
1518 | - _\bb_\bz_\bi_\bp_\b2 usually allocates several megabytes of memory to | |
1519 | - operate in, and then charges all over it in a fairly ran- | |
1520 | - dom fashion. This means that performance, both for com- | |
1521 | - pressing and decompressing, is largely determined by the | |
1522 | - | |
1523 | - | |
1524 | - | |
1525 | - 6 | |
1526 | - | |
1527 | - | |
1528 | - | |
1529 | - | |
1530 | - | |
1531 | -bzip2(1) bzip2(1) | |
1532 | - | |
1533 | - | |
1534 | - speed at which your machine can service cache misses. | |
1535 | - Because of this, small changes to the code to reduce the | |
1536 | - miss rate have been observed to give disproportionately | |
1537 | - large performance improvements. I imagine _\bb_\bz_\bi_\bp_\b2 will per- | |
1538 | - form best on machines with very large caches. | |
1539 | - | |
1540 | - | |
1541 | -C\bCA\bAV\bVE\bEA\bAT\bTS\bS | |
1542 | - I/O error messages are not as helpful as they could be. | |
1543 | - _\bb_\bz_\bi_\bp_\b2 tries hard to detect I/O errors and exit cleanly, | |
1544 | - but the details of what the problem is sometimes seem | |
1545 | - rather misleading. | |
1546 | - | |
1547 | - This manual page pertains to version 1.0 of _\bb_\bz_\bi_\bp_\b2_\b. Com- | |
1548 | - pressed data created by this version is entirely forwards | |
1549 | - and backwards compatible with the previous public | |
1550 | - releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the | |
1551 | - following exception: 0.9.0 and above can correctly decom- | |
1552 | - press multiple concatenated compressed files. 0.1pl2 can- | |
1553 | - not do this; it will stop after decompressing just the | |
1554 | - first file in the stream. | |
1555 | - | |
1556 | - _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br uses 32-bit integers to represent bit posi- | |
1557 | - tions in compressed files, so it cannot handle compressed | |
1558 | - files more than 512 megabytes long. This could easily be | |
1559 | - fixed. | |
1560 | - | |
1561 | - | |
1562 | -A\bAU\bUT\bTH\bHO\bOR\bR | |
1563 | - Julian Seward, jseward@acm.org. | |
1564 | - | |
1565 | - http://sourceware.cygnus.com/bzip2 | |
1566 | - http://www.muraroa.demon.co.uk | |
1567 | - | |
1568 | - The ideas embodied in _\bb_\bz_\bi_\bp_\b2 are due to (at least) the fol- | |
1569 | - lowing people: Michael Burrows and David Wheeler (for the | |
1570 | - block sorting transformation), David Wheeler (again, for | |
1571 | - the Huffman coder), Peter Fenwick (for the structured cod- | |
1572 | - ing model in the original _\bb_\bz_\bi_\bp_\b, and many refinements), and | |
1573 | - Alistair Moffat, Radford Neal and Ian Witten (for the | |
1574 | - arithmetic coder in the original _\bb_\bz_\bi_\bp_\b)_\b. I am much | |
1575 | - indebted for their help, support and advice. See the man- | |
1576 | - ual in the source distribution for pointers to sources of | |
1577 | - documentation. Christian von Roques encouraged me to look | |
1578 | - for faster sorting algorithms, so as to speed up compres- | |
1579 | - sion. Bela Lubkin encouraged me to improve the worst-case | |
1580 | - compression performance. Many people sent patches, helped | |
1581 | - with portability problems, lent machines, gave advice and | |
1582 | - were generally helpful. | |
1583 | - | |
1584 | - | |
1585 | - | |
1586 | - | |
1587 | - | |
1588 | - | |
1589 | - | |
1590 | - | |
1591 | - 7 | |
1592 | - | |
1593 | - | |
1594 | diff -Nru bzip2-1.0.1/bzless bzip2-1.0.1.new/bzless | |
1595 | --- bzip2-1.0.1/bzless Thu Jan 1 01:00:00 1970 | |
1596 | +++ bzip2-1.0.1.new/bzless Sat Jun 24 20:16:09 2000 | |
1597 | @@ -0,0 +1,2 @@ | |
1598 | +#!/bin/sh | |
906ef59a | 1599 | +%{_bindir}/bunzip2 -c "$@" | %{_bindir}/less |
d967e3ec | 1600 | diff -Nru bzip2-1.0.1/config.h.in bzip2-1.0.1.new/config.h.in |
1601 | --- bzip2-1.0.1/config.h.in Thu Jan 1 01:00:00 1970 | |
1602 | +++ bzip2-1.0.1.new/config.h.in Sat Jun 24 20:13:06 2000 | |
1603 | @@ -0,0 +1,17 @@ | |
1604 | +/* config.h.in. Generated automatically from configure.in by autoheader. */ | |
1605 | + | |
1606 | +/* Name of package */ | |
1607 | +#undef PACKAGE | |
1608 | + | |
1609 | +/* Version number of package */ | |
1610 | +#undef VERSION | |
1611 | + | |
1612 | +/* Number of bits in a file offset, on hosts where this is settable. */ | |
1613 | +#undef _FILE_OFFSET_BITS | |
1614 | + | |
1615 | +/* Define to make fseeko etc. visible, on some hosts. */ | |
1616 | +#undef _LARGEFILE_SOURCE | |
1617 | + | |
1618 | +/* Define for large files, on AIX-style hosts. */ | |
1619 | +#undef _LARGE_FILES | |
1620 | + | |
1621 | diff -Nru bzip2-1.0.1/configure.in bzip2-1.0.1.new/configure.in | |
1622 | --- bzip2-1.0.1/configure.in Thu Jan 1 01:00:00 1970 | |
1623 | +++ bzip2-1.0.1.new/configure.in Sat Jun 24 20:13:06 2000 | |
1624 | @@ -0,0 +1,10 @@ | |
1625 | +AC_INIT(bzip2.c) | |
1626 | +AM_INIT_AUTOMAKE(bzip2,1.0.1) | |
1627 | +AM_CONFIG_HEADER(config.h) | |
1628 | +AC_PROG_CC | |
1629 | +AM_PROG_LIBTOOL | |
1630 | +AC_PROG_LN_S | |
1631 | +AC_SYS_LARGEFILE | |
1632 | +AC_OUTPUT(Makefile | |
1633 | + doc/Makefile | |
1634 | + doc/pl/Makefile) | |
1635 | diff -Nru bzip2-1.0.1/crctable.c bzip2-1.0.1.new/crctable.c | |
1636 | --- bzip2-1.0.1/crctable.c Sat Jun 24 20:13:27 2000 | |
1637 | +++ bzip2-1.0.1.new/crctable.c Sat Jun 24 20:13:06 2000 | |
1638 | @@ -58,6 +58,10 @@ | |
1639 | For more information on these sources, see the manual. | |
1640 | --*/ | |
1641 | ||
1642 | +#ifdef HAVE_CONFIG_H | |
1643 | +#include <config.h> | |
1644 | +#endif | |
1645 | + | |
1646 | ||
1647 | #include "bzlib_private.h" | |
1648 | ||
1649 | diff -Nru bzip2-1.0.1/decompress.c bzip2-1.0.1.new/decompress.c | |
1650 | --- bzip2-1.0.1/decompress.c Sat Jun 24 20:13:27 2000 | |
1651 | +++ bzip2-1.0.1.new/decompress.c Sat Jun 24 20:13:06 2000 | |
1652 | @@ -58,6 +58,10 @@ | |
1653 | For more information on these sources, see the manual. | |
1654 | --*/ | |
1655 | ||
1656 | +#ifdef HAVE_CONFIG_H | |
1657 | +#include <config.h> | |
1658 | +#endif | |
1659 | + | |
1660 | ||
1661 | #include "bzlib_private.h" | |
1662 | ||
1663 | diff -Nru bzip2-1.0.1/dlltest.c bzip2-1.0.1.new/dlltest.c | |
1664 | --- bzip2-1.0.1/dlltest.c Sat Jun 24 20:13:27 2000 | |
1665 | +++ bzip2-1.0.1.new/dlltest.c Sat Jun 24 20:13:06 2000 | |
1666 | @@ -8,6 +8,10 @@ | |
1667 | usage: minibz2 [-d] [-{1,2,..9}] [[srcfilename] destfilename]\r | |
1668 | */\r | |
1669 | \r | |
1670 | +#ifdef HAVE_CONFIG_H | |
1671 | +#include <config.h> | |
1672 | +#endif | |
1673 | + | |
1674 | #define BZ_IMPORT\r | |
1675 | #include <stdio.h>\r | |
1676 | #include <stdlib.h>\r | |
1677 | diff -Nru bzip2-1.0.1/doc/Makefile.am bzip2-1.0.1.new/doc/Makefile.am | |
1678 | --- bzip2-1.0.1/doc/Makefile.am Thu Jan 1 01:00:00 1970 | |
1679 | +++ bzip2-1.0.1.new/doc/Makefile.am Sat Jun 24 20:14:43 2000 | |
1680 | @@ -0,0 +1,5 @@ | |
1681 | + | |
1682 | +SUBDIRS = pl | |
1683 | + | |
1684 | +man_MANS = bzip2.1 bunzip2.1 bzcat.1 bzip2recover.1 | |
1685 | +#info_TEXINFOS = bzip2.texi | |
1686 | diff -Nru bzip2-1.0.1/doc/bunzip2.1 bzip2-1.0.1.new/doc/bunzip2.1 | |
1687 | --- bzip2-1.0.1/doc/bunzip2.1 Thu Jan 1 01:00:00 1970 | |
1688 | +++ bzip2-1.0.1.new/doc/bunzip2.1 Sat Jun 24 20:13:06 2000 | |
1689 | @@ -0,0 +1 @@ | |
1690 | +.so bzip2.1 | |
1691 | \ No newline at end of file | |
1692 | diff -Nru bzip2-1.0.1/doc/bzcat.1 bzip2-1.0.1.new/doc/bzcat.1 | |
1693 | --- bzip2-1.0.1/doc/bzcat.1 Thu Jan 1 01:00:00 1970 | |
1694 | +++ bzip2-1.0.1.new/doc/bzcat.1 Sat Jun 24 20:13:06 2000 | |
1695 | @@ -0,0 +1 @@ | |
1696 | +.so bzip2.1 | |
1697 | \ No newline at end of file | |
1698 | diff -Nru bzip2-1.0.1/doc/bzip2.1 bzip2-1.0.1.new/doc/bzip2.1 | |
1699 | --- bzip2-1.0.1/doc/bzip2.1 Thu Jan 1 01:00:00 1970 | |
1700 | +++ bzip2-1.0.1.new/doc/bzip2.1 Sat Jun 24 20:13:06 2000 | |
1701 | @@ -0,0 +1,439 @@ | |
1702 | +.PU | |
1703 | +.TH bzip2 1 | |
1704 | +.SH NAME | |
1705 | +bzip2, bunzip2 \- a block-sorting file compressor, v1.0 | |
1706 | +.br | |
1707 | +bzcat \- decompresses files to stdout | |
1708 | +.br | |
1709 | +bzip2recover \- recovers data from damaged bzip2 files | |
1710 | + | |
1711 | +.SH SYNOPSIS | |
1712 | +.ll +8 | |
1713 | +.B bzip2 | |
1714 | +.RB [ " \-cdfkqstvzVL123456789 " ] | |
1715 | +[ | |
1716 | +.I "filenames \&..." | |
1717 | +] | |
1718 | +.ll -8 | |
1719 | +.br | |
1720 | +.B bunzip2 | |
1721 | +.RB [ " \-fkvsVL " ] | |
1722 | +[ | |
1723 | +.I "filenames \&..." | |
1724 | +] | |
1725 | +.br | |
1726 | +.B bzcat | |
1727 | +.RB [ " \-s " ] | |
1728 | +[ | |
1729 | +.I "filenames \&..." | |
1730 | +] | |
1731 | +.br | |
1732 | +.B bzip2recover | |
1733 | +.I "filename" | |
1734 | + | |
1735 | +.SH DESCRIPTION | |
1736 | +.I bzip2 | |
1737 | +compresses files using the Burrows-Wheeler block sorting | |
1738 | +text compression algorithm, and Huffman coding. Compression is | |
1739 | +generally considerably better than that achieved by more conventional | |
1740 | +LZ77/LZ78-based compressors, and approaches the performance of the PPM | |
1741 | +family of statistical compressors. | |
1742 | + | |
1743 | +The command-line options are deliberately very similar to | |
1744 | +those of | |
1745 | +.I GNU gzip, | |
1746 | +but they are not identical. | |
1747 | + | |
1748 | +.I bzip2 | |
1749 | +expects a list of file names to accompany the | |
1750 | +command-line flags. Each file is replaced by a compressed version of | |
1751 | +itself, with the name "original_name.bz2". | |
1752 | +Each compressed file | |
1753 | +has the same modification date, permissions, and, when possible, | |
1754 | +ownership as the corresponding original, so that these properties can | |
1755 | +be correctly restored at decompression time. File name handling is | |
1756 | +naive in the sense that there is no mechanism for preserving original | |
1757 | +file names, permissions, ownerships or dates in filesystems which lack | |
1758 | +these concepts, or have serious file name length restrictions, such as | |
1759 | +MS-DOS. | |
1760 | + | |
1761 | +.I bzip2 | |
1762 | +and | |
1763 | +.I bunzip2 | |
1764 | +will by default not overwrite existing | |
1765 | +files. If you want this to happen, specify the \-f flag. | |
1766 | + | |
1767 | +If no file names are specified, | |
1768 | +.I bzip2 | |
1769 | +compresses from standard | |
1770 | +input to standard output. In this case, | |
1771 | +.I bzip2 | |
1772 | +will decline to | |
1773 | +write compressed output to a terminal, as this would be entirely | |
1774 | +incomprehensible and therefore pointless. | |
1775 | + | |
1776 | +.I bunzip2 | |
1777 | +(or | |
1778 | +.I bzip2 \-d) | |
1779 | +decompresses all | |
1780 | +specified files. Files which were not created by | |
1781 | +.I bzip2 | |
1782 | +will be detected and ignored, and a warning issued. | |
1783 | +.I bzip2 | |
1784 | +attempts to guess the filename for the decompressed file | |
1785 | +from that of the compressed file as follows: | |
1786 | + | |
1787 | + filename.bz2 becomes filename | |
1788 | + filename.bz becomes filename | |
1789 | + filename.tbz2 becomes filename.tar | |
1790 | + filename.tbz becomes filename.tar | |
1791 | + anyothername becomes anyothername.out | |
1792 | + | |
1793 | +If the file does not end in one of the recognised endings, | |
1794 | +.I .bz2, | |
1795 | +.I .bz, | |
1796 | +.I .tbz2 | |
1797 | +or | |
1798 | +.I .tbz, | |
1799 | +.I bzip2 | |
1800 | +complains that it cannot | |
1801 | +guess the name of the original file, and uses the original name | |
1802 | +with | |
1803 | +.I .out | |
1804 | +appended. | |
1805 | + | |
1806 | +As with compression, supplying no | |
1807 | +filenames causes decompression from | |
1808 | +standard input to standard output. | |
1809 | + | |
1810 | +.I bunzip2 | |
1811 | +will correctly decompress a file which is the | |
1812 | +concatenation of two or more compressed files. The result is the | |
1813 | +concatenation of the corresponding uncompressed files. Integrity | |
1814 | +testing (\-t) | |
1815 | +of concatenated | |
1816 | +compressed files is also supported. | |
1817 | + | |
1818 | +You can also compress or decompress files to the standard output by | |
1819 | +giving the \-c flag. Multiple files may be compressed and | |
1820 | +decompressed like this. The resulting outputs are fed sequentially to | |
1821 | +stdout. Compression of multiple files | |
1822 | +in this manner generates a stream | |
1823 | +containing multiple compressed file representations. Such a stream | |
1824 | +can be decompressed correctly only by | |
1825 | +.I bzip2 | |
1826 | +version 0.9.0 or | |
1827 | +later. Earlier versions of | |
1828 | +.I bzip2 | |
1829 | +will stop after decompressing | |
1830 | +the first file in the stream. | |
1831 | + | |
1832 | +.I bzcat | |
1833 | +(or | |
1834 | +.I bzip2 \-dc) | |
1835 | +decompresses all specified files to | |
1836 | +the standard output. | |
1837 | + | |
1838 | +.I bzip2 | |
1839 | +will read arguments from the environment variables | |
1840 | +.I BZIP2 | |
1841 | +and | |
1842 | +.I BZIP, | |
1843 | +in that order, and will process them | |
1844 | +before any arguments read from the command line. This gives a | |
1845 | +convenient way to supply default arguments. | |
1846 | + | |
1847 | +Compression is always performed, even if the compressed | |
1848 | +file is slightly | |
1849 | +larger than the original. Files of less than about one hundred bytes | |
1850 | +tend to get larger, since the compression mechanism has a constant | |
1851 | +overhead in the region of 50 bytes. Random data (including the output | |
1852 | +of most file compressors) is coded at about 8.05 bits per byte, giving | |
1853 | +an expansion of around 0.5%. | |
1854 | + | |
1855 | +As a self-check for your protection, | |
1856 | +.I | |
1857 | +bzip2 | |
1858 | +uses 32-bit CRCs to | |
1859 | +make sure that the decompressed version of a file is identical to the | |
1860 | +original. This guards against corruption of the compressed data, and | |
1861 | +against undetected bugs in | |
1862 | +.I bzip2 | |
1863 | +(hopefully very unlikely). The | |
1864 | +chance of data corruption going undetected is microscopic, about one | |
1865 | +chance in four billion for each file processed. Be aware, though, that | |
1866 | +the check occurs upon decompression, so it can only tell you that | |
1867 | +something is wrong. It can't help you | |
1868 | +recover the original uncompressed | |
1869 | +data. You can use | |
1870 | +.I bzip2recover | |
1871 | +to try to recover data from | |
1872 | +damaged files. | |
1873 | + | |
1874 | +Return values: 0 for a normal exit, 1 for environmental problems (file | |
1875 | +not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt | |
1876 | +compressed file, 3 for an internal consistency error (e.g., a bug) which | |
1877 | +caused | |
1878 | +.I bzip2 | |
1879 | +to panic. | |
1880 | + | |
1881 | +.SH OPTIONS | |
1882 | +.TP | |
1883 | +.B \-c --stdout | |
1884 | +Compress or decompress to standard output. | |
1885 | +.TP | |
1886 | +.B \-d --decompress | |
1887 | +Force decompression. | |
1888 | +.I bzip2, | |
1889 | +.I bunzip2 | |
1890 | +and | |
1891 | +.I bzcat | |
1892 | +are | |
1893 | +really the same program, and the decision about what actions to take is | |
1894 | +done on the basis of which name is used. This flag overrides that | |
1895 | +mechanism, and forces | |
1896 | +.I bzip2 | |
1897 | +to decompress. | |
1898 | +.TP | |
1899 | +.B \-z --compress | |
1900 | +The complement to \-d: forces compression, regardless of the | |
1901 | +invocation name. | |
1902 | +.TP | |
1903 | +.B \-t --test | |
1904 | +Check integrity of the specified file(s), but don't decompress them. | |
1905 | +This really performs a trial decompression and throws away the result. | |
1906 | +.TP | |
1907 | +.B \-f --force | |
1908 | +Force overwrite of output files. Normally, | |
1909 | +.I bzip2 | |
1910 | +will not overwrite | |
1911 | +existing output files. Also forces | |
1912 | +.I bzip2 | |
1913 | +to break hard links | |
1914 | +to files, which it otherwise wouldn't do. | |
1915 | +.TP | |
1916 | +.B \-k --keep | |
1917 | +Keep (don't delete) input files during compression | |
1918 | +or decompression. | |
1919 | +.TP | |
1920 | +.B \-s --small | |
1921 | +Reduce memory usage, for compression, decompression and testing. Files | |
1922 | +are decompressed and tested using a modified algorithm which only | |
1923 | +requires 2.5 bytes per block byte. This means any file can be | |
1924 | +decompressed in 2300k of memory, albeit at about half the normal speed. | |
1925 | + | |
1926 | +During compression, \-s selects a block size of 200k, which limits | |
1927 | +memory use to around the same figure, at the expense of your compression | |
1928 | +ratio. In short, if your machine is low on memory (8 megabytes or | |
1929 | +less), use \-s for everything. See MEMORY MANAGEMENT below. | |
1930 | +.TP | |
1931 | +.B \-q --quiet | |
1932 | +Suppress non-essential warning messages. Messages pertaining to | |
1933 | +I/O errors and other critical events will not be suppressed. | |
1934 | +.TP | |
1935 | +.B \-v --verbose | |
1936 | +Verbose mode -- show the compression ratio for each file processed. | |
1937 | +Further \-v's increase the verbosity level, spewing out lots of | |
1938 | +information which is primarily of interest for diagnostic purposes. | |
1939 | +.TP | |
1940 | +.B \-L --license \-V --version | |
1941 | +Display the software version, license terms and conditions. | |
1942 | +.TP | |
1943 | +.B \-1 to \-9 | |
1944 | +Set the block size to 100 k, 200 k .. 900 k when compressing. Has no | |
1945 | +effect when decompressing. See MEMORY MANAGEMENT below. | |
1946 | +.TP | |
1947 | +.B \-- | |
1948 | +Treats all subsequent arguments as file names, even if they start | |
1949 | +with a dash. This is so you can handle files with names beginning | |
1950 | +with a dash, for example: bzip2 \-- \-myfilename. | |
1951 | +.TP | |
1952 | +.B \--repetitive-fast --repetitive-best | |
1953 | +These flags are redundant in versions 0.9.5 and above. They provided | |
1954 | +some coarse control over the behaviour of the sorting algorithm in | |
1955 | +earlier versions, which was sometimes useful. 0.9.5 and above have an | |
1956 | +improved algorithm which renders these flags irrelevant. | |
1957 | + | |
1958 | +.SH MEMORY MANAGEMENT | |
1959 | +.I bzip2 | |
1960 | +compresses large files in blocks. The block size affects | |
1961 | +both the compression ratio achieved, and the amount of memory needed for | |
1962 | +compression and decompression. The flags \-1 through \-9 | |
1963 | +specify the block size to be 100,000 bytes through 900,000 bytes (the | |
1964 | +default) respectively. At decompression time, the block size used for | |
1965 | +compression is read from the header of the compressed file, and | |
1966 | +.I bunzip2 | |
1967 | +then allocates itself just enough memory to decompress | |
1968 | +the file. Since block sizes are stored in compressed files, it follows | |
1969 | +that the flags \-1 to \-9 are irrelevant to and so ignored | |
1970 | +during decompression. | |
1971 | + | |
1972 | +Compression and decompression requirements, | |
1973 | +in bytes, can be estimated as: | |
1974 | + | |
1975 | + Compression: 400k + ( 8 x block size ) | |
1976 | + | |
1977 | + Decompression: 100k + ( 4 x block size ), or | |
1978 | + 100k + ( 2.5 x block size ) | |
1979 | + | |
1980 | +Larger block sizes give rapidly diminishing marginal returns. Most of | |
1981 | +the compression comes from the first two or three hundred k of block | |
1982 | +size, a fact worth bearing in mind when using | |
1983 | +.I bzip2 | |
1984 | +on small machines. | |
1985 | +It is also important to appreciate that the decompression memory | |
1986 | +requirement is set at compression time by the choice of block size. | |
1987 | + | |
1988 | +For files compressed with the default 900k block size, | |
1989 | +.I bunzip2 | |
1990 | +will require about 3700 kbytes to decompress. To support decompression | |
1991 | +of any file on a 4 megabyte machine, | |
1992 | +.I bunzip2 | |
1993 | +has an option to | |
1994 | +decompress using approximately half this amount of memory, about 2300 | |
1995 | +kbytes. Decompression speed is also halved, so you should use this | |
1996 | +option only where necessary. The relevant flag is \-s. | |
1997 | + | |
1998 | +In general, try to use the largest block size memory constraints allow, | |
1999 | +since that maximises the compression achieved. Compression and | |
2000 | +decompression speed are virtually unaffected by block size. | |
2001 | + | |
2002 | +Another significant point applies to files which fit in a single block | |
2003 | +-- that means most files you'd encounter using a large block size. The | |
2004 | +amount of real memory touched is proportional to the size of the file, | |
2005 | +since the file is smaller than a block. For example, compressing a file | |
2006 | +20,000 bytes long with the flag \-9 will cause the compressor to | |
2007 | +allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560 | |
2008 | +kbytes of it. Similarly, the decompressor will allocate 3700k but only | |
2009 | +touch 100k + 20000 * 4 = 180 kbytes. | |
2010 | + | |
2011 | +Here is a table which summarises the maximum memory usage for different | |
2012 | +block sizes. Also recorded is the total compressed size for 14 files of | |
2013 | +the Calgary Text Compression Corpus totalling 3,141,622 bytes. This | |
2014 | +column gives some feel for how compression varies with block size. | |
2015 | +These figures tend to understate the advantage of larger block sizes for | |
2016 | +larger files, since the Corpus is dominated by smaller files. | |
2017 | + | |
2018 | + Compress Decompress Decompress Corpus | |
2019 | + Flag usage usage -s usage Size | |
2020 | + | |
2021 | + -1 1200k 500k 350k 914704 | |
2022 | + -2 2000k 900k 600k 877703 | |
2023 | + -3 2800k 1300k 850k 860338 | |
2024 | + -4 3600k 1700k 1100k 846899 | |
2025 | + -5 4400k 2100k 1350k 845160 | |
2026 | + -6 5200k 2500k 1600k 838626 | |
2027 | + -7 6000k 2900k 1850k 834096 | |
2028 | + -8 6800k 3300k 2100k 828642 | |
2029 | + -9 7600k 3700k 2350k 828642 | |
2030 | + | |
2031 | +.SH RECOVERING DATA FROM DAMAGED FILES | |
2032 | +.I bzip2 | |
2033 | +compresses files in blocks, usually 900kbytes long. Each | |
2034 | +block is handled independently. If a media or transmission error causes | |
2035 | +a multi-block .bz2 | |
2036 | +file to become damaged, it may be possible to | |
2037 | +recover data from the undamaged blocks in the file. | |
2038 | + | |
2039 | +The compressed representation of each block is delimited by a 48-bit | |
2040 | +pattern, which makes it possible to find the block boundaries with | |
2041 | +reasonable certainty. Each block also carries its own 32-bit CRC, so | |
2042 | +damaged blocks can be distinguished from undamaged ones. | |
2043 | + | |
2044 | +.I bzip2recover | |
2045 | +is a simple program whose purpose is to search for | |
2046 | +blocks in .bz2 files, and write each block out into its own .bz2 | |
2047 | +file. You can then use | |
2048 | +.I bzip2 | |
2049 | +\-t | |
2050 | +to test the | |
2051 | +integrity of the resulting files, and decompress those which are | |
2052 | +undamaged. | |
2053 | + | |
2054 | +.I bzip2recover | |
2055 | +takes a single argument, the name of the damaged file, | |
2056 | +and writes a number of files "rec0001file.bz2", | |
2057 | +"rec0002file.bz2", etc, containing the extracted blocks. | |
2058 | +The output filenames are designed so that the use of | |
2059 | +wildcards in subsequent processing -- for example, | |
2060 | +"bzip2 -dc rec*file.bz2 > recovered_data" -- lists the files in | |
2061 | +the correct order. | |
2062 | + | |
2063 | +.I bzip2recover | |
2064 | +should be of most use dealing with large .bz2 | |
2065 | +files, as these will contain many blocks. It is clearly | |
2066 | +futile to use it on damaged single-block files, since a | |
2067 | +damaged block cannot be recovered. If you wish to minimise | |
2068 | +any potential data loss through media or transmission errors, | |
2069 | +you might consider compressing with a smaller | |
2070 | +block size. | |
2071 | + | |
2072 | +.SH PERFORMANCE NOTES | |
2073 | +The sorting phase of compression gathers together similar strings in the | |
2074 | +file. Because of this, files containing very long runs of repeated | |
2075 | +symbols, like "aabaabaabaab ..." (repeated several hundred times) may | |
2076 | +compress more slowly than normal. Versions 0.9.5 and above fare much | |
2077 | +better than previous versions in this respect. The ratio between | |
2078 | +worst-case and average-case compression time is in the region of 10:1. | |
2079 | +For previous versions, this figure was more like 100:1. You can use the | |
2080 | +\-vvvv option to monitor progress in great detail, if you want. | |
2081 | + | |
2082 | +Decompression speed is unaffected by these phenomena. | |
2083 | + | |
2084 | +.I bzip2 | |
2085 | +usually allocates several megabytes of memory to operate | |
2086 | +in, and then charges all over it in a fairly random fashion. This means | |
2087 | +that performance, both for compressing and decompressing, is largely | |
2088 | +determined by the speed at which your machine can service cache misses. | |
2089 | +Because of this, small changes to the code to reduce the miss rate have | |
2090 | +been observed to give disproportionately large performance improvements. | |
2091 | +I imagine | |
2092 | +.I bzip2 | |
2093 | +will perform best on machines with very large caches. | |
2094 | + | |
2095 | +.SH CAVEATS | |
2096 | +I/O error messages are not as helpful as they could be. | |
2097 | +.I bzip2 | |
2098 | +tries hard to detect I/O errors and exit cleanly, but the details of | |
2099 | +what the problem is sometimes seem rather misleading. | |
2100 | + | |
2101 | +This manual page pertains to version 1.0 of | |
2102 | +.I bzip2. | |
2103 | +Compressed | |
2104 | +data created by this version is entirely forwards and backwards | |
2105 | +compatible with the previous public releases, versions 0.1pl2, 0.9.0 | |
2106 | +and 0.9.5, | |
2107 | +but with the following exception: 0.9.0 and above can correctly | |
2108 | +decompress multiple concatenated compressed files. 0.1pl2 cannot do | |
2109 | +this; it will stop after decompressing just the first file in the | |
2110 | +stream. | |
2111 | + | |
2112 | +.I bzip2recover | |
2113 | +uses 32-bit integers to represent bit positions in | |
2114 | +compressed files, so it cannot handle compressed files more than 512 | |
2115 | +megabytes long. This could easily be fixed. | |
2116 | + | |
2117 | +.SH AUTHOR | |
2118 | +Julian Seward, jseward@acm.org. | |
2119 | + | |
2120 | +http://sourceware.cygnus.com/bzip2 | |
2121 | +http://www.muraroa.demon.co.uk | |
2122 | + | |
2123 | +The ideas embodied in | |
2124 | +.I bzip2 | |
2125 | +are due to (at least) the following | |
2126 | +people: Michael Burrows and David Wheeler (for the block sorting | |
2127 | +transformation), David Wheeler (again, for the Huffman coder), Peter | |
2128 | +Fenwick (for the structured coding model in the original | |
2129 | +.I bzip, | |
2130 | +and many refinements), and Alistair Moffat, Radford Neal and Ian Witten | |
2131 | +(for the arithmetic coder in the original | |
2132 | +.I bzip). | |
2133 | +I am much | |
2134 | +indebted for their help, support and advice. See the manual in the | |
2135 | +source distribution for pointers to sources of documentation. Christian | |
2136 | +von Roques encouraged me to look for faster sorting algorithms, so as to | |
2137 | +speed up compression. Bela Lubkin encouraged me to improve the | |
2138 | +worst-case compression performance. Many people sent patches, helped | |
2139 | +with portability problems, lent machines, gave advice and were generally | |
2140 | +helpful. | |
2141 | diff -Nru bzip2-1.0.1/doc/bzip2.texi bzip2-1.0.1.new/doc/bzip2.texi | |
2142 | --- bzip2-1.0.1/doc/bzip2.texi Thu Jan 1 01:00:00 1970 | |
2143 | +++ bzip2-1.0.1.new/doc/bzip2.texi Sat Jun 24 20:13:06 2000 | |
2144 | @@ -0,0 +1,2217 @@ | |
2145 | +\input texinfo @c -*- Texinfo -*- | |
2146 | +@setfilename bzip2.info | |
2147 | + | |
2148 | +@ignore | |
2149 | +This file documents bzip2 version 1.0, and associated library | |
2150 | +libbzip2, written by Julian Seward (jseward@acm.org). | |
2151 | + | |
2152 | +Copyright (C) 1996-2000 Julian R Seward | |
2153 | + | |
2154 | +Permission is granted to make and distribute verbatim copies of | |
2155 | +this manual provided the copyright notice and this permission notice | |
2156 | +are preserved on all copies. | |
2157 | + | |
2158 | +Permission is granted to copy and distribute translations of this manual | |
2159 | +into another language, under the above conditions for verbatim copies. | |
2160 | +@end ignore | |
2161 | + | |
2162 | +@ifinfo | |
2163 | +@dircategory File utilities | |
2164 | +@direntry | |
2165 | +* Bzip2: (bzip2). A program and library for data | |
2166 | + compression. | |
2167 | +@end direntry | |
2168 | + | |
2169 | +@end ifinfo | |
2170 | + | |
2171 | +@iftex | |
2172 | +@c @finalout | |
2173 | +@settitle bzip2 and libbzip2 | |
2174 | +@titlepage | |
2175 | +@title bzip2 and libbzip2 | |
2176 | +@subtitle a program and library for data compression | |
2177 | +@subtitle copyright (C) 1996-2000 Julian Seward | |
2178 | +@subtitle version 1.0 of 21 March 2000 | |
2179 | +@author Julian Seward | |
2180 | + | |
2181 | +@end titlepage | |
2182 | + | |
2183 | +@parindent 0mm | |
2184 | +@parskip 2mm | |
2185 | + | |
2186 | +@end iftex | |
2187 | +@node Top, Overview, (dir), (dir) | |
2188 | + | |
2189 | +@top bzip2 | |
2190 | + | |
2191 | +This program, @code{bzip2}, | |
2192 | +and associated library @code{libbzip2}, are | |
2193 | +Copyright (C) 1996-2000 Julian R Seward. All rights reserved. | |
2194 | + | |
2195 | +Redistribution and use in source and binary forms, with or without | |
2196 | +modification, are permitted provided that the following conditions | |
2197 | +are met: | |
2198 | +@itemize @bullet | |
2199 | +@item | |
2200 | + Redistributions of source code must retain the above copyright | |
2201 | + notice, this list of conditions and the following disclaimer. | |
2202 | +@item | |
2203 | + The origin of this software must not be misrepresented; you must | |
2204 | + not claim that you wrote the original software. If you use this | |
2205 | + software in a product, an acknowledgment in the product | |
2206 | + documentation would be appreciated but is not required. | |
2207 | +@item | |
2208 | + Altered source versions must be plainly marked as such, and must | |
2209 | + not be misrepresented as being the original software. | |
2210 | +@item | |
2211 | + The name of the author may not be used to endorse or promote | |
2212 | + products derived from this software without specific prior written | |
2213 | + permission. | |
2214 | +@end itemize | |
2215 | +THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS | |
2216 | +OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
2217 | +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
2218 | +ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY | |
2219 | +DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
2220 | +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE | |
2221 | +GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | |
2222 | +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | |
2223 | +WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING | |
2224 | +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | |
2225 | +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
2226 | + | |
2227 | +Julian Seward, Cambridge, UK. | |
2228 | + | |
2229 | +@code{jseward@@acm.org} | |
2230 | + | |
2231 | +@code{http://sourceware.cygnus.com/bzip2} | |
2232 | + | |
2233 | +@code{http://www.cacheprof.org} | |
2234 | + | |
2235 | +@code{http://www.muraroa.demon.co.uk} | |
2236 | + | |
2237 | +@code{bzip2}/@code{libbzip2} version 1.0 of 21 March 2000. | |
2238 | + | |
2239 | +PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented | |
2240 | +algorithms. However, I do not have the resources available to carry out | |
2241 | +a full patent search. Therefore I cannot give any guarantee of the | |
2242 | +above statement. | |
2243 | + | |
2244 | + | |
2245 | + | |
2246 | + | |
2247 | + | |
2248 | + | |
2249 | + | |
2250 | +@node Overview, Implementation, Top, Top | |
2251 | +@chapter Introduction | |
2252 | + | |
2253 | +@code{bzip2} compresses files using the Burrows-Wheeler | |
2254 | +block-sorting text compression algorithm, and Huffman coding. | |
2255 | +Compression is generally considerably better than that | |
2256 | +achieved by more conventional LZ77/LZ78-based compressors, | |
2257 | +and approaches the performance of the PPM family of statistical compressors. | |
2258 | + | |
2259 | +@code{bzip2} is built on top of @code{libbzip2}, a flexible library | |
2260 | +for handling compressed data in the @code{bzip2} format. This manual | |
2261 | +describes both how to use the program and | |
2262 | +how to work with the library interface. Most of the | |
2263 | +manual is devoted to this library, not the program, | |
2264 | +which is good news if your interest is only in the program. | |
2265 | + | |
2266 | +Chapter 2 describes how to use @code{bzip2}; this is the only part | |
2267 | +you need to read if you just want to know how to operate the program. | |
2268 | +Chapter 3 describes the programming interfaces in detail, and | |
2269 | +Chapter 4 records some miscellaneous notes which I thought | |
2270 | +ought to be recorded somewhere. | |
2271 | + | |
2272 | + | |
2273 | +@chapter How to use @code{bzip2} | |
2274 | + | |
2275 | +This chapter contains a copy of the @code{bzip2} man page, | |
2276 | +and nothing else. | |
2277 | + | |
2278 | +@quotation | |
2279 | + | |
2280 | +@unnumberedsubsubsec NAME | |
2281 | +@itemize | |
2282 | +@item @code{bzip2}, @code{bunzip2} | |
2283 | +- a block-sorting file compressor, v1.0 | |
2284 | +@item @code{bzcat} | |
2285 | +- decompresses files to stdout | |
2286 | +@item @code{bzip2recover} | |
2287 | +- recovers data from damaged bzip2 files | |
2288 | +@end itemize | |
2289 | + | |
2290 | +@unnumberedsubsubsec SYNOPSIS | |
2291 | +@itemize | |
2292 | +@item @code{bzip2} [ -cdfkqstvzVL123456789 ] [ filenames ... ] | |
2293 | +@item @code{bunzip2} [ -fkvsVL ] [ filenames ... ] | |
2294 | +@item @code{bzcat} [ -s ] [ filenames ... ] | |
2295 | +@item @code{bzip2recover} filename | |
2296 | +@end itemize | |
2297 | + | |
2298 | +@unnumberedsubsubsec DESCRIPTION | |
2299 | + | |
2300 | +@code{bzip2} compresses files using the Burrows-Wheeler block sorting | |
2301 | +text compression algorithm, and Huffman coding. Compression is | |
2302 | +generally considerably better than that achieved by more conventional | |
2303 | +LZ77/LZ78-based compressors, and approaches the performance of the PPM | |
2304 | +family of statistical compressors. | |
2305 | + | |
2306 | +The command-line options are deliberately very similar to those of GNU | |
2307 | +@code{gzip}, but they are not identical. | |
2308 | + | |
2309 | +@code{bzip2} expects a list of file names to accompany the command-line | |
2310 | +flags. Each file is replaced by a compressed version of itself, with | |
2311 | +the name @code{original_name.bz2}. Each compressed file has the same | |
2312 | +modification date, permissions, and, when possible, ownership as the | |
2313 | +corresponding original, so that these properties can be correctly | |
2314 | +restored at decompression time. File name handling is naive in the | |
2315 | +sense that there is no mechanism for preserving original file names, | |
2316 | +permissions, ownerships or dates in filesystems which lack these | |
2317 | +concepts, or have serious file name length restrictions, such as MS-DOS. | |
2318 | + | |
2319 | +@code{bzip2} and @code{bunzip2} will by default not overwrite existing | |
2320 | +files. If you want this to happen, specify the @code{-f} flag. | |
2321 | + | |
2322 | +If no file names are specified, @code{bzip2} compresses from standard | |
2323 | +input to standard output. In this case, @code{bzip2} will decline to | |
2324 | +write compressed output to a terminal, as this would be entirely | |
2325 | +incomprehensible and therefore pointless. | |
2326 | + | |
2327 | +@code{bunzip2} (or @code{bzip2 -d}) decompresses all | |
2328 | +specified files. Files which were not created by @code{bzip2} | |
2329 | +will be detected and ignored, and a warning issued. | |
2330 | +@code{bzip2} attempts to guess the filename for the decompressed file | |
2331 | +from that of the compressed file as follows: | |
2332 | +@itemize | |
2333 | +@item @code{filename.bz2 } becomes @code{filename} | |
2334 | +@item @code{filename.bz } becomes @code{filename} | |
2335 | +@item @code{filename.tbz2} becomes @code{filename.tar} | |
2336 | +@item @code{filename.tbz } becomes @code{filename.tar} | |
2337 | +@item @code{anyothername } becomes @code{anyothername.out} | |
2338 | +@end itemize | |
2339 | +If the file does not end in one of the recognised endings, | |
2340 | +@code{.bz2}, @code{.bz}, | |
2341 | +@code{.tbz2} or @code{.tbz}, @code{bzip2} complains that it cannot | |
2342 | +guess the name of the original file, and uses the original name | |
2343 | +with @code{.out} appended. | |
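The suffix rules above can be sketched as a small C function. This is a hypothetical helper (guess_output_name is not part of bzip2 or libbzip2) written only to illustrate the mapping. How bzip2 treats a name that is nothing but a suffix, such as a bare ".bz2", is an assumption here: the sketch requires a non-empty stem and otherwise falls back to appending ".out".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of bzip2 or libbzip2: derive the name of
 * the decompressed file from the compressed file's name, following the
 * suffix rules in the text. Writes the result into out (capacity outsz).
 * Returns 0 if a recognised suffix was replaced, 1 if ".out" had to be
 * appended instead, or -1 if out is too small. */
static int guess_output_name(const char *in, char *out, size_t outsz)
{
    /* (compressed suffix, replacement) pairs, checked in order. */
    static const char *const rules[][2] = {
        { ".bz2", "" }, { ".bz", "" }, { ".tbz2", ".tar" }, { ".tbz", ".tar" },
    };
    size_t inlen = strlen(in);

    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t sfxlen = strlen(rules[i][0]);
        /* Require a non-empty stem in front of the suffix. */
        if (inlen > sfxlen && strcmp(in + inlen - sfxlen, rules[i][0]) == 0) {
            int n = snprintf(out, outsz, "%.*s%s",
                             (int)(inlen - sfxlen), in, rules[i][1]);
            return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
        }
    }
    /* Unrecognised ending: keep the whole name and append ".out". */
    int n = snprintf(out, outsz, "%s.out", in);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 1;
}
```

For example, notes.txt.bz2 maps to notes.txt, src.tbz2 to src.tar, and data.gz to data.gz.out, with the return value 1 signalling the fallback case.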
2344 | + | |
2345 | +As with compression, supplying no | |
2346 | +filenames causes decompression from standard input to standard output. | |
2347 | + | |
2348 | +@code{bunzip2} will correctly decompress a file which is the | |
2349 | +concatenation of two or more compressed files. The result is the | |
2350 | +concatenation of the corresponding uncompressed files. Integrity | |
2351 | +testing (@code{-t}) of concatenated compressed files is also supported. | |
2352 | + | |
2353 | +You can also compress or decompress files to the standard output by | |
2354 | +giving the @code{-c} flag. Multiple files may be compressed and | |
2355 | +decompressed like this. The resulting outputs are fed sequentially to | |
2356 | +stdout. Compression of multiple files in this manner generates a stream | |
2357 | +containing multiple compressed file representations. Such a stream | |
2358 | +can be decompressed correctly only by @code{bzip2} version 0.9.0 or | |
2359 | +later. Earlier versions of @code{bzip2} will stop after decompressing | |
2360 | +the first file in the stream. | |
2361 | + | |
2362 | +@code{bzcat} (or @code{bzip2 -dc}) decompresses all specified files to | |
2363 | +the standard output. | |
2364 | + | |
2365 | +@code{bzip2} will read arguments from the environment variables | |
2366 | +@code{BZIP2} and @code{BZIP}, in that order, and will process them | |
2367 | +before any arguments read from the command line. This gives a | |
2368 | +convenient way to supply default arguments. | |
2369 | + | |
2370 | +Compression is always performed, even if the compressed file is slightly | |
2371 | +larger than the original. Files of less than about one hundred bytes | |
2372 | +tend to get larger, since the compression mechanism has a constant | |
2373 | +overhead in the region of 50 bytes. Random data (including the output | |
2374 | +of most file compressors) is coded at about 8.05 bits per byte, giving | |
2375 | +an expansion of around 0.5%. | |
2376 | + | |
2377 | +As a self-check for your protection, @code{bzip2} uses 32-bit CRCs to | |
2378 | +make sure that the decompressed version of a file is identical to the | |
2379 | +original. This guards against corruption of the compressed data, and | |
2380 | +against undetected bugs in @code{bzip2} (hopefully very unlikely). The | |
2381 | +chance of data corruption going undetected is microscopic, about one | |
2382 | +chance in four billion for each file processed. Be aware, though, that | |
2383 | +the check occurs upon decompression, so it can only tell you that | |
2384 | +something is wrong. It can't help you recover the original uncompressed | |
2385 | +data. You can use @code{bzip2recover} to try to recover data from | |
2386 | +damaged files. | |
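To make the check concrete, here is a minimal table-free CRC-32 in C. It assumes the MSB-first variant with polynomial 0x04C11DB7, an all-ones preset and a final complement, which is to our understanding the per-block CRC that bzip2 computes. bzip2 also stores a whole-stream CRC derived from the block CRCs; this sketch does not model that. The function name crc32_msb_first is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, MSB-first, polynomial 0x04C11DB7, register preset to
 * all ones and complemented at the end (the variant often catalogued as
 * "CRC-32/BZIP2"). Slower than a table-driven version, but short. */
static uint32_t crc32_msb_first(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;          /* feed byte into top 8 bits */
        for (int bit = 0; bit < 8; bit++)        /* shift out 8 bits */
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u
                                      : (crc << 1);
    }
    return ~crc;
}
```

With a 32-bit check value, a random corruption slips through with probability about 1/2^32, roughly one chance in 4.3 billion, which is where the "four billion" figure comes from. For the standard test string "123456789" this variant yields 0xFC891918, and any single-bit change to the input produces a different value.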
2387 | + | |
2388 | +Return values: 0 for a normal exit, 1 for environmental problems (file | |
2389 | +not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt | |
2390 | +compressed file, 3 for an internal consistency error (e.g., a bug) which | |
2391 | +caused @code{bzip2} to panic. | |
2392 | + | |
2393 | + | |
2394 | +@unnumberedsubsubsec OPTIONS | |
2395 | +@table @code | |
2396 | +@item -c --stdout | |
2397 | +Compress or decompress to standard output. | |
2398 | +@item -d --decompress | |
2399 | +Force decompression. @code{bzip2}, @code{bunzip2} and @code{bzcat} are | |
2400 | +really the same program, and the decision about what actions to take is | |
2401 | +done on the basis of which name is used. This flag overrides that | |
2402 | +mechanism, and forces bzip2 to decompress. | |
2403 | +@item -z --compress | |
2404 | +The complement to @code{-d}: forces compression, regardless of the | |
2405 | +invocation name. | |
2406 | +@item -t --test | |
2407 | +Check integrity of the specified file(s), but don't decompress them. | |
2408 | +This really performs a trial decompression and throws away the result. | |
2409 | +@item -f --force | |
2410 | +Force overwrite of output files. Normally, @code{bzip2} will not overwrite | |
2411 | +existing output files. Also forces @code{bzip2} to break hard links | |
2412 | +to files, which it otherwise wouldn't do. | |
2413 | +@item -k --keep | |
2414 | +Keep (don't delete) input files during compression | |
2415 | +or decompression. | |
2416 | +@item -s --small | |
2417 | +Reduce memory usage, for compression, decompression and testing. Files | |
2418 | +are decompressed and tested using a modified algorithm which only | |
2419 | +requires 2.5 bytes per block byte. This means any file can be | |
2420 | +decompressed in 2300k of memory, albeit at about half the normal speed. | |
2421 | + | |
2422 | +During compression, @code{-s} selects a block size of 200k, which limits | |
2423 | +memory use to around the same figure, at the expense of your compression | |
2424 | +ratio. In short, if your machine is low on memory (8 megabytes or | |
2425 | +less), use @code{-s} for everything. See MEMORY MANAGEMENT below. | |
2426 | +@item -q --quiet | |
2427 | +Suppress non-essential warning messages. Messages pertaining to | |
2428 | +I/O errors and other critical events will not be suppressed. | |
2429 | +@item -v --verbose | |
2430 | +Verbose mode -- show the compression ratio for each file processed. | |
2431 | +Further @code{-v}'s increase the verbosity level, spewing out lots of | |
2432 | +information which is primarily of interest for diagnostic purposes. | |
2433 | +@item -L --license -V --version | |
2434 | +Display the software version, license terms and conditions. | |
2435 | +@item -1 to -9 | |
2436 | +Set the block size to 100 k, 200 k .. 900 k when compressing. Has no | |
2437 | +effect when decompressing. See MEMORY MANAGEMENT below. | |
2438 | +@item -- | |
2439 | +Treats all subsequent arguments as file names, even if they start | |
2440 | +with a dash. This is so you can handle files with names beginning | |
2441 | +with a dash, for example: @code{bzip2 -- -myfilename}. | |
2442 | +@item --repetitive-fast | |
2443 | +@itemx --repetitive-best | |
2444 | +These flags are redundant in versions 0.9.5 and above. They provided | |
2445 | +some coarse control over the behaviour of the sorting algorithm in | |
2446 | +earlier versions, which was sometimes useful. 0.9.5 and above have an | |
2447 | +improved algorithm which renders these flags irrelevant. | |
2448 | +@end table | |
2449 | + | |
2450 | + | |
2451 | +@unnumberedsubsubsec MEMORY MANAGEMENT | |
2452 | + | |
2453 | +@code{bzip2} compresses large files in blocks. The block size affects | |
2454 | +both the compression ratio achieved, and the amount of memory needed for | |
2455 | +compression and decompression. The flags @code{-1} through @code{-9} | |
2456 | +specify the block size to be 100,000 bytes through 900,000 bytes (the | |
2457 | +default) respectively. At decompression time, the block size used for | |
2458 | +compression is read from the header of the compressed file, and | |
2459 | +@code{bunzip2} then allocates itself just enough memory to decompress | |
2460 | +the file. Since block sizes are stored in compressed files, it follows | |
2461 | +that the flags @code{-1} to @code{-9} are irrelevant to and so ignored | |
2462 | +during decompression. | |
2463 | + | |
2464 | +Compression and decompression requirements, in bytes, can be estimated | |
2465 | +as: | |
2466 | +@example | |
2467 | + Compression: 400k + ( 8 x block size ) | |
2468 | + | |
2469 | + Decompression: 100k + ( 4 x block size ), or | |
2470 | + 100k + ( 2.5 x block size ) | |
2471 | +@end example | |
2472 | +Larger block sizes give rapidly diminishing marginal returns. Most of | |
2473 | +the compression comes from the first two or three hundred k of block | |
2474 | +size, a fact worth bearing in mind when using @code{bzip2} on small machines. | |
2475 | +It is also important to appreciate that the decompression memory | |
2476 | +requirement is set at compression time by the choice of block size. | |
2477 | + | |
2478 | +For files compressed with the default 900k block size, @code{bunzip2} | |
2479 | +will require about 3700 kbytes to decompress. To support decompression | |
2480 | +of any file on a 4 megabyte machine, @code{bunzip2} has an option to | |
2481 | +decompress using approximately half this amount of memory, about 2300 | |
2482 | +kbytes. Decompression speed is also halved, so you should use this | |
2483 | +option only where necessary. The relevant flag is @code{-s}. | |
2484 | + | |
2485 | +In general, try to use the largest block size memory constraints allow, | |
2486 | +since that maximises the compression achieved. Compression and | |
2487 | +decompression speed are virtually unaffected by block size. | |
2488 | + | |
2489 | +Another significant point applies to files which fit in a single block | |
2490 | +-- that means most files you'd encounter using a large block size. The | |
2491 | +amount of real memory touched is proportional to the size of the file, | |
2492 | +since the file is smaller than a block. For example, compressing a file | |
2493 | +20,000 bytes long with the flag @code{-9} will cause the compressor to | |
2494 | +allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560 | |
2495 | +kbytes of it. Similarly, the decompressor will allocate 3700k but only | |
2496 | +touch 100k + 20000 * 4 = 180 kbytes. | |
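The same arithmetic, with the block term capped at the file size, reproduces the 560k and 180k figures above. touched_bytes is a hypothetical helper for illustration only; it encodes the rough estimate given in the text, not a measurement of bzip2 itself.

```c
/* Estimate, in bytes, of memory actually touched when processing a file
 * of file_bytes with block-size flag -level. Only min(file size, block
 * size) bytes of block storage are used, so a small file touches much
 * less memory than is allocated. */
static long touched_bytes(long file_bytes, int level, int compressing)
{
    long block = 100000L * level;
    long used = file_bytes < block ? file_bytes : block;
    return compressing ? 400000L + 8 * used    /* 400k + 8 x bytes used */
                       : 100000L + 4 * used;   /* 100k + 4 x bytes used */
}
```

touched_bytes(20000, 9, 1) evaluates to 560000 and touched_bytes(20000, 9, 0) to 180000; for a file at least one block long the cap takes effect and the estimate equals the full allocation.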
2497 | + | |
2498 | +Here is a table which summarises the maximum memory usage for different | |
2499 | +block sizes. Also recorded is the total compressed size for 14 files of | |
2500 | +the Calgary Text Compression Corpus totalling 3,141,622 bytes. This | |
2501 | +column gives some feel for how compression varies with block size. | |
2502 | +These figures tend to understate the advantage of larger block sizes for | |
2503 | +larger files, since the Corpus is dominated by smaller files. | |
2504 | +@example | |
2505 | + Compress Decompress Decompress Corpus | |
2506 | + Flag usage usage -s usage Size | |
2507 | + | |
2508 | + -1 1200k 500k 350k 914704 | |
2509 | + -2 2000k 900k 600k 877703 | |
2510 | + -3 2800k 1300k 850k 860338 | |
2511 | + -4 3600k 1700k 1100k 846899 | |
2512 | + -5 4400k 2100k 1350k 845160 | |
2513 | + -6 5200k 2500k 1600k 838626 | |
2514 | + -7 6000k 2900k 1850k 834096 | |
2515 | + -8 6800k 3300k 2100k 828642 | |
2516 | + -9 7600k 3700k 2350k 828642 | |
2517 | +@end example | |
2518 | + | |
2519 | +@unnumberedsubsubsec RECOVERING DATA FROM DAMAGED FILES | |
2520 | + | |
2521 | +@code{bzip2} compresses files in blocks, usually 900kbytes long. Each | |
2522 | +block is handled independently. If a media or transmission error causes | |
2523 | +a multi-block @code{.bz2} file to become damaged, it may be possible to | |
2524 | +recover data from the undamaged blocks in the file. | |
2525 | + | |
2526 | +The compressed representation of each block is delimited by a 48-bit | |
2527 | +pattern, which makes it possible to find the block boundaries with | |
2528 | +reasonable certainty. Each block also carries its own 32-bit CRC, so | |
2529 | +damaged blocks can be distinguished from undamaged ones. | |
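The search for block boundaries can be sketched as a bit-level scan, since compressed blocks are not byte-aligned. The sketch assumes the 48-bit delimiter is 0x314159265359 (the BCD digits of pi, as used by bzip2; the end-of-stream marker is a different pattern, 0x177245385090, which is not searched for here). find_block_starts is hypothetical and is not bzip2recover's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* 48-bit pattern that starts each compressed block (BCD digits of pi). */
#define BZ_BLOCK_MAGIC 0x314159265359ULL
#define MASK_48BITS    0xFFFFFFFFFFFFULL

/* Scan buf bit by bit, most significant bit of each byte first, for the
 * block-start pattern. Records the bit offset at which each match begins
 * into hits (at most max entries) and returns the total number of matches. */
static size_t find_block_starts(const unsigned char *buf, size_t len,
                                uint64_t *hits, size_t max)
{
    uint64_t window = 0;   /* the last 48 bits read */
    size_t found = 0;

    for (uint64_t pos = 0; pos < (uint64_t)len * 8; pos++) {
        unsigned bit = (buf[pos / 8] >> (7 - pos % 8)) & 1u;
        window = ((window << 1) | bit) & MASK_48BITS;
        if (pos >= 47 && window == BZ_BLOCK_MAGIC) {
            if (found < max)
                hits[found] = pos - 47;   /* offset of the pattern's first bit */
            found++;
        }
    }
    return found;
}
```

Because the pattern could in principle occur by chance inside compressed data, a match only marks a likely boundary; the per-block CRC is what confirms that an extracted block is intact. Keeping positions in a 64-bit counter, as this sketch does, avoids the 32-bit limit described under CAVEATS.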
2530 | + | |
2531 | +@code{bzip2recover} is a simple program whose purpose is to search for | |
2532 | +blocks in @code{.bz2} files, and write each block out into its own | |
2533 | +@code{.bz2} file. You can then use @code{bzip2 -t} to test the | |
2534 | +integrity of the resulting files, and decompress those which are | |
2535 | +undamaged. | |
2536 | + | |
2537 | +@code{bzip2recover} | |
2538 | +takes a single argument, the name of the damaged file, | |
2539 | +and writes a number of files @code{rec0001file.bz2}, | |
2540 | +@code{rec0002file.bz2}, etc., containing the extracted blocks. | |
2541 | +The output filenames are designed so that wildcards used in | |
2542 | +subsequent processing -- for example, | |
2543 | +@code{bzip2 -dc rec*file.bz2 > recovered_data} -- expand to the files in | |
2544 | +the correct order. | |
2545 | + | |
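The wildcard ordering works because the block number is zero-padded to a fixed width. A hypothetical helper (recovery_name is not taken from bzip2recover) that builds names of the same shape:

```c
#include <stdio.h>
#include <string.h>   /* strcmp, for the ordering check described below */

/* Hypothetical helper modelled on the "rec0001file.bz2" names in the text.
 * Padding the block number to four digits makes byte-wise lexicographic
 * order (as in the C locale) agree with numeric block order. */
static int recovery_name(char *out, size_t outsz, int blockno, const char *orig)
{
    int n = snprintf(out, outsz, "rec%04d%s", blockno, orig);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Block 2 yields rec0002file.bz2 and block 10 yields rec0010file.bz2, and strcmp places them in numeric order. Unpadded names would not sort this way, since rec10file.bz2 comes before rec2file.bz2 byte by byte. With a four-digit width, this sketch keeps that property only up to block 9999.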
2546 | +@code{bzip2recover} should be of most use when dealing with large @code{.bz2} | |
2547 | +files, as these will contain many blocks. It is clearly | |
2548 | +futile to use it on damaged single-block files, since a | |
2549 | +damaged block cannot be recovered. If you wish to minimise | |
2550 | +any potential data loss through media or transmission errors, | |
2551 | +you might consider compressing with a smaller | |
2552 | +block size. | |
2553 | + | |
2554 | + | |
2555 | +@unnumberedsubsubsec PERFORMANCE NOTES | |
2556 | + | |
2557 | +The sorting phase of compression gathers together similar strings in the | |
2558 | +file. Because of this, files containing very long runs of repeated | |
2559 | +symbols, like "aabaabaabaab ..." (repeated several hundred times) may | |
2560 | +compress more slowly than normal. Versions 0.9.5 and above fare much | |
2561 | +better than previous versions in this respect. The ratio between | |
2562 | +worst-case and average-case compression time is in the region of 10:1. | |
2563 | +For previous versions, this figure was more like 100:1. You can use the | |
2564 | +@code{-vvvv} option to monitor progress in great detail, if you want. | |
2565 | + | |
2566 | +Decompression speed is unaffected by these phenomena. | |
2567 | + | |
2568 | +@code{bzip2} usually allocates several megabytes of memory to operate | |
2569 | +in, and then charges all over it in a fairly random fashion. This means | |
2570 | +that performance, both for compressing and decompressing, is largely | |
2571 | +determined by the speed at which your machine can service cache misses. | |
2572 | +Because of this, small changes to the code to reduce the miss rate have | |
2573 | +been observed to give disproportionately large performance improvements. | |
2574 | +I imagine @code{bzip2} will perform best on machines with very large | |
2575 | +caches. | |
2576 | + | |
2577 | + | |
2578 | +@unnumberedsubsubsec CAVEATS | |
2579 | + | |
2580 | +I/O error messages are not as helpful as they could be. @code{bzip2} | |
2581 | +tries hard to detect I/O errors and exit cleanly, but the details of | |
2582 | +what the problem is sometimes seem rather misleading. | |
2583 | + | |
2584 | +This manual page pertains to version 1.0 of @code{bzip2}. Compressed | |
2585 | +data created by this version is entirely forwards and backwards | |
2586 | +compatible with the previous public releases, versions 0.1pl2, 0.9.0 and | |
2587 | +0.9.5, but with the following exception: 0.9.0 and above can correctly | |
2588 | +decompress multiple concatenated compressed files. 0.1pl2 cannot do | |
2589 | +this; it will stop after decompressing just the first file in the | |
2590 | +stream. | |
2591 | + | |
2592 | +@code{bzip2recover} uses 32-bit integers to represent bit positions in | |
2593 | +compressed files, so it cannot handle compressed files more than 512 | |
2594 | +megabytes long. This could easily be fixed. | |
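The 512-megabyte limit is simply the range of the counter, assuming the bit position is held in an unsigned 32-bit integer and a megabyte is taken as 2^20 bytes:

```latex
2^{32}\ \text{bits} = \frac{2^{32}}{8}\ \text{bytes} = 2^{29}\ \text{bytes}
  = 2^{9} \times 2^{20}\ \text{bytes} = 512\ \text{megabytes}
```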
2595 | + | |
2596 | + | |
2597 | +@unnumberedsubsubsec AUTHOR | |
2598 | +Julian Seward, @code{jseward@@acm.org}. | |
2599 | + | |
2600 | +The ideas embodied in @code{bzip2} are due to (at least) the following | |
2601 | +people: Michael Burrows and David Wheeler (for the block sorting | |
2602 | +transformation), David Wheeler (again, for the Huffman coder), Peter | |
2603 | +Fenwick (for the structured coding model in the original @code{bzip}, | |
2604 | +and many refinements), and Alistair Moffat, Radford Neal and Ian Witten | |
2605 | +(for the arithmetic coder in the original @code{bzip}). I am much | |
2606 | +indebted for their help, support and advice. See the manual in the | |
2607 | +source distribution for pointers to sources of documentation. Christian | |
2608 | +von Roques encouraged me to look for faster sorting algorithms, so as to | |
2609 | +speed up compression. Bela Lubkin encouraged me to improve the | |
2610 | +worst-case compression performance. Many people sent patches, helped | |
2611 | +with portability problems, lent machines, gave advice and were generally | |
2612 | +helpful. | |
2613 | + | |
2614 | +@end quotation | |
2615 | + | |
2616 | + | |
2617 | + | |
2618 | + | |
2619 | +@chapter Programming with @code{libbzip2} | |
2620 | + | |
2621 | +This chapter describes the programming interface to @code{libbzip2}. | |
2622 | + | |
2623 | +For general background information, particularly about memory | |
2624 | +use and performance aspects, you'd be well advised to read Chapter 2 | |
2625 | +as well. | |
2626 | + | |
2627 | +@section Top-level structure | |
2628 | + | |
2629 | +@code{libbzip2} is a flexible library for compressing and decompressing | |
2630 | +data in the @code{bzip2} data format. Although packaged as a single | |
2631 | +entity, it helps to regard the library as three separate parts: the low | |
2632 | +level interface, and the high level interface, and some utility | |
2633 | +functions. | |
2634 | + | |
2635 | +The structure of @code{libbzip2}'s interfaces is similar to | |
2636 | +that of Jean-loup Gailly's and Mark Adler's excellent @code{zlib} | |
2637 | +library. | |
2638 | + | |
2639 | +All externally visible symbols have names beginning @code{BZ2_}. | |
2640 | +This is new in version 1.0. The intention is to minimise pollution | |
2641 | +of the namespaces of library clients. | |
2642 | + | |
2643 | +@subsection Low-level summary | |
2644 | + | |
2645 | +This interface provides services for compressing and decompressing | |
2646 | +data in memory. There's no provision for dealing with files, streams | |
2647 | +or any other I/O mechanisms, just straight memory-to-memory work. | |
2648 | +In fact, this part of the library can be compiled without inclusion | |
2649 | +of @code{stdio.h}, which may be helpful for embedded applications. | |
2650 | + | |
2651 | +The low-level part of the library has no global variables and | |
2652 | +is therefore thread-safe. | |
2653 | + | |
2654 | +Six routines make up the low level interface: | |
2655 | +@code{BZ2_bzCompressInit}, @code{BZ2_bzCompress}, and @* @code{BZ2_bzCompressEnd} | |
2656 | +for compression, | |
2657 | +and a corresponding trio @code{BZ2_bzDecompressInit}, @* @code{BZ2_bzDecompress} | |
2658 | +and @code{BZ2_bzDecompressEnd} for decompression. | |
2659 | +The @code{*Init} functions allocate | |
2660 | +memory for compression/decompression and do other | |
2661 | +initialisations, whilst the @code{*End} functions close down operations | |
2662 | +and release memory. | |
2663 | + | |
2664 | +The real work is done by @code{BZ2_bzCompress} and @code{BZ2_bzDecompress}. | |
2665 | +These compress and decompress data from a user-supplied input buffer | |
2666 | +to a user-supplied output buffer. These buffers can be any size; | |
2667 | +arbitrary quantities of data are handled by making repeated calls | |
2668 | +to these functions. This is a flexible mechanism allowing a | |
2669 | +consumer-pull style of activity, or producer-push, or a mixture of | |
2670 | +both. | |
2671 | + | |
2672 | + | |
2673 | + | |
2674 | +@subsection High-level summary | |
2675 | + | |
2676 | +This interface provides some handy wrappers around the low-level | |
2677 | +interface to facilitate reading and writing @code{bzip2} format | |
2678 | +files (@code{.bz2} files). The routines provide hooks to facilitate | |
2679 | +reading files in which the @code{bzip2} data stream is embedded | |
2680 | +within some larger-scale file structure, or where there are | |
2681 | +multiple @code{bzip2} data streams concatenated end-to-end. | |
2682 | + | |
2683 | +For reading files, @code{BZ2_bzReadOpen}, @code{BZ2_bzRead}, | |
2684 | +@code{BZ2_bzReadClose} and @* @code{BZ2_bzReadGetUnused} are supplied. For | |
2685 | +writing files, @code{BZ2_bzWriteOpen}, @code{BZ2_bzWrite} and | |
2686 | +@code{BZ2_bzWriteClose} are available. | |
2687 | + | |
2688 | +As with the low-level library, no global variables are used | |
2689 | +so the library is per se thread-safe. However, if I/O errors | |
2690 | +occur whilst reading or writing the underlying compressed files, | |
2691 | +you may have to consult @code{errno} to determine the cause of | |
2692 | +the error. In that case, you'd need a C library which correctly | |
2693 | +supports @code{errno} in a multithreaded environment. | |
2694 | + | |
2695 | +To make the library a little simpler and more portable, | |
2696 | +@code{BZ2_bzReadOpen} and @code{BZ2_bzWriteOpen} require you to pass them file | |
2697 | +handles (@code{FILE*}s) which have previously been opened for reading or | |
2698 | +writing respectively. That avoids portability problems associated with | |
2699 | +file operations and file attributes, whilst not being much of an | |
2700 | +imposition on the programmer. | |
2701 | + | |
2702 | + | |
2703 | + | |
2704 | +@subsection Utility functions summary | |
2705 | +For very simple needs, @code{BZ2_bzBuffToBuffCompress} and | |
2706 | +@code{BZ2_bzBuffToBuffDecompress} are provided. These compress | |
2707 | +data in memory from one buffer to another buffer in a single | |
2708 | +function call. You should assess whether these functions | |
2709 | +fulfill your memory-to-memory compression/decompression | |
2710 | +requirements before investing effort in understanding the more | |
2711 | +general but more complex low-level interface. | |
2712 | + | |
2713 | +Yoshioka Tsuneo (@code{QWF00133@@niftyserve.or.jp} / | |
2714 | +@code{tsuneo-y@@is.aist-nara.ac.jp}) has contributed some functions to | |
2715 | +give better @code{zlib} compatibility. These functions are | |
2716 | +@code{BZ2_bzopen}, @code{BZ2_bzread}, @code{BZ2_bzwrite}, @code{BZ2_bzflush}, | |
2717 | +@code{BZ2_bzclose}, | |
2718 | +@code{BZ2_bzerror} and @code{BZ2_bzlibVersion}. You may find these functions | |
2719 | +more convenient for simple file reading and writing than those in the | |
2720 | +high-level interface. These functions are not (yet) officially part of | |
2721 | +the library, and are minimally documented here. If they break, you | |
2722 | +get to keep all the pieces. I hope to document them properly when time | |
2723 | +permits. | |
2724 | + | |
2725 | +Yoshioka also contributed modifications to allow the library to be | |
2726 | +built as a Windows DLL. | |
2727 | + | |
2728 | + | |
2729 | +@section Error handling | |
2730 | + | |
2731 | +The library is designed to recover cleanly in all situations, including | |
2732 | +the worst-case situation of decompressing random data. I'm not | |
2733 | +100% sure that it can always do this, so you might want to add | |
2734 | +a signal handler to catch segmentation violations during decompression | |
2735 | +if you are feeling especially paranoid. I would be interested in | |
2736 | +hearing more about the robustness of the library to corrupted | |
2737 | +compressed data. | |
2738 | + | |
2739 | +Version 1.0 is much more robust in this respect than | |
2740 | +0.9.0 or 0.9.5. Investigations with Checker (a tool for | |
2741 | +detecting problems with memory management, similar to Purify) | |
2742 | +indicate that, at least for the few files I tested, all single-bit | |
2743 | +errors in the compressed data are caught properly, with no | |
2744 | +segmentation faults, no reads of uninitialised data and no | |
2745 | +out of range reads or writes. So it's certainly much improved, | |
2746 | +although I wouldn't claim it to be totally bombproof. | |
2747 | + | |
2748 | +The file @code{bzlib.h} contains all definitions needed to use | |
2749 | +the library. In particular, you should definitely not include | |
2750 | +@code{bzlib_private.h}. | |
2751 | + | |
2752 | +In @code{bzlib.h}, the various return values are defined. The following | |
2753 | +list is not intended as an exhaustive description of the circumstances | |
2754 | +in which a given value may be returned -- those descriptions are given | |
2755 | +later. Rather, it is intended to convey the rough meaning of each | |
2756 | +return value. The first five return values are normal and do not | |
2757 | +denote an error situation. | |
2758 | +@table @code | |
2759 | +@item BZ_OK | |
2760 | +The requested action was completed successfully. | |
2761 | +@item BZ_RUN_OK | |
2762 | +@itemx BZ_FLUSH_OK | |
2763 | +@itemx BZ_FINISH_OK | |
2764 | +In @code{BZ2_bzCompress}, the requested flush/finish/nothing-special action | |
2765 | +was completed successfully. | |
2766 | +@item BZ_STREAM_END | |
2767 | +Compression of data was completed, or the logical stream end was | |
2768 | +detected during decompression. | |
2769 | +@end table | |
2770 | + | |
2771 | +The following return values indicate an error of some kind. | |
2772 | +@table @code | |
2773 | +@item BZ_CONFIG_ERROR | |
2774 | +Indicates that the library has been improperly compiled on your | |
2775 | +platform -- a major configuration error. Specifically, it means | |
2776 | +that @code{sizeof(char)}, @code{sizeof(short)} and @code{sizeof(int)} | |
2777 | +are not 1, 2 and 4 respectively, as they should be. Note that the | |
2778 | +library should still work properly on 64-bit platforms which follow | |
2779 | +the LP64 programming model -- that is, where @code{sizeof(long)} | |
2780 | +and @code{sizeof(void*)} are 8. Under LP64, @code{sizeof(int)} is | |
2781 | +still 4, so @code{libbzip2}, which doesn't use the @code{long} type, | |
2782 | +is OK. | |
2783 | +@item BZ_SEQUENCE_ERROR | |
2784 | +When using the library, it is important to call the functions in the | |
2785 | +correct sequence and with data structures (buffers etc) in the correct | |
2786 | +states. @code{libbzip2} checks as much as it can to ensure this is | |
2787 | +happening, and returns @code{BZ_SEQUENCE_ERROR} if not. Code which | |
2788 | +complies precisely with the function semantics, as detailed below, | |
2789 | +should never receive this value; such an event denotes buggy code | |
2790 | +which you should investigate. | |
2791 | +@item BZ_PARAM_ERROR | |
2792 | +Returned when a parameter to a function call is out of range | |
2793 | +or otherwise manifestly incorrect. As with @code{BZ_SEQUENCE_ERROR}, | |
2794 | +this denotes a bug in the client code. The distinction between | |
2795 | +@code{BZ_PARAM_ERROR} and @code{BZ_SEQUENCE_ERROR} is a bit hazy, but still worth | |
2796 | +making. | |
2797 | +@item BZ_MEM_ERROR | |
2798 | +Returned when a request to allocate memory failed. Note that the | |
2799 | +quantity of memory needed to decompress a stream cannot be determined | |
2800 | +until the stream's header has been read. So @code{BZ2_bzDecompress} and | |
2801 | +@code{BZ2_bzRead} may return @code{BZ_MEM_ERROR} even though some of | |
2802 | +the compressed data has been read. The same is not true for | |
2803 | +compression; once @code{BZ2_bzCompressInit} or @code{BZ2_bzWriteOpen} have | |
2804 | +successfully completed, @code{BZ_MEM_ERROR} cannot occur. | |
2805 | +@item BZ_DATA_ERROR | |
2806 | +Returned when a data integrity error is detected during decompression. | |
2807 | +Most importantly, this means when stored and computed CRCs for the | |
2808 | +data do not match. This value is also returned upon detection of any | |
2809 | +other anomaly in the compressed data. | |
2810 | +@item BZ_DATA_ERROR_MAGIC | |
2811 | +As a special case of @code{BZ_DATA_ERROR}, it is sometimes useful to | |
2812 | +know when the compressed stream does not start with the correct | |
2813 | +magic bytes (@code{'B' 'Z' 'h'}). | |
2814 | +@item BZ_IO_ERROR | |
2815 | +Returned by @code{BZ2_bzRead} and @code{BZ2_bzWrite} when there is an error | |
2816 | +reading or writing the compressed file, and by @code{BZ2_bzReadOpen} | |
2817 | +and @code{BZ2_bzWriteOpen} for attempts to use a file for which the | |
2818 | +error indicator (viz, @code{ferror(f)}) is set. | |
2819 | +On receipt of @code{BZ_IO_ERROR}, the caller should consult | |
2820 | +@code{errno} and/or @code{perror} to acquire operating-system | |
2821 | +specific information about the problem. | |
2822 | +@item BZ_UNEXPECTED_EOF | |
2823 | +Returned by @code{BZ2_bzRead} when the compressed file finishes | |
2824 | +before the logical end of stream is detected. | |
2825 | +@item BZ_OUTBUFF_FULL | |
2826 | +Returned by @code{BZ2_bzBuffToBuffCompress} and | |
2827 | +@code{BZ2_bzBuffToBuffDecompress} to indicate that the output data | |
2828 | +will not fit into the output buffer provided. | |
2829 | +@end table | |
2830 | + | |
2831 | + | |
2832 | + | |
2833 | +@section Low-level interface | |
2834 | + | |
2835 | +@subsection @code{BZ2_bzCompressInit} | |
2836 | +@example | |
2837 | +typedef | |
2838 | +   struct @{ | |
2839 | +      char *next_in; | |
2840 | +      unsigned int avail_in; | |
2841 | +      unsigned int total_in_lo32; | |
2842 | +      unsigned int total_in_hi32; | |
2843 | + | |
2844 | +      char *next_out; | |
2845 | +      unsigned int avail_out; | |
2846 | +      unsigned int total_out_lo32; | |
2847 | +      unsigned int total_out_hi32; | |
2848 | + | |
2849 | +      void *state; | |
2850 | + | |
2851 | +      void *(*bzalloc)(void *,int,int); | |
2852 | +      void (*bzfree)(void *,void *); | |
2853 | +      void *opaque; | |
2854 | +   @} | |
2855 | +   bz_stream; | |
2856 | + | |
2857 | +int BZ2_bzCompressInit ( bz_stream *strm, | |
2858 | +                         int blockSize100k, | |
2859 | +                         int verbosity, | |
2860 | +                         int workFactor ); | |
2861 | + | |
2862 | +@end example | |
2863 | + | |
2864 | +Prepares for compression. The @code{bz_stream} structure | |
2865 | +holds all data pertaining to the compression activity. | |
2866 | +A @code{bz_stream} structure should be allocated and initialised | |
2867 | +prior to the call. | |
2868 | +The fields of @code{bz_stream} | |
2869 | +comprise the entirety of the user-visible data. @code{state} | |
2870 | +is a pointer to the private data structures required for compression. | |
2871 | + | |
2872 | +Custom memory allocators are supported, via fields @code{bzalloc}, | |
2873 | +@code{bzfree}, | |
2874 | +and @code{opaque}. The value | |
2875 | +@code{opaque} is passed as the first argument to | |
2876 | +all calls to @code{bzalloc} and @code{bzfree}, but is | |
2877 | +otherwise ignored by the library. | |
2878 | +The call @code{bzalloc ( opaque, n, m )} is expected to return a | |
2879 | +pointer @code{p} to | |
2880 | +@code{n * m} bytes of memory, and @code{bzfree ( opaque, p )} | |
2881 | +should free | |
2882 | +that memory. | |
2883 | + | |
2884 | +If you don't want to use a custom memory allocator, set @code{bzalloc}, | |
2885 | +@code{bzfree} and | |
2886 | +@code{opaque} to @code{NULL}, | |
2887 | +and the library will then use the standard @code{malloc}/@code{free} | |
2888 | +routines. | |
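A custom allocator only has to match the two function-pointer types in @code{bz_stream}. The sketch below delegates to @code{malloc}/@code{free} and counts live blocks, which can help spot leaks. It assumes a hypothetical @code{alloc_stats} record reached through @code{opaque}. The names @code{counting_alloc} and @code{counting_free} are also illustrative.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping record, passed to the library via opaque. */
typedef struct {
    long live_blocks;   /* allocations not yet freed */
    long total_calls;   /* number of bzalloc calls seen */
} alloc_stats;

/* Matches bzalloc: void *(*)(void *, int, int).
   Returns a block of n * m bytes, as the library expects. */
static void *counting_alloc(void *opaque, int n, int m)
{
    alloc_stats *st = opaque;
    void *p = malloc((size_t)n * (size_t)m);
    st->total_calls++;
    if (p != NULL)
        st->live_blocks++;
    return p;
}

/* Matches bzfree: void (*)(void *, void *). */
static void counting_free(void *opaque, void *p)
{
    alloc_stats *st = opaque;
    if (p != NULL)
        st->live_blocks--;
    free(p);
}
```

To install the allocator, set @code{strm.bzalloc = counting_alloc}, @code{strm.bzfree = counting_free} and @code{strm.opaque = &stats}. Do this before calling @code{BZ2_bzCompressInit} or @code{BZ2_bzDecompressInit}.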
2889 | + | |
2890 | +Before calling @code{BZ2_bzCompressInit}, fields @code{bzalloc}, | |
2891 | +@code{bzfree} and @code{opaque} should | |
2892 | +be filled appropriately, as just described. Upon return, the internal | |
2893 | +state will have been allocated and initialised, and @code{total_in_lo32}, | |
2894 | +@code{total_in_hi32}, @code{total_out_lo32} and | |
2895 | +@code{total_out_hi32} will have been set to zero. | |
2896 | +These four fields are used by the library | |
2897 | +to inform the caller of the total amount of data passed into and out of | |
2898 | +the library, respectively. You should not try to change them. | |
2899 | +As of version 1.0, 64-bit counts are maintained, even on 32-bit | |
2900 | +platforms, using the @code{_hi32} fields to store the upper 32 bits | |
2901 | +of the count. So, for example, the total amount of input data | |
2902 | +is @code{(total_in_hi32 << 32) + total_in_lo32}, computed in a 64-bit type. | |
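Combining the two halves of a count needs a little care in C. Shifting a 32-bit @code{unsigned int} left by 32 is undefined behaviour, so the high half must be widened before the shift. A minimal sketch, using the hypothetical helper name @code{bz_count64}:

```c
#include <stdint.h>

/* Combines the _hi32/_lo32 halves of a bz_stream byte counter.
   hi32 is widened to 64 bits before shifting; shifting a 32-bit
   value by 32 would be undefined behaviour. */
static uint64_t bz_count64(unsigned int hi32, unsigned int lo32)
{
    return ((uint64_t)hi32 << 32) + (uint64_t)lo32;
}
```

Typical use is @code{bz_count64(strm.total_in_hi32, strm.total_in_lo32)}.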
2903 | + | |
2904 | +Parameter @code{blockSize100k} specifies the block size to be used for | |
2905 | +compression. It should be a value between 1 and 9 inclusive, and the | |
2906 | +actual block size used is 100000 x this figure. 9 gives the best | |
2907 | +compression but takes most memory. | |
2908 | + | |
2909 | +Parameter @code{verbosity} should be set to a number between 0 and 4 | |
2910 | +inclusive. 0 is silent, and greater numbers give increasingly verbose | |
2911 | +monitoring/debugging output. If the library has been compiled with | |
2912 | +@code{-DBZ_NO_STDIO}, no such output will appear for any verbosity | |
2913 | +setting. | |
2914 | + | |
2915 | +Parameter @code{workFactor} controls how the compression phase behaves | |
2916 | +when presented with worst case, highly repetitive, input data. If | |
2917 | +compression runs into difficulties caused by repetitive data, the | |
2918 | +library switches from the standard sorting algorithm to a fallback | |
2919 | +algorithm. The fallback is slower than the standard algorithm by | |
2920 | +perhaps a factor of three, but always behaves reasonably, no matter how | |
2921 | +bad the input. | |
2922 | + | |
2923 | +Lower values of @code{workFactor} reduce the amount of effort the | |
2924 | +standard algorithm will expend before resorting to the fallback. You | |
2925 | +should set this parameter carefully; too low, and many inputs will be | |
2926 | +handled by the fallback algorithm and so compress rather slowly; too | |
2927 | +high, and your average-to-worst case compression times can become very | |
2928 | +large. The default value of 30 gives reasonable behaviour over a wide | |
2929 | +range of circumstances. | |
2930 | + | |
2931 | +Allowable values range from 0 to 250 inclusive. 0 is a special case, | |
2932 | +equivalent to using the default value of 30. | |
2933 | + | |
2934 | +Note that the compressed output generated is the same regardless of | |
2935 | +whether or not the fallback algorithm is used. | |
2936 | + | |
2937 | +Be aware also that this parameter may disappear entirely in future | |
2938 | +versions of the library. In principle it should be possible to devise a | |
2939 | +good way to automatically choose which algorithm to use. Such a | |
2940 | +mechanism would render the parameter obsolete. | |
2941 | + | |
2942 | +Possible return values: | |
2943 | +@display | |
2944 | + @code{BZ_CONFIG_ERROR} | |
2945 | + if the library has been mis-compiled | |
2946 | + @code{BZ_PARAM_ERROR} | |
2947 | + if @code{strm} is @code{NULL} | |
2948 | + or @code{blockSize100k} < 1 or @code{blockSize100k} > 9 | |
2949 | + or @code{verbosity} < 0 or @code{verbosity} > 4 | |
2950 | + or @code{workFactor} < 0 or @code{workFactor} > 250 | |
2951 | + @code{BZ_MEM_ERROR} | |
2952 | + if not enough memory is available | |
2953 | + @code{BZ_OK} | |
2954 | + otherwise | |
2955 | +@end display | |
2956 | +Allowable next actions: | |
2957 | +@display | |
2958 | + @code{BZ2_bzCompress} | |
2959 | + if @code{BZ_OK} is returned | |
2960 | + no specific action needed in case of error | |
2961 | +@end display | |
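The parameter ranges above can be checked before calling @code{BZ2_bzCompressInit}, for instance to print a clearer message than a bare @code{BZ_PARAM_ERROR}. The pre-check below is hypothetical and written only from the documented ranges. It does not call the library, which performs its own checks regardless.

```c
/* Hypothetical pre-check mirroring the documented BZ_PARAM_ERROR
   conditions of BZ2_bzCompressInit (other than strm == NULL).
   Returns 1 if all three arguments are in range, 0 otherwise. */
static int compress_params_ok(int blockSize100k, int verbosity, int workFactor)
{
    if (blockSize100k < 1 || blockSize100k > 9) return 0;
    if (verbosity < 0 || verbosity > 4)         return 0;
    if (workFactor < 0 || workFactor > 250)     return 0;
    return 1;
}

/* workFactor 0 is documented as equivalent to the default of 30. */
static int effective_work_factor(int workFactor)
{
    return workFactor == 0 ? 30 : workFactor;
}

/* Nominal block size in bytes: 100000 x blockSize100k. */
static long block_bytes(int blockSize100k)
{
    return 100000L * blockSize100k;
}
```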
2962 | + | |
2963 | +@subsection @code{BZ2_bzCompress} | |
2964 | +@example | |
2965 | + int BZ2_bzCompress ( bz_stream *strm, int action ); | |
2966 | +@end example | |
2967 | +Provides more input and/or output buffer space for the library. The | |
2968 | +caller maintains input and output buffers, and calls @code{BZ2_bzCompress} to | |
2969 | +transfer data between them. | |
2970 | + | |
2971 | +Before each call to @code{BZ2_bzCompress}, @code{next_in} should point at | |
2972 | +the data to be compressed, and @code{avail_in} should indicate how many | |
2973 | +bytes the library may read. @code{BZ2_bzCompress} updates @code{next_in}, | |
2974 | +@code{avail_in}, @code{total_in_lo32} and @code{total_in_hi32} to reflect the number of bytes it | |
2975 | +has read. | |
2976 | + | |
2977 | +Similarly, @code{next_out} should point to a buffer in which the | |
2978 | +compressed data is to be placed, with @code{avail_out} indicating how | |
2979 | +much output space is available. @code{BZ2_bzCompress} updates | |
2980 | +@code{next_out}, @code{avail_out}, @code{total_out_lo32} and @code{total_out_hi32} to reflect the | |
2981 | +number of bytes output. | |
2982 | + | |
2983 | +You may provide and remove as little or as much data as you like on each | |
2984 | +call of @code{BZ2_bzCompress}. In the limit, it is acceptable to supply and | |
2985 | +remove data one byte at a time, although this would be terribly | |
2986 | +inefficient. You should always ensure that at least one byte of output | |
2987 | +space is available at each call. | |
2988 | + | |
2989 | +A second purpose of @code{BZ2_bzCompress} is to request a change of mode of the | |
2990 | +compressed stream. | |
2991 | + | |
2992 | +Conceptually, a compressed stream can be in one of four states: IDLE, | |
2993 | +RUNNING, FLUSHING and FINISHING. Before initialisation | |
2994 | +(@code{BZ2_bzCompressInit}) and after termination (@code{BZ2_bzCompressEnd}), a | |
2995 | +stream is regarded as IDLE. | |
2996 | + | |
2997 | +Upon initialisation (@code{BZ2_bzCompressInit}), the stream is placed in the | |
2998 | +RUNNING state. Subsequent calls to @code{BZ2_bzCompress} should pass | |
2999 | +@code{BZ_RUN} as the requested action; other actions are illegal and | |
3000 | +will result in @code{BZ_SEQUENCE_ERROR}. | |
3001 | + | |
3002 | +At some point, the calling program will have provided all the input data | |
3003 | +it wants to. It will then want to finish up -- in effect, asking the | |
3004 | +library to process any data it might have buffered internally. In this | |
3005 | +state, @code{BZ2_bzCompress} will no longer attempt to read data from | |
3006 | +@code{next_in}, but it will want to write data to @code{next_out}. | |
3007 | +Because the output buffer supplied by the user can be arbitrarily small, | |
3008 | +the finishing-up operation cannot necessarily be done with a single call | |
3009 | +of @code{BZ2_bzCompress}. | |
3010 | + | |
3011 | +Instead, the calling program passes @code{BZ_FINISH} as an action to | |
3012 | +@code{BZ2_bzCompress}. This changes the stream's state to FINISHING. Any | |
3013 | +remaining input (ie, @code{next_in[0 .. avail_in-1]}) is compressed and | |
3014 | +transferred to the output buffer. To do this, @code{BZ2_bzCompress} must be | |
3015 | +called repeatedly until all the output has been consumed. At that | |
3016 | +point, @code{BZ2_bzCompress} returns @code{BZ_STREAM_END}, and the stream's | |
3017 | +state is set back to IDLE. @code{BZ2_bzCompressEnd} should then be | |
3018 | +called. | |
3019 | + | |
3020 | +Just to make sure the calling program does not cheat, the library makes | |
3021 | +a note of @code{avail_in} at the time of the first call to | |
3022 | +@code{BZ2_bzCompress} which has @code{BZ_FINISH} as an action (ie, at the | |
3023 | +time the program has announced its intention to not supply any more | |
3024 | +input). By comparing this value with that of @code{avail_in} over | |
3025 | +subsequent calls to @code{BZ2_bzCompress}, the library can detect any | |
3026 | +attempts to slip in more data to compress. Any calls for which this is | |
3027 | +detected will return @code{BZ_SEQUENCE_ERROR}. This indicates a | |
3028 | +programming mistake which should be corrected. | |
3029 | + | |
3030 | +Instead of asking to finish, the calling program may ask | |
3031 | +@code{BZ2_bzCompress} to take all the remaining input, compress it and | |
3032 | +terminate the current (Burrows-Wheeler) compression block. This could | |
3033 | +be useful for error control purposes. The mechanism is analogous to | |
3034 | +that for finishing: call @code{BZ2_bzCompress} with an action of | |
3035 | +@code{BZ_FLUSH}, remove output data, and persist with the | |
3036 | +@code{BZ_FLUSH} action until the value @code{BZ_RUN_OK} is returned. As | |
3037 | +with finishing, @code{BZ2_bzCompress} detects any attempt to provide more | |
3038 | +input data once the flush has begun. | |
3039 | + | |
3040 | +Once the flush is complete, the stream returns to the normal RUNNING | |
3041 | +state. | |
3042 | + | |
3043 | +This all sounds pretty complex, but isn't really. Here's a table | |
3044 | +which shows which actions are allowable in each state, what action | |
3045 | +will be taken, what the next state is, and what the non-error return | |
3046 | +values are. Note that you can't explicitly ask what state the | |
3047 | +stream is in, but nor do you need to -- it can be inferred from the | |
3048 | +values returned by @code{BZ2_bzCompress}. | |
3049 | +@display | |
3050 | +IDLE/@code{any} | |
3051 | + Illegal. IDLE state only exists after @code{BZ2_bzCompressEnd} or | |
3052 | + before @code{BZ2_bzCompressInit}. | |
3053 | + Return value = @code{BZ_SEQUENCE_ERROR} | |
3054 | + | |
3055 | +RUNNING/@code{BZ_RUN} | |
3056 | + Compress from @code{next_in} to @code{next_out} as much as possible. | |
3057 | + Next state = RUNNING | |
3058 | + Return value = @code{BZ_RUN_OK} | |
3059 | + | |
3060 | +RUNNING/@code{BZ_FLUSH} | |
3061 | + Remember current value of @code{avail_in}. Compress from @code{next_in} | |
3062 | + to @code{next_out} as much as possible, but do not accept any more input. | |
3063 | + Next state = FLUSHING | |
3064 | + Return value = @code{BZ_FLUSH_OK} | |
3065 | + | |
3066 | +RUNNING/@code{BZ_FINISH} | |
3067 | + Remember current value of @code{avail_in}. Compress from @code{next_in} | |
3068 | + to @code{next_out} as much as possible, but do not accept any more input. | |
3069 | + Next state = FINISHING | |
3070 | + Return value = @code{BZ_FINISH_OK} | |
3071 | + | |
3072 | +FLUSHING/@code{BZ_FLUSH} | |
3073 | + Compress from @code{next_in} to @code{next_out} as much as possible, | |
3074 | + but do not accept any more input. | |
3075 | + If all the existing input has been used up and all compressed | |
3076 | + output has been removed | |
3077 | + Next state = RUNNING; Return value = @code{BZ_RUN_OK} | |
3078 | + else | |
3079 | + Next state = FLUSHING; Return value = @code{BZ_FLUSH_OK} | |
3080 | + | |
3081 | +FLUSHING/other | |
3082 | + Illegal. | |
3083 | + Return value = @code{BZ_SEQUENCE_ERROR} | |
3084 | + | |
3085 | +FINISHING/@code{BZ_FINISH} | |
3086 | + Compress from @code{next_in} to @code{next_out} as much as possible, | |
3087 | + but do not accept any more input. | |
3088 | + If all the existing input has been used up and all compressed | |
3089 | + output has been removed | |
3090 | + Next state = IDLE; Return value = @code{BZ_STREAM_END} | |
3091 | + else | |
3092 | + Next state = FINISHING; Return value = @code{BZ_FINISH_OK} | |
3093 | + | |
3094 | +FINISHING/other | |
3095 | + Illegal. | |
3096 | + Return value = @code{BZ_SEQUENCE_ERROR} | |
3097 | +@end display | |
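To make the table concrete, here is a toy model of it in C. It follows the table literally, performs no compression and does not use @code{bzlib.h}. The enum names are invented for illustration. The @code{done} argument stands for the condition "all the existing input has been used up and all compressed output has been removed".

```c
/* Hypothetical names, deliberately distinct from the bzlib.h macros. */
typedef enum { ST_IDLE, ST_RUNNING, ST_FLUSHING, ST_FINISHING } cstate;
typedef enum { ACT_RUN, ACT_FLUSH, ACT_FINISH } caction;
typedef enum { R_RUN_OK, R_FLUSH_OK, R_FINISH_OK, R_STREAM_END,
               R_SEQUENCE_ERROR } cresult;

/* Applies one row of the table: updates *st and returns the result. */
static cresult step(cstate *st, caction act, int done)
{
    switch (*st) {
    case ST_RUNNING:
        if (act == ACT_RUN)
            return R_RUN_OK;
        if (act == ACT_FLUSH) {
            *st = ST_FLUSHING;
            return R_FLUSH_OK;
        }
        *st = ST_FINISHING;                /* act == ACT_FINISH */
        return R_FINISH_OK;
    case ST_FLUSHING:
        if (act != ACT_FLUSH)
            return R_SEQUENCE_ERROR;
        if (done) {
            *st = ST_RUNNING;
            return R_RUN_OK;
        }
        return R_FLUSH_OK;
    case ST_FINISHING:
        if (act != ACT_FINISH)
            return R_SEQUENCE_ERROR;
        if (done) {
            *st = ST_IDLE;
            return R_STREAM_END;
        }
        return R_FINISH_OK;
    case ST_IDLE:
    default:
        return R_SEQUENCE_ERROR;           /* IDLE accepts no action */
    }
}
```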
3098 | + | |
3099 | +That still looks complicated? Well, fair enough. The usual sequence | |
3100 | +of calls for compressing a load of data is: | |
3101 | +@itemize @bullet | |
3102 | +@item Get started with @code{BZ2_bzCompressInit}. | |
3103 | +@item Shovel data in and shlurp out its compressed form using zero or more | |
3104 | +calls of @code{BZ2_bzCompress} with action = @code{BZ_RUN}. | |
3105 | +@item Finish up. | |
3106 | +Repeatedly call @code{BZ2_bzCompress} with action = @code{BZ_FINISH}, | |
3107 | +copying out the compressed output, until @code{BZ_STREAM_END} is returned. | |
3108 | +@item Close up and go home. Call @code{BZ2_bzCompressEnd}. | |
3109 | +@end itemize | |
3110 | +If the data you want to compress fits into your input buffer all | |
3111 | +at once, you can skip the calls of @code{BZ2_bzCompress ( ..., BZ_RUN )} and | |
3112 | +just do the @code{BZ2_bzCompress ( ..., BZ_FINISH )} calls. | |
3113 | + | |
3114 | +All required memory is allocated by @code{BZ2_bzCompressInit}. The | |
3115 | +compression library can accept any data at all (obviously). So you | |
3116 | +shouldn't get any error return values from the @code{BZ2_bzCompress} calls. | |
3117 | +If you do, they will be @code{BZ_SEQUENCE_ERROR}, and indicate a bug in | |
3118 | +your programming. | |
3119 | + | |
3120 | +Trivial other possible return values: | |
3121 | +@display | |
3122 | + @code{BZ_PARAM_ERROR} | |
3123 | + if @code{strm} is @code{NULL}, or @code{strm->state} is @code{NULL} | |
3124 | +@end display | |
3125 | + | |
3126 | +@subsection @code{BZ2_bzCompressEnd} | |
3127 | +@example | |
3128 | +int BZ2_bzCompressEnd ( bz_stream *strm ); | |
3129 | +@end example | |
3130 | +Releases all memory associated with a compression stream. | |
3131 | + | |
3132 | +Possible return values: | |
3133 | +@display | |
3134 | + @code{BZ_PARAM_ERROR} if @code{strm} is @code{NULL} or @code{strm->state} is @code{NULL} | |
3135 | + @code{BZ_OK} otherwise | |
3136 | +@end display | |
3137 | + | |
3138 | + | |
3139 | +@subsection @code{BZ2_bzDecompressInit} | |
3140 | +@example | |
3141 | +int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small ); | |
3142 | +@end example | |
3143 | +Prepares for decompression. As with @code{BZ2_bzCompressInit}, a | |
3144 | +@code{bz_stream} record should be allocated and initialised before the | |
3145 | +call. Fields @code{bzalloc}, @code{bzfree} and @code{opaque} should be | |
3146 | +set if a custom memory allocator is required, or made @code{NULL} for | |
3147 | +the normal @code{malloc}/@code{free} routines. Upon return, the internal | |
3148 | +state will have been initialised, and @code{total_in_lo32}, @code{total_in_hi32}, | |
3149 | +@code{total_out_lo32} and @code{total_out_hi32} will be zero. | |
3150 | + | |
3151 | +For the meaning of parameter @code{verbosity}, see @code{BZ2_bzCompressInit}. | |
3152 | + | |
3153 | +If @code{small} is nonzero, the library will use an alternative | |
3154 | +decompression algorithm which uses less memory but at the cost of | |
3155 | +decompressing more slowly (roughly speaking, half the speed, but the | |
3156 | +maximum memory requirement drops to around 2300k). See Chapter 2 for | |
3157 | +more information on memory management. | |
3158 | + | |
3159 | +Note that the amount of memory needed to decompress | |
3160 | +a stream cannot be determined until the stream's header has been read, | |
3161 | +so even if @code{BZ2_bzDecompressInit} succeeds, a subsequent | |
3162 | +@code{BZ2_bzDecompress} could fail with @code{BZ_MEM_ERROR}. | |
3163 | + | |
3164 | +Possible return values: | |
3165 | +@display | |
3166 | + @code{BZ_CONFIG_ERROR} | |
3167 | + if the library has been mis-compiled | |
3168 | + @code{BZ_PARAM_ERROR} | |
3169 | + if @code{(small != 0 && small != 1)} or @code{(verbosity < 0 || verbosity > 4)} | |
3170 | + @code{BZ_MEM_ERROR} | |
3171 | + if insufficient memory is available | |
3172 | + @code{BZ_OK} otherwise | |
3173 | +@end display | |
3174 | + | |
3175 | +Allowable next actions: | |
3176 | +@display | |
3177 | + @code{BZ2_bzDecompress} | |
3178 | + if @code{BZ_OK} was returned | |
3179 | + no specific action required in case of error | |
3180 | +@end display | |
3181 | + | |
3182 | + | |
3183 | + | |
3184 | +@subsection @code{BZ2_bzDecompress} | |
3185 | +@example | |
3186 | +int BZ2_bzDecompress ( bz_stream *strm ); | |
3187 | +@end example | |
3188 | +Provides more input and/or output buffer space for the library. The | |
3189 | +caller maintains input and output buffers, and uses @code{BZ2_bzDecompress} | |
3190 | +to transfer data between them. | |
3191 | + | |
3192 | +Before each call to @code{BZ2_bzDecompress}, @code{next_in} | |
3193 | +should point at the compressed data, | |
3194 | +and @code{avail_in} should indicate how many bytes the library | |
3195 | +may read. @code{BZ2_bzDecompress} updates @code{next_in}, @code{avail_in}, | |
3196 | +@code{total_in_lo32} and @code{total_in_hi32} | |
3197 | +to reflect the number of bytes it has read. | |
3198 | + | |
3199 | +Similarly, @code{next_out} should point to a buffer in which the uncompressed | |
3200 | +output is to be placed, with @code{avail_out} indicating how much output space | |
3201 | +is available. @code{BZ2_bzDecompress} updates @code{next_out}, | |
3202 | +@code{avail_out}, @code{total_out_lo32} and @code{total_out_hi32} to reflect | |
3203 | +the number of bytes output. | |
3204 | + | |
3205 | +You may provide and remove as little or as much data as you like on | |
3206 | +each call of @code{BZ2_bzDecompress}. | |
3207 | +In the limit, it is acceptable to | |
3208 | +supply and remove data one byte at a time, although this would be | |
3209 | +terribly inefficient. You should always ensure that at least one | |
3210 | +byte of output space is available at each call. | |
3211 | + | |
3212 | +Use of @code{BZ2_bzDecompress} is simpler than @code{BZ2_bzCompress}. | |
3213 | + | |
3214 | +You should provide input and remove output as described above, and | |
3215 | +repeatedly call @code{BZ2_bzDecompress} until @code{BZ_STREAM_END} is | |
3216 | +returned. Appearance of @code{BZ_STREAM_END} denotes that | |
3217 | +@code{BZ2_bzDecompress} has detected the logical end of the compressed | |
3218 | +stream. @code{BZ2_bzDecompress} will not produce @code{BZ_STREAM_END} until | |
3219 | +all output data has been placed into the output buffer, so once | |
3220 | +@code{BZ_STREAM_END} appears, you are guaranteed to have available all | |
3221 | +the decompressed output, and @code{BZ2_bzDecompressEnd} can safely be | |
3222 | +called. | |
3223 | + | |
3224 | +In case of an error return value, you should call @code{BZ2_bzDecompressEnd} | |
3225 | +to clean up and release memory. | |
3226 | + | |
3227 | +Possible return values: | |
3228 | +@display | |
3229 | + @code{BZ_PARAM_ERROR} | |
3230 | + if @code{strm} is @code{NULL} or @code{strm->state} is @code{NULL} | |
3231 | + or @code{strm->avail_out < 1} | |
3232 | + @code{BZ_DATA_ERROR} | |
3233 | + if a data integrity error is detected in the compressed stream | |
3234 | + @code{BZ_DATA_ERROR_MAGIC} | |
3235 | + if the compressed stream doesn't begin with the right magic bytes | |
3236 | + @code{BZ_MEM_ERROR} | |
3237 | + if there wasn't enough memory available | |
3238 | + @code{BZ_STREAM_END} | |
3239 | + if the logical end of the data stream was detected and all | |
3240 | + output has been consumed, eg @code{strm->avail_out > 0} | |
3241 | + @code{BZ_OK} | |
3242 | + otherwise | |
3243 | +@end display | |
3244 | +Allowable next actions: | |
3245 | +@display | |
3246 | + @code{BZ2_bzDecompress} | |
3247 | + if @code{BZ_OK} was returned | |
3248 | + @code{BZ2_bzDecompressEnd} | |
3249 | + otherwise | |
3250 | +@end display | |
3251 | + | |
3252 | + | |
3253 | +@subsection @code{BZ2_bzDecompressEnd} | |
3254 | +@example | |
3255 | +int BZ2_bzDecompressEnd ( bz_stream *strm ); | |
3256 | +@end example | |
3257 | +Releases all memory associated with a decompression stream. | |
3258 | + | |
3259 | +Possible return values: | |
3260 | +@display | |
3261 | + @code{BZ_PARAM_ERROR} | |
3262 | + if @code{strm} is @code{NULL} or @code{strm->state} is @code{NULL} | |
3263 | + @code{BZ_OK} | |
3264 | + otherwise | |
3265 | +@end display | |
3266 | + | |
3267 | +Allowable next actions: | |
3268 | +@display | |
3269 | + None. | |
3270 | +@end display | |
3271 | + | |
3272 | + | |
3273 | +@section High-level interface | |
3274 | + | |
3275 | +This interface provides functions for reading and writing | |
3276 | +@code{bzip2} format files. First, some general points. | |
3277 | + | |
3278 | +@itemize @bullet | |
3279 | +@item All of the functions take an @code{int*} first argument, | |
3280 | + @code{bzerror}. | |
3281 | + After each call, @code{bzerror} should be consulted first to determine | |
3282 | + the outcome of the call. If @code{bzerror} is @code{BZ_OK}, | |
3283 | + the call completed | |
3284 | + successfully, and only then should the return value of the function | |
3285 | + (if any) be consulted. If @code{bzerror} is @code{BZ_IO_ERROR}, | |
3286 | + there was an error | |
3287 | + reading/writing the underlying compressed file, and you should | |
3288 | + then consult @code{errno}/@code{perror} to determine the | |
3289 | + cause of the difficulty. | |
3290 | + @code{bzerror} may also be set to various other values; precise details are | |
3291 | + given on a per-function basis below. | |
3292 | +@item If @code{bzerror} indicates an error | |
3293 | + (ie, anything except @code{BZ_OK} and @code{BZ_STREAM_END}), | |
3294 | + you should immediately call @code{BZ2_bzReadClose} (or @code{BZ2_bzWriteClose}, | |
3295 | + depending on whether you are attempting to read or to write) | |
3296 | + to free up all resources associated | |
3297 | + with the stream. Once an error has been indicated, behaviour of all calls | |
3298 | + except @code{BZ2_bzReadClose} (@code{BZ2_bzWriteClose}) is undefined. | |
3299 | + The implication is that (1) @code{bzerror} should | |
3300 | + be checked after each call, and (2) if @code{bzerror} indicates an error, | |
3301 | + @code{BZ2_bzReadClose} (@code{BZ2_bzWriteClose}) should then be called to clean up. | |
3302 | +@item The @code{FILE*} arguments passed to | |
3303 | + @code{BZ2_bzReadOpen}/@code{BZ2_bzWriteOpen} | |
3304 | + should be set to binary mode. | |
3305 | + Most Unix systems will do this by default, but other platforms, | |
3306 | + including Windows and Mac, will not. If you omit this, you may | |
3307 | + encounter problems when moving code to new platforms. | |
3308 | +@item Memory allocation requests are handled by | |
3309 | + @code{malloc}/@code{free}. | |
3310 | + At present | |
3311 | + there is no facility for user-defined memory allocators in the file I/O | |
3312 | + functions (could easily be added, though). | |
3313 | +@end itemize | |
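The checking discipline described above can be captured in a small helper. This is a sketch under assumptions: @code{bz_call_failed} is a hypothetical name, the constants mirror @code{bzlib.h}, and printing to @code{stderr} stands in for whatever error handling the application actually needs.

```c
#include <stdio.h>

#ifndef BZ_OK
#define BZ_OK          0     /* values as in bzlib.h */
#define BZ_STREAM_END  4
#define BZ_DATA_ERROR  (-4)
#define BZ_IO_ERROR    (-6)
#endif

/* Hypothetical helper: returns 0 if bzerror indicates success (BZ_OK or
   BZ_STREAM_END).  Otherwise it reports the problem and returns 1, meaning
   the caller must now call BZ2_bzReadClose or BZ2_bzWriteClose. */
static int bz_call_failed(const char *what, int bzerror)
{
    if (bzerror == BZ_OK || bzerror == BZ_STREAM_END)
        return 0;
    if (bzerror == BZ_IO_ERROR)
        perror(what);   /* the underlying FILE* failed; errno holds the cause */
    else
        fprintf(stderr, "%s: libbzip2 error code %d\n", what, bzerror);
    return 1;
}
```

After each high-level call, a test such as @code{if (bz_call_failed("read", bzerror))} would be followed by the matching close function and nothing else.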
3314 | + | |
3315 | + | |
3316 | + | |
3317 | +@subsection @code{BZ2_bzReadOpen} | |
3318 | +@example | |
3319 | + typedef void BZFILE; | |
3320 | + | |
3321 | + BZFILE *BZ2_bzReadOpen ( int *bzerror, FILE *f, | |
3322 | + int small, int verbosity, | |
3323 | + void *unused, int nUnused ); | |
3324 | +@end example | |
3325 | +Prepare to read compressed data from file handle @code{f}. @code{f} | |
3326 | +should refer to a file which has been opened for reading, and for which | |
3327 | +the error indicator (@code{ferror(f)}) is not set. If @code{small} is 1, | |
3328 | +the library will try to decompress using less memory, at the expense of | |
3329 | +speed. | |
3330 | + | |
3331 | +For reasons explained below, @code{BZ2_bzRead} will decompress the | |
3332 | +@code{nUnused} bytes starting at @code{unused}, before starting to read | |
3333 | +from the file @code{f}. At most @code{BZ_MAX_UNUSED} bytes may be | |
3334 | +supplied like this. If this facility is not required, you should pass | |
3335 | +@code{NULL} and @code{0} for @code{unused} and @code{nUnused} | |
3336 | +respectively. | |
3337 | + | |
3338 | +For the meaning of parameters @code{small} and @code{verbosity}, | |
3339 | +see @code{BZ2_bzDecompressInit}. | |
3340 | + | |
3341 | +The amount of memory needed to decompress a file cannot be determined | |
3342 | +until the file's header has been read. So it is possible that | |
3343 | +@code{BZ2_bzReadOpen} returns @code{BZ_OK} but a subsequent call of | |
3344 | +@code{BZ2_bzRead} will return @code{BZ_MEM_ERROR}. | |
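The dependence on the header is easy to see: a @code{bzip2} stream begins with the bytes @code{BZh} followed by a digit @code{1} to @code{9}. That digit is the @code{blockSize100k} used at compression time, and it determines how much memory decompression needs. The sketch below inspects those four bytes. @code{bz2_header_block_size} is a hypothetical helper for illustration, not a libbzip2 function, and it does not replace the library's own checks.

```c
#include <stddef.h>

/* Hypothetical helper: returns blockSize100k (1..9) if `p` starts with a
   valid bzip2 stream header "BZh1".."BZh9", or 0 otherwise (the case the
   library reports as BZ_DATA_ERROR_MAGIC). */
static int bz2_header_block_size(const unsigned char *p, size_t n)
{
    if (n < 4 || p[0] != 'B' || p[1] != 'Z' || p[2] != 'h')
        return 0;                  /* wrong magic bytes */
    if (p[3] < '1' || p[3] > '9')
        return 0;                  /* block-size digit out of range */
    return p[3] - '0';
}
```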
3345 | + | |
3346 | +Possible assignments to @code{bzerror}: | |
3347 | +@display | |
3348 | + @code{BZ_CONFIG_ERROR} | |
3349 | + if the library has been mis-compiled | |
3350 | + @code{BZ_PARAM_ERROR} | |
3351 | + if @code{f} is @code{NULL} | |
3352 | + or @code{small} is neither @code{0} nor @code{1} | |
3353 | + or @code{(unused == NULL && nUnused != 0)} | |
3354 | + or @code{unused != NULL} and @code{nUnused} lies outside @code{0 .. BZ_MAX_UNUSED} | |
3355 | + @code{BZ_IO_ERROR} | |
3356 | + if @code{ferror(f)} is nonzero | |
3357 | + @code{BZ_MEM_ERROR} | |
3358 | + if insufficient memory is available | |
3359 | + @code{BZ_OK} | |
3360 | + otherwise. | |
3361 | +@end display | |
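For reference, the @code{BZ_PARAM_ERROR} conditions above can be written as an explicit C predicate. This is a sketch: @code{read_open_args_ok} is a hypothetical name, it checks only the conditions listed above, and @code{BZ_MAX_UNUSED} is taken as 5000, its value in @code{bzlib.h}. The range test needs two separate comparisons, because C does not evaluate @code{0 <= n <= max} as a range check.

```c
#include <stdio.h>

#ifndef BZ_MAX_UNUSED
#define BZ_MAX_UNUSED 5000   /* value from bzlib.h */
#endif

/* Hypothetical helper mirroring the BZ_PARAM_ERROR conditions listed for
   BZ2_bzReadOpen.  Returns 1 if the arguments are acceptable, 0 if not. */
static int read_open_args_ok(FILE *f, int small, const void *unused, int nUnused)
{
    if (f == NULL)
        return 0;
    if (small != 0 && small != 1)
        return 0;
    if (unused == NULL && nUnused != 0)
        return 0;
    if (unused != NULL && (nUnused < 0 || nUnused > BZ_MAX_UNUSED))
        return 0;
    return 1;
}
```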
3362 | + | |
3363 | +Possible return values: | |
3364 | +@display | |
3365 | + Pointer to an abstract @code{BZFILE} | |
3366 | + if @code{bzerror} is @code{BZ_OK} | |
3367 | + @code{NULL} | |
3368 | + otherwise | |
3369 | +@end display | |
3370 | + | |
3371 | +Allowable next actions: | |
3372 | +@display | |
3373 | + @code{BZ2_bzRead} | |
3374 | + if @code{bzerror} is @code{BZ_OK} | |
3375 | + @code{BZ2_bzReadClose} | |
3376 | + otherwise | |
3377 | +@end display | |
3378 | + | |
3379 | + | |
3380 | +@subsection @code{BZ2_bzRead} | |
3381 | +@example | |
3382 | + int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len ); | |
3383 | +@end example | |
3384 | +Reads up to @code{len} (uncompressed) bytes from the compressed file | |
3385 | +@code{b} into | |
3386 | +the buffer @code{buf}. If the read was successful, | |
3387 | +@code{bzerror} is set to @code{BZ_OK} | |
3388 | +and the number of bytes read is returned. If the logical end-of-stream | |
3389 | +was detected, @code{bzerror} will be set to @code{BZ_STREAM_END}, | |
3390 | +and the number | |
3391 | +of bytes read is returned. All other @code{bzerror} values denote an error. | |
3392 | + | |
3393 | +@code{BZ2_bzRead} will supply @code{len} bytes, | |
3394 | +unless the logical stream end is detected | |
3395 | +or an error occurs. Because of this, it is possible to detect the | |
3396 | +stream end by observing when the number of bytes returned is | |
3397 | +less than the number | |
3398 | +requested. Nevertheless, this is regarded as inadvisable; you should | |
3399 | +instead check @code{bzerror} after every call and watch out for | |
3400 | +@code{BZ_STREAM_END}. | |
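The recommended pattern looks like the sketch below. To keep it self-contained, @code{mock_read} is a hypothetical in-memory stand-in with the same calling convention as @code{BZ2_bzRead`; with a real @code{BZFILE} you would call @code{BZ2_bzRead} instead. The call that reports @code{BZ_STREAM_END} may still return data, so the loop processes that final chunk too.

```c
#include <string.h>

#ifndef BZ_OK
#define BZ_OK          0     /* values as in bzlib.h */
#define BZ_STREAM_END  4
#endif

/* Hypothetical in-memory source used in place of a real BZFILE. */
struct mock_src { const char *data; int len; int pos; };

/* Same shape as BZ2_bzRead: copies up to `len` bytes into `buf` and sets
   *bzerror to BZ_STREAM_END once the final byte has been delivered. */
static int mock_read(int *bzerror, struct mock_src *s, void *buf, int len)
{
    int n = s->len - s->pos;
    if (n > len)
        n = len;
    memcpy(buf, s->data + s->pos, (size_t)n);
    s->pos += n;
    *bzerror = (s->pos == s->len) ? BZ_STREAM_END : BZ_OK;
    return n;
}

/* Reads everything and returns the number of bytes consumed, or -1 on
   error.  Termination is driven by bzerror, not by a short read count. */
static int consume_all(struct mock_src *s)
{
    char buf[4];                 /* deliberately tiny to force several calls */
    int bzerror = BZ_OK, total = 0;
    while (bzerror == BZ_OK) {
        int nBuf = mock_read(&bzerror, s, buf, (int)sizeof buf);
        if (bzerror == BZ_OK || bzerror == BZ_STREAM_END)
            total += nBuf;       /* process buf[0 .. nBuf-1] here */
    }
    return bzerror == BZ_STREAM_END ? total : -1;
}
```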
3401 | + | |
3402 | +Internally, @code{BZ2_bzRead} copies data from the compressed file in chunks | |
3403 | +of size @code{BZ_MAX_UNUSED} bytes | |
3404 | +before decompressing it. If the file contains more bytes than strictly | |
3405 | +needed to reach the logical end-of-stream, @code{BZ2_bzRead} will almost certainly | |
3406 | +read some of the trailing data before signalling @code{BZ_STREAM_END}. | |
3407 | +To collect the read but unused data once @code{BZ_STREAM_END} has | |
3408 | +appeared, call @code{BZ2_bzReadGetUnused} immediately before @code{BZ2_bzReadClose}. | |
3409 | + | |
3410 | +Possible assignments to @code{bzerror}: | |
3411 | +@display | |
3412 | + @code{BZ_PARAM_ERROR} | |
3413 | + if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0} | |
3414 | + @code{BZ_SEQUENCE_ERROR} | |
3415 | + if @code{b} was opened with @code{BZ2_bzWriteOpen} | |
3416 | + @code{BZ_IO_ERROR} | |
3417 | + if there is an error reading from the compressed file | |
3418 | + @code{BZ_UNEXPECTED_EOF} | |
3419 | + if the compressed file ended before the logical end-of-stream was detected | |
3420 | + @code{BZ_DATA_ERROR} | |
3421 | + if a data integrity error was detected in the compressed stream | |
3422 | + @code{BZ_DATA_ERROR_MAGIC} | |
3423 | + if the stream does not begin with the requisite header bytes (ie, is not | |
3424 | + a @code{bzip2} data file). This is really a special case of @code{BZ_DATA_ERROR}. | |
3425 | + @code{BZ_MEM_ERROR} | |
3426 | + if insufficient memory was available | |
3427 | + @code{BZ_STREAM_END} | |
3428 | + if the logical end of stream was detected. | |
3429 | + @code{BZ_OK} | |
3430 | + otherwise. | |
3431 | +@end display | |
3432 | + | |
3433 | +Possible return values: | |
3434 | +@display | |
3435 | + number of bytes read | |
3436 | + if @code{bzerror} is @code{BZ_OK} or @code{BZ_STREAM_END} | |
3437 | + undefined | |
3438 | + otherwise | |
3439 | +@end display | |
3440 | + | |
3441 | +Allowable next actions: | |
3442 | +@display | |
3443 | + collect data from @code{buf}, then @code{BZ2_bzRead} or @code{BZ2_bzReadClose} | |
3444 | + if @code{bzerror} is @code{BZ_OK} | |
3445 | + collect data from @code{buf}, then @code{BZ2_bzReadClose} or @code{BZ2_bzReadGetUnused} | |
3446 | + if @code{bzerror} is @code{BZ_STREAM_END} | |
3447 | + @code{BZ2_bzReadClose} | |
3448 | + otherwise | |
3449 | +@end display | |
3450 | + | |
3451 | + | |
3452 | + | |
3453 | +@subsection @code{BZ2_bzReadGetUnused} | |
3454 | +@example | |
3455 | + void BZ2_bzReadGetUnused ( int* bzerror, BZFILE *b, | |
3456 | + void** unused, int* nUnused ); | |
3457 | +@end example | |
3458 | +Returns data which was read from the compressed file but was not needed | |
3459 | +to get to the logical end-of-stream. @code{*unused} is set to the address | |
3460 | +of the data, and @code{*nUnused} to the number of bytes. @code{*nUnused} will | |
3461 | +be set to a value between @code{0} and @code{BZ_MAX_UNUSED} inclusive. | |
3462 | + | |
3463 | +This function may only be called once @code{BZ2_bzRead} has signalled | |
3464 | +@code{BZ_STREAM_END} but before @code{BZ2_bzReadClose}. | |
3465 | + | |
3466 | +Possible assignments to @code{bzerror}: | |
3467 | +@display | |
3468 | + @code{BZ_PARAM_ERROR} | |
3469 | + if @code{b} is @code{NULL} | |
3470 | + or @code{unused} is @code{NULL} or @code{nUnused} is @code{NULL} | |
3471 | + @code{BZ_SEQUENCE_ERROR} | |
3472 | + if @code{BZ_STREAM_END} has not been signalled | |
3473 | + or if @code{b} was opened with @code{BZ2_bzWriteOpen} | |
3474 | + @code{BZ_OK} | |
3475 | + otherwise | |
3476 | +@end display | |
3477 | + | |
3478 | +Allowable next actions: | |
3479 | +@display | |
3480 | + @code{BZ2_bzReadClose} | |
3481 | +@end display | |
3482 | + | |
3483 | + | |
3484 | +@subsection @code{BZ2_bzReadClose} | |
3485 | +@example | |
3486 | + void BZ2_bzReadClose ( int *bzerror, BZFILE *b ); | |
3487 | +@end example | |
3488 | +Releases all memory pertaining to the compressed file @code{b}. | |
3489 | +@code{BZ2_bzReadClose} does not call @code{fclose} on the underlying file | |
3490 | +handle, so you should do that yourself if appropriate. | |
3491 | +@code{BZ2_bzReadClose} should be called to clean up after all error | |
3492 | +situations. | |
3493 | + | |
3494 | +Possible assignments to @code{bzerror}: | |
3495 | +@display | |
3496 | + @code{BZ_SEQUENCE_ERROR} | |
3497 | + if @code{b} was opened with @code{BZ2_bzWriteOpen} | |
3498 | + @code{BZ_OK} | |
3499 | + otherwise | |
3500 | +@end display | |
3501 | + | |
3502 | +Allowable next actions: | |
3503 | +@display | |
3504 | + none | |
3505 | +@end display | |
3506 | + | |
3507 | + | |
3508 | + | |
3509 | +@subsection @code{BZ2_bzWriteOpen} | |
3510 | +@example | |
3511 | + BZFILE *BZ2_bzWriteOpen ( int *bzerror, FILE *f, | |
3512 | + int blockSize100k, int verbosity, | |
3513 | + int workFactor ); | |
3514 | +@end example | |
3515 | +Prepare to write compressed data to file handle @code{f}. | |
3516 | +@code{f} should refer to | |
3517 | +a file which has been opened for writing, and for which the error | |
3518 | +indicator (@code{ferror(f)}) is not set. | |
3519 | + | |
3520 | +For the meaning of parameters @code{blockSize100k}, | |
3521 | +@code{verbosity} and @code{workFactor}, see | |
3522 | +@* @code{BZ2_bzCompressInit}. | |
3523 | + | |
3524 | +All required memory is allocated at this stage, so if the call | |
3525 | +completes successfully, @code{BZ_MEM_ERROR} cannot be signalled by a | |
3526 | +subsequent call to @code{BZ2_bzWrite}. | |
3527 | + | |
3528 | +Possible assignments to @code{bzerror}: | |
3529 | +@display | |
3530 | + @code{BZ_CONFIG_ERROR} | |
3531 | + if the library has been mis-compiled | |
3532 | + @code{BZ_PARAM_ERROR} | |
3533 | + if @code{f} is @code{NULL} | |
3534 | + or @code{blockSize100k < 1} or @code{blockSize100k > 9} | |
3535 | + @code{BZ_IO_ERROR} | |
3536 | + if @code{ferror(f)} is nonzero | |
3537 | + @code{BZ_MEM_ERROR} | |
3538 | + if insufficient memory is available | |
3539 | + @code{BZ_OK} | |
3540 | + otherwise | |
3541 | +@end display | |
3542 | + | |
3543 | +Possible return values: | |
3544 | +@display | |
3545 | + Pointer to an abstract @code{BZFILE} | |
3546 | + if @code{bzerror} is @code{BZ_OK} | |
3547 | + @code{NULL} | |
3548 | + otherwise | |
3549 | +@end display | |
3550 | + | |
3551 | +Allowable next actions: | |
3552 | +@display | |
3553 | + @code{BZ2_bzWrite} | |
3554 | + if @code{bzerror} is @code{BZ_OK} | |
3555 | + (you could go directly to @code{BZ2_bzWriteClose}, but this would be pretty pointless) | |
3556 | + @code{BZ2_bzWriteClose} | |
3557 | + otherwise | |
3558 | +@end display | |
3559 | + | |
3560 | + | |
3561 | + | |
3562 | +@subsection @code{BZ2_bzWrite} | |
3563 | +@example | |
3564 | + void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len ); | |
3565 | +@end example | |
3566 | +Absorbs @code{len} bytes from the buffer @code{buf}, eventually to be | |
3567 | +compressed and written to the file. | |
3568 | + | |
3569 | +Possible assignments to @code{bzerror}: | |
3570 | +@display | |
3571 | + @code{BZ_PARAM_ERROR} | |
3572 | + if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0} | |
3573 | + @code{BZ_SEQUENCE_ERROR} | |
3574 | + if @code{b} was opened with @code{BZ2_bzReadOpen} | |
3575 | + @code{BZ_IO_ERROR} | |
3576 | + if there is an error writing the compressed file. | |
3577 | + @code{BZ_OK} | |
3578 | + otherwise | |
3579 | +@end display | |
3580 | + | |
3581 | + | |
3582 | + | |
3583 | + | |
3584 | +@subsection @code{BZ2_bzWriteClose} | |
3585 | +@example | |
3586 | + void BZ2_bzWriteClose ( int *bzerror, BZFILE* b, | |
3587 | + int abandon, | |
3588 | + unsigned int* nbytes_in, | |
3589 | + unsigned int* nbytes_out ); | |
3590 | + | |
3591 | + void BZ2_bzWriteClose64 ( int *bzerror, BZFILE* b, | |
3592 | + int abandon, | |
3593 | + unsigned int* nbytes_in_lo32, | |
3594 | + unsigned int* nbytes_in_hi32, | |
3595 | + unsigned int* nbytes_out_lo32, | |
3596 | + unsigned int* nbytes_out_hi32 ); | |
3597 | +@end example | |
3598 | + | |
3599 | +Compresses and flushes to the compressed file all data so far supplied | |
3600 | +by @code{BZ2_bzWrite}. The logical end-of-stream markers are also written, so | |
3601 | +subsequent calls to @code{BZ2_bzWrite} are illegal. All memory associated | |
3602 | +with the compressed file @code{b} is released. | |
3603 | +@code{fflush} is called on the | |
3604 | +compressed file, but it is not @code{fclose}'d. | |
3605 | + | |
3606 | +If @code{BZ2_bzWriteClose} is called to clean up after an error, the only | |
3607 | +action is to release the memory. The library records the error codes | |
3608 | +issued by previous calls, so this situation will be detected | |
3609 | +automatically. There is no attempt to complete the compression | |
3610 | +operation, nor to @code{fflush} the compressed file. You can force this | |
3611 | +behaviour to happen even in the case of no error, by passing a nonzero | |
3612 | +value to @code{abandon}. | |
3613 | + | |
3614 | +If @code{nbytes_in} is non-null, @code{*nbytes_in} will be set to the | |
3615 | +total volume of uncompressed data handled. Similarly, @code{*nbytes_out} | |
3616 | +will be set to the total volume of compressed data written. For | |
3617 | +compatibility with older versions of the library, @code{BZ2_bzWriteClose} | |
3618 | +only yields the lower 32 bits of these counts. Use | |
3619 | +@code{BZ2_bzWriteClose64} if you want the full 64 bit counts. These | |
3620 | +two functions are otherwise absolutely identical. | |
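Reassembling a full 64-bit count from the two halves returned by @code{BZ2_bzWriteClose64} is a shift and an OR. The helper below is hypothetical and assumes @code{unsigned int} is 32 bits wide, as the @code{_lo32}/@code{_hi32} parameter names imply.

```c
#include <stdint.h>

/* Hypothetical helper: joins one of the split counters reported by
   BZ2_bzWriteClose64 (e.g. nbytes_in_lo32 / nbytes_in_hi32). */
static uint64_t bz_join_count(unsigned int lo32, unsigned int hi32)
{
    return ((uint64_t)hi32 << 32) | (uint64_t)lo32;
}
```

For example, after @code{BZ2_bzWriteClose64 ( &bzerror, b, 0, &in_lo, &in_hi, &out_lo, &out_hi )} (variable names illustrative), the total uncompressed volume would be @code{bz_join_count(in_lo, in_hi)}.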
3621 | + | |
3622 | + | |
3623 | +Possible assignments to @code{bzerror}: | |
3624 | +@display | |
3625 | + @code{BZ_SEQUENCE_ERROR} | |
3626 | + if @code{b} was opened with @code{BZ2_bzReadOpen} | |
3627 | + @code{BZ_IO_ERROR} | |
3628 | + if there is an error writing the compressed file | |
3629 | + @code{BZ_OK} | |
3630 | + otherwise | |
3631 | +@end display | |
3632 | + | |
3633 | +@subsection Handling embedded compressed data streams | |
3634 | + | |
3635 | +The high-level library facilitates use of | |
3636 | +@code{bzip2} data streams which form some part of a surrounding, larger | |
3637 | +data stream. | |
3638 | +@itemize @bullet | |
3639 | +@item For writing, the library takes an open file handle, writes | |
3640 | +compressed data to it, @code{fflush}es it but does not @code{fclose} it. | |
3641 | +The calling application can write its own data before and after the | |
3642 | +compressed data stream, using that same file handle. | |
3643 | +@item Reading is more complex, and the facilities are not as general | |
3644 | +as they could be since generality is hard to reconcile with efficiency. | |
3645 | +@code{BZ2_bzRead} reads from the compressed file in blocks of size | |
3646 | +@code{BZ_MAX_UNUSED} bytes, and in doing so probably will overshoot | |
3647 | +the logical end of compressed stream. | |
3648 | +To recover this data once decompression has | |
3649 | +ended, call @code{BZ2_bzReadGetUnused} after the last call of @code{BZ2_bzRead} | |
3650 | +(the one returning @code{BZ_STREAM_END}) but before calling | |
3651 | +@code{BZ2_bzReadClose}. | |
3652 | +@end itemize | |
3653 | + | |
3654 | +This mechanism makes it easy to decompress multiple @code{bzip2} | |
3655 | +streams placed end-to-end. At the end of one stream, when @code{BZ2_bzRead} | |
3656 | +returns @code{BZ_STREAM_END}, call @code{BZ2_bzReadGetUnused} to collect the | |
3657 | +unused data (copy it into your own buffer somewhere). | |
3658 | +That data forms the start of the next compressed stream. | |
3659 | +To start uncompressing that next stream, call @code{BZ2_bzReadOpen} again, | |
3660 | +feeding in the unused data via the @code{unused}/@code{nUnused} | |
3661 | +parameters. | |
3662 | +Keep doing this until @code{BZ_STREAM_END} return coincides with the | |
3663 | +physical end of file (@code{feof(f)}). In this situation | |
3664 | +@code{BZ2_bzReadGetUnused} | |
3665 | +will of course return no data. | |
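Copying the unused data matters because, in libbzip2's implementation, the pointer returned by @code{BZ2_bzReadGetUnused} points into storage owned by the @code{BZFILE}, which @code{BZ2_bzReadClose} releases. Here is a minimal sketch of the copy step, with @code{save_unused} as a hypothetical helper and @code{BZ_MAX_UNUSED} taken as 5000 from @code{bzlib.h}:

```c
#include <string.h>

#ifndef BZ_MAX_UNUSED
#define BZ_MAX_UNUSED 5000   /* value from bzlib.h */
#endif

/* Hypothetical helper: copies the bytes reported by BZ2_bzReadGetUnused
   into caller-owned storage.  It must run before BZ2_bzReadClose.
   Returns the number of bytes saved, or -1 if nUnused lies outside the
   documented range 0 .. BZ_MAX_UNUSED. */
static int save_unused(char dst[BZ_MAX_UNUSED], const void *unused, int nUnused)
{
    if (nUnused < 0 || nUnused > BZ_MAX_UNUSED)
        return -1;
    if (nUnused > 0)
        memcpy(dst, unused, (size_t)nUnused);
    return nUnused;
}
```

The saved bytes and their count are then passed as the @code{unused} and @code{nUnused} arguments of the next @code{BZ2_bzReadOpen} call.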
3666 | + | |
3667 | +This should give some feel for how the high-level interface can be used. | |
3668 | +If you require extra flexibility, you'll have to bite the bullet and get | |
3669 | +to grips with the low-level interface. | |
3670 | + | |
3671 | +@subsection Standard file-reading/writing code | |
3672 | +Here's how you'd write data to a compressed file: | |
3673 | +@example | |
3674 | +FILE* f; | |
3675 | +BZFILE* b; | |
3676 | +int nBuf; | |
3677 | +char buf[ /* whatever size you like */ ]; | |
3678 | +int bzerror; | |
3680 | + | |
3681 | +f = fopen ( "myfile.bz2", "wb" ); | |
3682 | +if (!f) @{ | |
3683 | + /* handle error */ | |
3684 | +@} | |
3685 | +b = BZ2_bzWriteOpen ( &bzerror, f, 9, 0, 0 ); | |
3686 | +if (bzerror != BZ_OK) @{ | |
3687 | + BZ2_bzWriteClose ( &bzerror, b, 1, NULL, NULL ); | |
3688 | + /* handle error */ | |
3689 | +@} | |
3690 | + | |
3691 | +while ( /* condition */ ) @{ | |
3692 | + /* get data to write into buf, and set nBuf appropriately */ | |
3693 | + BZ2_bzWrite ( &bzerror, b, buf, nBuf ); /* returns void */ | |
3694 | + if (bzerror != BZ_OK) @{ | |
3695 | + BZ2_bzWriteClose ( &bzerror, b, 1, NULL, NULL ); | |
3696 | + /* handle error */ | |
3697 | + @} | |
3698 | +@} | |
3699 | + | |
3700 | +BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL ); | |
3701 | +if (bzerror == BZ_IO_ERROR) @{ | |
3702 | + /* handle error */ | |
3703 | +@} | |
3704 | +@end example | |
3705 | +And to read from a compressed file: | |
3706 | +@example | |
3707 | +FILE* f; | |
3708 | +BZFILE* b; | |
3709 | +int nBuf; | |
3710 | +char buf[ /* whatever size you like */ ]; | |
3711 | +int bzerror; | |
3713 | + | |
3714 | +f = fopen ( "myfile.bz2", "rb" ); | |
3715 | +if (!f) @{ | |
3716 | + /* handle error */ | |
3717 | +@} | |
3718 | +b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 ); | |
3719 | +if (bzerror != BZ_OK) @{ | |
3720 | + BZ2_bzReadClose ( &bzerror, b ); | |
3721 | + /* handle error */ | |
3722 | +@} | |
3723 | + | |
3724 | +bzerror = BZ_OK; | |
3725 | +while (bzerror == BZ_OK && /* arbitrary other conditions */) @{ | |
3726 | + nBuf = BZ2_bzRead ( &bzerror, b, buf, /* size of buf */ ); | |
3727 | + if (bzerror != BZ_OK && bzerror != BZ_STREAM_END) break; | |
3728 | + /* do something with buf[0 .. nBuf-1], also when BZ_STREAM_END */ | |
3730 | +@} | |
3731 | +if (bzerror != BZ_STREAM_END) @{ | |
3732 | + BZ2_bzReadClose ( &bzerror, b ); | |
3733 | + /* handle error */ | |
3734 | +@} else @{ | |
3735 | + BZ2_bzReadClose ( &bzerror, b ); | |
3736 | +@} | |
3737 | +@end example | |
3738 | + | |
3739 | + | |
3740 | + | |
3741 | +@section Utility functions | |
3742 | +@subsection @code{BZ2_bzBuffToBuffCompress} | |
3743 | +@example | |
3744 | + int BZ2_bzBuffToBuffCompress( char* dest, | |
3745 | + unsigned int* destLen, | |
3746 | + char* source, | |
3747 | + unsigned int sourceLen, | |
3748 | + int blockSize100k, | |
3749 | + int verbosity, | |
3750 | + int workFactor ); | |
3751 | +@end example | |
3752 | +Attempts to compress the data in @code{source[0 .. sourceLen-1]} | |
3753 | +into the destination buffer, @code{dest[0 .. *destLen-1]}. | |
3754 | +If the destination buffer is big enough, @code{*destLen} is | |
3755 | +set to the size of the compressed data, and @code{BZ_OK} is | |
3756 | +returned. If the compressed data won't fit, @code{*destLen} | |
3757 | +is unchanged, and @code{BZ_OUTBUFF_FULL} is returned. | |
3758 | + | |
3759 | +Compression in this manner is a one-shot event, done with a single call | |
3760 | +to this function. The resulting compressed data is a complete | |
3761 | +@code{bzip2} format data stream. There is no mechanism for making | |
3762 | +additional calls to provide extra input data. If you want that kind of | |
3763 | +mechanism, use the low-level interface. | |
3764 | + | |
3765 | +For the meaning of parameters @code{blockSize100k}, @code{verbosity} | |
3766 | +and @code{workFactor}, @* see @code{BZ2_bzCompressInit}. | |
3767 | + | |
3768 | +To guarantee that the compressed data will fit in its buffer, allocate | |
3769 | +an output buffer of size 1% larger than the uncompressed data, plus | |
3770 | +six hundred extra bytes. | |
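As arithmetic, that rule gives @code{sourceLen + ceil(sourceLen / 100) + 600}. The sketch below computes it, rounding the 1% up to stay on the safe side. @code{bz_compress_bound} is a hypothetical helper. Because @code{*destLen} is an @code{unsigned int}, a caller should also check that the result fits before using it.

```c
/* Hypothetical helper: output-buffer size suggested above for
   BZ2_bzBuffToBuffCompress, i.e. the input size plus 1% (rounded up)
   plus 600 bytes.  Computed in unsigned long long to avoid overflow. */
static unsigned long long bz_compress_bound(unsigned int sourceLen)
{
    unsigned long long n = sourceLen;
    return n + (n + 99) / 100 + 600;
}
```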
3771 | + | |
3772 | +@code{BZ2_bzBuffToBuffCompress} will not write data at or | |
3773 | +beyond @code{dest[*destLen]}, even in case of buffer overflow. | |
3774 | + | |
3775 | +Possible return values: | |
3776 | +@display | |
3777 | + @code{BZ_CONFIG_ERROR} | |
3778 | + if the library has been mis-compiled | |
3779 | + @code{BZ_PARAM_ERROR} | |
3780 | + if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL} | |
3781 | + or @code{blockSize100k < 1} or @code{blockSize100k > 9} | |
3782 | + or @code{verbosity < 0} or @code{verbosity > 4} | |
3783 | + or @code{workFactor < 0} or @code{workFactor > 250} | |
3784 | + @code{BZ_MEM_ERROR} | |
3785 | + if insufficient memory is available | |
3786 | + @code{BZ_OUTBUFF_FULL} | |
3787 | + if the size of the compressed data exceeds @code{*destLen} | |
3788 | + @code{BZ_OK} | |
3789 | + otherwise | |
3790 | +@end display | |
3791 | + | |
3792 | + | |
3793 | + | |
3794 | +@subsection @code{BZ2_bzBuffToBuffDecompress} | |
3795 | +@example | |
3796 | + int BZ2_bzBuffToBuffDecompress ( char* dest, | |
3797 | + unsigned int* destLen, | |
3798 | + char* source, | |
3799 | + unsigned int sourceLen, | |
3800 | + int small, | |
3801 | + int verbosity ); | |
3802 | +@end example | |
3803 | +Attempts to decompress the data in @code{source[0 .. sourceLen-1]} | |
3804 | +into the destination buffer, @code{dest[0 .. *destLen-1]}. | |
3805 | +If the destination buffer is big enough, @code{*destLen} is | |
3806 | +set to the size of the uncompressed data, and @code{BZ_OK} is | |
3807 | +returned. If the decompressed data won't fit, @code{*destLen} | |
3808 | +is unchanged, and @code{BZ_OUTBUFF_FULL} is returned. | |
3809 | + | |
3810 | +@code{source} is assumed to hold a complete @code{bzip2} format | |
3811 | +data stream. @* @code{BZ2_bzBuffToBuffDecompress} tries to decompress | |
3812 | +the entirety of the stream into the output buffer. | |
3813 | + | |
3814 | +For the meaning of parameters @code{small} and @code{verbosity}, | |
3815 | +see @code{BZ2_bzDecompressInit}. | |
3816 | + | |
3817 | +Because the compression ratio of the compressed data cannot be known in | |
3818 | +advance, there is no easy way to guarantee that the output buffer will | |
3819 | +be big enough. You may of course make arrangements in your code to | |
3820 | +record the size of the uncompressed data, but such a mechanism is beyond | |
3821 | +the scope of this library. | |
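One such arrangement is to store the uncompressed length next to the compressed data. The sketch below writes and reads a 4-byte little-endian length prefix. This framing is an application-level convention assumed for illustration; neither the @code{bzip2} format nor libbzip2 provides it, and @code{put_le32}/@code{get_le32} are hypothetical helpers.

```c
/* Hypothetical application-level framing: the uncompressed length is
   stored as 4 little-endian bytes in front of the compressed data, so the
   reader knows how large a dest buffer BZ2_bzBuffToBuffDecompress needs. */
static void put_le32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

static unsigned long get_le32(const unsigned char *p)
{
    return (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}
```

The writer would store @code{sourceLen} in the first four bytes, followed by the output of @code{BZ2_bzBuffToBuffCompress}. The reader would allocate @code{get_le32(header)} bytes and pass that size in @code{*destLen}.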
3822 | + | |
3823 | +@code{BZ2_bzBuffToBuffDecompress} will not write data at or | |
3824 | +beyond @code{dest[*destLen]}, even in case of buffer overflow. | |
3825 | + | |
3826 | +Possible return values: | |
3827 | +@display | |
3828 | + @code{BZ_CONFIG_ERROR} | |
3829 | + if the library has been mis-compiled | |
3830 | + @code{BZ_PARAM_ERROR} | |
3831 | + if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL} | |
3832 | + or @code{small != 0 && small != 1} | |
3833 | + or @code{verbosity < 0} or @code{verbosity > 4} | |
3834 | + @code{BZ_MEM_ERROR} | |
3835 | + if insufficient memory is available | |
3836 | + @code{BZ_OUTBUFF_FULL} | |
3837 | + if the size of the uncompressed data exceeds @code{*destLen} | |
3838 | + @code{BZ_DATA_ERROR} | |
3839 | + if a data integrity error was detected in the compressed data | |
3840 | + @code{BZ_DATA_ERROR_MAGIC} | |
3841 | + if the compressed data doesn't begin with the right magic bytes | |
3842 | + @code{BZ_UNEXPECTED_EOF} | |
3843 | + if the compressed data ends unexpectedly | |
3844 | + @code{BZ_OK} | |
3845 | + otherwise | |
3846 | +@end display | |
3847 | + | |
3848 | + | |
3849 | + | |
3850 | +@section @code{zlib} compatibility functions | |
3851 | +Yoshioka Tsuneo has contributed some functions to | |
3852 | +give better @code{zlib} compatibility. These functions are | |
3853 | +@code{BZ2_bzopen}, @code{BZ2_bzdopen}, @code{BZ2_bzread}, @code{BZ2_bzwrite}, @code{BZ2_bzflush}, | |
3854 | +@code{BZ2_bzclose}, | |
3855 | +@code{BZ2_bzerror} and @code{BZ2_bzlibVersion}. | |
3856 | +These functions are not (yet) officially part of | |
3857 | +the library. If they break, you get to keep all the pieces. | |
3858 | +Nevertheless, I think they work ok. | |
3859 | +@example | |
3860 | +typedef void BZFILE; | |
3861 | + | |
3862 | +const char * BZ2_bzlibVersion ( void ); | |
3863 | +@end example | |
3864 | +Returns a string indicating the library version. | |
3865 | +@example | |
3866 | +BZFILE * BZ2_bzopen ( const char *path, const char *mode ); | |
3867 | +BZFILE * BZ2_bzdopen ( int fd, const char *mode ); | |
3868 | +@end example | |
3869 | +Opens a @code{.bz2} file for reading or writing, using either its name | |
3870 | +or a pre-existing file descriptor. | |
3871 | +Analogous to @code{fopen} and @code{fdopen}. | |
3872 | +@example | |
3873 | +int BZ2_bzread ( BZFILE* b, void* buf, int len ); | |
3874 | +int BZ2_bzwrite ( BZFILE* b, void* buf, int len ); | |
3875 | +@end example | |
3876 | +Reads/writes data from/to a previously opened @code{BZFILE}. | |
3877 | +Analogous to @code{fread} and @code{fwrite}. | |
3878 | +@example | |
3879 | +int BZ2_bzflush ( BZFILE* b ); | |
3880 | +void BZ2_bzclose ( BZFILE* b ); | |
3881 | +@end example | |
3882 | +Flushes/closes a @code{BZFILE}. @code{BZ2_bzflush} doesn't actually do | |
3883 | +anything. Analogous to @code{fflush} and @code{fclose}. | |
3884 | + | |
3885 | +@example | |
3886 | +const char * BZ2_bzerror ( BZFILE *b, int *errnum ); | |
3887 | +@end example | |
3888 | +Returns a string describing the most recent error status of | |
3889 | +@code{b}, and also sets @code{*errnum} to its numerical value. | |
3890 | + | |
3891 | + | |
3892 | +@section Using the library in a @code{stdio}-free environment | |
3893 | + | |
3894 | +@subsection Getting rid of @code{stdio} | |
3895 | + | |
3896 | +In a deeply embedded application, you might want to use just | |
3897 | +the memory-to-memory functions. You can do this conveniently | |
3898 | +by compiling the library with preprocessor symbol @code{BZ_NO_STDIO} | |
3899 | +defined. Doing this gives you a library containing only the following | |
3900 | +eight functions: | |
3901 | + | |
3902 | +@code{BZ2_bzCompressInit}, @code{BZ2_bzCompress}, @code{BZ2_bzCompressEnd} @* | |
3903 | +@code{BZ2_bzDecompressInit}, @code{BZ2_bzDecompress}, @code{BZ2_bzDecompressEnd} @* | |
3904 | +@code{BZ2_bzBuffToBuffCompress}, @code{BZ2_bzBuffToBuffDecompress} | |
3905 | + | |
3906 | +When compiled like this, all functions will ignore @code{verbosity} | |
3907 | +settings. | |
3908 | + | |
3909 | +@subsection Critical error handling | |
3910 | +@code{libbzip2} contains a number of internal assertion checks which | |
3911 | +should, needless to say, never be activated. Nevertheless, if an | |
3912 | +assertion should fail, behaviour depends on whether or not the library | |
3913 | +was compiled with @code{BZ_NO_STDIO} set. | |
3914 | + | |
3915 | +For a normal compile, an assertion failure yields the message | |
3916 | +@example | |
3917 | + bzip2/libbzip2: internal error number N. | |
3918 | + This is a bug in bzip2/libbzip2, 1.0 of 21-Mar-2000. | |
3919 | + Please report it to me at: jseward@@acm.org. If this happened | |
3920 | + when you were using some program which uses libbzip2 as a | |
3921 | + component, you should also report this bug to the author(s) | |
3922 | + of that program. Please make an effort to report this bug; | |
3923 | + timely and accurate bug reports eventually lead to higher | |
3924 | + quality software. Thanks. Julian Seward, 21 March 2000. | |
3925 | +@end example | |
3926 | +where @code{N} is some error code number. @code{exit(3)} | |
3927 | +is then called. | |
3928 | + | |
3929 | +For a @code{stdio}-free library, assertion failures result | |
3930 | +in a call to a function declared as: | |
3931 | +@example | |
3932 | + extern void bz_internal_error ( int errcode ); | |
3933 | +@end example | |
3934 | +The relevant code is passed as a parameter. You should supply | |
3935 | +such a function. | |
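What the function does is up to you. One possible approach for a @code{stdio}-free build, sketched below under assumptions, records the code and uses @code{longjmp} to return to a recovery point the application armed with @code{setjmp}, so control never goes back into the library. The names @code{bz_recover} and @code{bz_last_internal_error} are hypothetical. This sketch does not free memory attached to the abandoned @code{bz_stream}.

```c
#include <setjmp.h>

/* Hypothetical recovery point, armed by the application with setjmp()
   before it calls into a BZ_NO_STDIO build of libbzip2. */
jmp_buf bz_recover;
volatile int bz_last_internal_error = 0;

/* Matches the declaration the library expects. */
void bz_internal_error(int errcode);

/* Called by the library on an internal assertion failure.  Any bz_stream
   in use is invalid afterwards, so record the code and abandon the
   operation instead of returning into the library. */
void bz_internal_error(int errcode)
{
    bz_last_internal_error = errcode;
    longjmp(bz_recover, 1);
}
```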
3936 | + | |
3937 | +In either case, once an assertion failure has occurred, any | |
3938 | +@code{bz_stream} records involved can be regarded as invalid. | |
3939 | +You should not attempt to resume normal operation with them. | |
3940 | + | |
3941 | +You may, of course, change critical error handling to suit | |
3942 | +your needs. As I said above, critical errors indicate bugs | |
3943 | +in the library and should not occur. All "normal" error | |
3944 | +situations are indicated via error return codes from functions, | |
3945 | +and can be recovered from. | |
3946 | + | |
3947 | + | |
3948 | +@section Making a Windows DLL | |
3949 | +Everything related to Windows has been contributed by Yoshioka Tsuneo | |
3950 | +@* (@code{QWF00133@@niftyserve.or.jp} / | |
3951 | +@code{tsuneo-y@@is.aist-nara.ac.jp}), so you should send your queries to | |
3952 | +him (but perhaps Cc: me, @code{jseward@@acm.org}). | |
3953 | + | |
3954 | +My vague understanding of what to do is: using Visual C++ 5.0, | |
3955 | +open the project file @code{libbz2.dsp}, and build. That's all. | |
3956 | + | |
3957 | +If you can't | |
3958 | +open the project file for some reason, make a new one, naming these files: | |
3959 | +@code{blocksort.c}, @code{bzlib.c}, @code{compress.c}, | |
3960 | +@code{crctable.c}, @code{decompress.c}, @code{huffman.c}, @* | |
3961 | +@code{randtable.c} and @code{libbz2.def}. You will also need | |
3962 | +to name the header files @code{bzlib.h} and @code{bzlib_private.h}. | |
3963 | + | |
3964 | +If you don't use VC++, you may need to define the preprocessor symbol | |
3965 | +@code{_WIN32}. | |
3966 | + | |
3967 | +Finally, @code{dlltest.c} is a sample program using the DLL. It has a | |
3968 | +project file, @code{dlltest.dsp}. | |
3969 | + | |
3970 | +If you just want a makefile for Visual C, have a look at | |
3971 | +@code{makefile.msc}. | |
3972 | + | |
3973 | +Be aware that if you compile @code{bzip2} itself on Win32, you must set | |
3974 | +@code{BZ_UNIX} to 0 and @code{BZ_LCCWIN32} to 1, in the file | |
3975 | +@code{bzip2.c}, before compiling. Otherwise the resulting binary won't | |
3976 | +work correctly. | |
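In concrete terms, the edit changes the values of two macros in @code{bzip2.c}. The fragment below is a sketch. The two macro names come from the text above, but their exact placement in @code{bzip2.c} is assumed. The @code{#error} guard and the helper @code{bz_platform_is_win32} are hypothetical additions for illustration.

```c
/* Win32 settings, as they might look after editing bzip2.c
   (placement assumed). */
#define BZ_UNIX      0
#define BZ_LCCWIN32  1

/* Hypothetical guard: exactly one platform setting should be enabled. */
#if (BZ_UNIX + BZ_LCCWIN32) != 1
#error "Set exactly one of BZ_UNIX and BZ_LCCWIN32 to 1"
#endif

/* Hypothetical helper so the choice can be checked at run time. */
int bz_platform_is_win32(void)
{
    return BZ_LCCWIN32;
}
```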
3977 | + | |
3978 | +I haven't tried any of this stuff myself, but it all looks plausible. | |
3979 | + | |
3980 | + | |
3981 | + | |
3982 | +@chapter Miscellanea | |
3983 | + | |
3984 | +These are just some random thoughts of mine. Your mileage may | |
3985 | +vary. | |
3986 | + | |
3987 | +@section Limitations of the compressed file format | |
3988 | +@code{bzip2-1.0}, @code{0.9.5} and @code{0.9.0} | |
3989 | +use exactly the same file format as the previous | |
3990 | +version, @code{bzip2-0.1}. This decision was made in the interests of | |
3991 | +stability. Creating yet another incompatible compressed file format | |
3992 | +would create further confusion and disruption for users. | |
3993 | + | |
3994 | +Nevertheless, this is not a painless decision. Development | |
3995 | +work since the release of @code{bzip2-0.1} in August 1997 | |
3996 | +has shown complexities in the file format which slow down | |
3997 | +decompression and, in retrospect, are unnecessary. These are: | |
3998 | +@itemize @bullet | |
3999 | +@item The run-length encoder, which is the first of the | |
4000 | + compression transformations, is entirely irrelevant. | |
4001 | + The original purpose was to protect the sorting algorithm | |
4002 | + from the very worst case input: a string of repeated | |
4003 | + symbols. But algorithm steps Q6a and Q6b in the original | |
4004 | + Burrows-Wheeler technical report (SRC-124) show how | |
4005 | + repeats can be handled without difficulty in block | |
4006 | + sorting. | |
4007 | +@item The randomisation mechanism doesn't really need to be | |
4008 | + there. Udi Manber and Gene Myers published a suffix | |
4009 | + array construction algorithm a few years back, which | |
4010 | + can be employed to sort any block, no matter how | |
4011 | + repetitive, in O(N log N) time. Subsequent work by | |
4012 | + Kunihiko Sadakane has produced a derivative O(N (log N)^2) | |
4013 | + algorithm which usually outperforms the Manber-Myers | |
4014 | + algorithm. | |
4015 | + | |
4016 | + I could have changed to Sadakane's algorithm, but I find | |
4017 | + it to be slower than @code{bzip2}'s existing algorithm for | |
4018 | + most inputs, and the randomisation mechanism protects | |
4019 | + adequately against bad cases. I didn't think it was | |
4020 | + a good tradeoff to make. Partly this is due to the fact | |
4021 | + that I was not flooded with email complaints about | |
4022 | + @code{bzip2-0.1}'s performance on repetitive data, so | |
4023 | + perhaps it isn't a problem for real inputs. | |
4024 | + | |
4025 | + Probably the best long-term solution, | |
4026 | + and the one I have incorporated into 0.9.5 and above, | |
4027 | + is to use the existing sorting | |
4028 | + algorithm initially, and fall back to an O(N (log N)^2) | |
4029 | + algorithm if the standard algorithm gets into difficulties. | |
4030 | +@item The compressed file format was never designed to be | |
4031 | + handled by a library, and I have had to jump through | |
4032 | + some hoops to produce an efficient implementation of | |
4033 | + decompression. It's a bit hairy. Try passing | |
4034 | + @code{decompress.c} through the C preprocessor | |
4035 | + and you'll see what I mean. Much of this complexity | |
4036 | + could have been avoided if the compressed size of | |
4037 | + each block of data was recorded in the data stream. | |
4038 | +@item An Adler-32 checksum, rather than a CRC32 checksum, | |
4039 | + would be faster to compute. | |
4040 | +@end itemize | |
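The first item in the list above concerns a run-length stage. The sketch below shows a stage of that kind. It assumes the scheme used by @code{bzip2}'s initial transformation: a run of 4 to 255 identical bytes is written as four copies of the byte, followed by a count byte (0 to 251) that gives the remaining repeats. The function names are hypothetical. The decoder assumes well-formed input and an output buffer large enough to hold the result.

```c
#include <stddef.h>

/* Encodes in[0..n-1]. out must have room for n + n/4 bytes (worst case:
   every run exactly four long). Returns the number of bytes written. */
size_t rle1_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned char c = in[i];
        size_t run = 1, j;
        /* Measure the run, capped at 255 as in the assumed scheme. */
        while (i + run < n && in[i + run] == c && run < 255)
            run++;
        if (run < 4) {
            for (j = 0; j < run; j++) out[o++] = c;
        } else {
            for (j = 0; j < 4; j++) out[o++] = c;
            out[o++] = (unsigned char)(run - 4);   /* 0..251 */
        }
        i += run;
    }
    return o;
}

/* Reverses rle1_encode: after four equal bytes, the next byte is a count. */
size_t rle1_decode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    unsigned char prev = 0;
    int seen = 0;          /* length of the current run of equal bytes */
    while (i < n) {
        unsigned char c = in[i++];
        if (seen == 4) {   /* c is a count byte, not data */
            unsigned k;
            for (k = 0; k < c; k++) out[o++] = prev;
            seen = 0;
            continue;
        }
        out[o++] = c;
        seen = (seen > 0 && c == prev) ? seen + 1 : 1;
        prev = c;
    }
    return o;
}
```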
4041 | +It would be fair to say that the @code{bzip2} format was frozen | |
4042 | +before I properly and fully understood the performance | |
4043 | +consequences of doing so. | |
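The fallback idea in the second item of the list above can be illustrated with a textbook prefix-doubling sort of a block's rotations. This is neither @code{bzip2}'s own fallback sorting code nor Sadakane's algorithm. It is a generic sketch with hypothetical names.

Each round sorts the rotations by their first 2k bytes, using the ranks from the previous round as keys. That takes about log2 N rounds, each a @code{qsort} with typical O(N log N) cost, which gives the O(N (log N)^2) behaviour mentioned above however repetitive the block is. The function outputs the last column and the row of the original block, which is what a block sorter must produce.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() takes no context pointer, so this sketch keeps the comparator
   state in file-scope variables (not thread-safe). */
static const size_t *g_rank;
static size_t g_k, g_n;

/* Orders rotations by (rank of first k bytes, rank of next k bytes). */
static int cmp_rotations(const void *pa, const void *pb)
{
    size_t a = *(const size_t *)pa, b = *(const size_t *)pb;
    size_t ra, rb;
    if (g_rank[a] != g_rank[b])
        return g_rank[a] < g_rank[b] ? -1 : 1;
    ra = g_rank[(a + g_k) % g_n];
    rb = g_rank[(b + g_k) % g_n];
    return (ra > rb) - (ra < rb);
}

/* Writes the BWT last column of s[0..n-1] to last[] and the row of the
   original block to *orig_ptr. Returns 0 on success, -1 on failure. */
int bwt_prefix_doubling(const unsigned char *s, size_t n,
                        unsigned char *last, size_t *orig_ptr)
{
    size_t *sa, *rank, *tmp, i, k;

    if (n == 0) return -1;
    sa = malloc(n * sizeof *sa);
    rank = malloc(n * sizeof *rank);
    tmp = malloc(n * sizeof *tmp);
    if (!sa || !rank || !tmp) { free(sa); free(rank); free(tmp); return -1; }

    /* Initial ranks: the first byte of each rotation. */
    for (i = 0; i < n; i++) { sa[i] = i; rank[i] = s[i]; }

    for (k = 1; ; k *= 2) {
        /* Sort by the first 2k bytes. */
        g_rank = rank; g_k = k % n; g_n = n;
        qsort(sa, n, sizeof *sa, cmp_rotations);
        /* Re-rank: rotations with equal keys share a rank. */
        tmp[sa[0]] = 0;
        for (i = 1; i < n; i++)
            tmp[sa[i]] = tmp[sa[i - 1]]
                       + (cmp_rotations(&sa[i - 1], &sa[i]) != 0);
        memcpy(rank, tmp, n * sizeof *rank);
        /* Stop when all ranks differ or whole rotations have been compared. */
        if (rank[sa[n - 1]] == n - 1 || 2 * k >= n) break;
    }

    for (i = 0; i < n; i++) {
        last[i] = s[(sa[i] + n - 1) % n];
        if (sa[i] == 0) *orig_ptr = i;
    }
    free(sa); free(rank); free(tmp);
    return 0;
}
```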
4044 | + | |
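The checksum point in the last item of the list above comes down to the work done per byte. The sketch below computes both checksums in the simplest way. It assumes @code{bzip2}'s CRC is the non-reflected CRC-32 with polynomial 0x04C11DB7 and with initial value and final XOR 0xFFFFFFFF, the variant usually catalogued as CRC-32/BZIP2.

Computed bit by bit, the CRC costs eight shift-and-XOR steps per byte. Real code, including the table in @code{crctable.c}, cuts that to one table lookup per byte. Adler-32 needs only two additions per byte, and production versions also defer the modulo. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32: two running sums modulo 65521. Production code defers the
   modulo over runs of bytes; it is applied every byte here for clarity. */
uint32_t adler32_simple(const unsigned char *p, size_t n)
{
    uint32_t a = 1, b = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        a = (a + p[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Non-reflected CRC-32, polynomial 0x04C11DB7, init and final XOR
   0xFFFFFFFF, processed one bit at a time. */
uint32_t crc32_msb_bitwise(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int k;
    for (i = 0; i < n; i++) {
        crc ^= (uint32_t)p[i] << 24;
        for (k = 0; k < 8; k++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return crc ^ 0xFFFFFFFFu;
}
```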
4045 | +Improvements which I was able to incorporate into | |
4046 | +0.9.0, despite using the same file format, are: | |
4047 | +@itemize @bullet | |
4048 | +@item Single array implementation of the inverse BWT. This | |
4049 | + significantly speeds up decompression, presumably | |
4050 | + because it reduces the number of cache misses. | |
4051 | +@item Faster inverse MTF transform for large MTF values. The | |
4052 | + new implementation is based on the notion of sliding blocks | |
4053 | + of values. | |
4054 | +@item @code{bzip2-0.9.0} now reads and writes files with @code{fread} | |
4055 | + and @code{fwrite}; version 0.1 used @code{putc} and @code{getc}. | |
4056 | + Duh! Well, you live and learn. | |
4057 | + | |
4058 | +@end itemize | |
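The first improvement in the list above, the single-array inverse BWT, can be sketched as follows. This minimal version keeps everything in one array of 32-bit entries. The low 8 bits of each entry hold a byte of the last column, and the high 24 bits hold the link to the next position, so each output byte costs one access into a single array.

The function name and interface are assumptions. The sketch requires blocks shorter than 2^24 bytes, which 900000-byte blocks satisfy. @code{orig_ptr} is the row of the original block among the sorted rotations.

```c
#include <stdint.h>
#include <stdlib.h>

/* Inverts a BWT given the last column (n bytes) and orig_ptr. Writes n
   bytes to out. Returns 0 on success, -1 on bad arguments or no memory. */
int inverse_bwt_single_array(const unsigned char *last, size_t n,
                             size_t orig_ptr, unsigned char *out)
{
    uint32_t cftab[256] = {0};
    uint32_t *tt, sum = 0, t_pos;
    size_t i;

    /* Positions are packed into the top 24 bits of each entry. */
    if (n == 0 || n > 0xFFFFFFu || orig_ptr >= n) return -1;
    tt = malloc(n * sizeof *tt);
    if (tt == NULL) return -1;

    /* cftab[c] = number of bytes in the block smaller than c. */
    for (i = 0; i < n; i++) cftab[last[i]]++;
    for (i = 0; i < 256; i++) { uint32_t c = cftab[i]; cftab[i] = sum; sum += c; }

    /* Low 8 bits of tt[i]: the byte last[i]. */
    for (i = 0; i < n; i++) tt[i] = last[i];
    /* High 24 bits: link to the next position, filled in place. */
    for (i = 0; i < n; i++) {
        unsigned char uc = (unsigned char)(tt[i] & 0xFFu);
        tt[cftab[uc]++] |= (uint32_t)i << 8;
    }

    /* Walk the links: each step yields one byte and the next index. */
    t_pos = tt[orig_ptr] >> 8;
    for (i = 0; i < n; i++) {
        t_pos = tt[t_pos];
        out[i] = (unsigned char)(t_pos & 0xFFu);
        t_pos >>= 8;
    }
    free(tt);
    return 0;
}
```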
4059 | +Further ahead, it would be nice | |
4060 | +to be able to do random access into files. This will | |
4061 | +require some careful design of compressed file formats. | |
4062 | + | |
4063 | + | |
4064 | + | |
4065 | +@section Portability issues | |
4066 | +After some consideration, I have decided not to use | |
4067 | +GNU @code{autoconf} to configure 0.9.5 or 1.0. | |
4068 | + | |
4069 | +@code{autoconf}, admirable and wonderful though it is, | |
4070 | +mainly assists with portability problems between Unix-like | |
4071 | +platforms. But @code{bzip2} doesn't have much in the way | |
4072 | +of portability problems on Unix; most of the difficulties appear | |
4073 | +when porting to the Mac, or to Microsoft's operating systems. | |
4074 | +@code{autoconf} doesn't help in those cases, and brings in a | |
4075 | +whole load of new complexity. | |
4076 | + | |
4077 | +Most people should be able to compile the library and program | |
4078 | +under Unix straight out-of-the-box, so to speak, especially | |
4079 | +if you have a version of GNU C available. | |
4080 | + | |
4081 | +There are a couple of @code{__inline__} directives in the code. GNU C | |
4082 | +(@code{gcc}) should be able to handle them. If you're not using | |
4083 | +GNU C, your C compiler shouldn't see them at all. | |
4084 | +If your compiler does, for some reason, see them and doesn't | |
4085 | +like them, just @code{#define} @code{__inline__} to be @code{/* */}. One | |
4086 | +easy way to do this is to compile with the flag @code{-D__inline__=}, | |
4087 | +which should be understood by most Unix compilers. | |
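A minimal sketch of the guard described above follows. It assumes the guard is keyed on GCC's predefined @code{__GNUC__} macro. @code{bz_min_int} is a hypothetical function, used only to show the keyword in place. Compiling with @code{-D__inline__=} has the same effect as the @code{#define}.

```c
/* Non-GNU compilers see __inline__ defined away to nothing. */
#ifndef __GNUC__
#define __inline__ /* */
#endif

/* Hypothetical helper, just to show the keyword in use. */
static __inline__ int bz_min_int(int a, int b)
{
    return a < b ? a : b;
}
```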
4088 | + | |
4089 | +If you still have difficulties, try compiling with the macro | |
4090 | +@code{BZ_STRICT_ANSI} defined. This should enable you to build the | |
4091 | +library in a strictly ANSI compliant environment. Building the program | |
4092 | +itself like this is dangerous and not supported, since you remove | |
4093 | +@code{bzip2}'s checks against compressing directories, symbolic links, | |
4094 | +devices, and other not-really-a-file entities. This could cause | |
4095 | +filesystem corruption! | |
4096 | + | |
4097 | +One other thing: if you create a @code{bzip2} binary for public | |
4098 | +distribution, please try and link it statically (@code{gcc -static}). This | |
4099 | +avoids all sorts of library-version issues that others may encounter | |
4100 | +later on. | |
4101 | + | |
4102 | +If you build @code{bzip2} on Win32, you must set @code{BZ_UNIX} to 0 and | |
4103 | +@code{BZ_LCCWIN32} to 1, in the file @code{bzip2.c}, before compiling. | |
4104 | +Otherwise the resulting binary won't work correctly. | |
4105 | + | |
4106 | + | |
4107 | + | |
4108 | +@section Reporting bugs | |
4109 | +I tried pretty hard to make sure @code{bzip2} is | |
4110 | +bug free, both by design and by testing. Hopefully | |
4111 | +you'll never need to read this section for real. | |
4112 | + | |
4113 | +Nevertheless, if @code{bzip2} dies with a segmentation | |
4114 | +fault, a bus error or an internal assertion failure, it | |
4115 | +will ask you to email me a bug report. Experience with | |
4116 | +version 0.1 shows that almost all these problems can | |
4117 | +be traced to either compiler bugs or hardware problems. | |
4118 | +@itemize @bullet | |
4119 | +@item | |
4120 | +Recompile the program with no optimisation, and see if it | |
4121 | +works. And/or try a different compiler. | |
4122 | +I heard all sorts of stories about various flavours | |
4123 | +of GNU C (and other compilers) generating bad code for | |
4124 | +@code{bzip2}, and I've run across two such examples myself. | |
4125 | + | |
4126 | +2.7.X versions of GNU C are known to generate bad code from | |
4127 | +time to time, at high optimisation levels. | |
4128 | +If you get problems, try using the flags | |
4129 | +@code{-O2} @code{-fomit-frame-pointer} @code{-fno-strength-reduce}. | |
4130 | +You should specifically @emph{not} use @code{-funroll-loops}. | |
4131 | + | |
4132 | +You may notice that the Makefile runs six tests as part of | |
4133 | +the build process. If the program passes all of these, it's | |
4134 | +a pretty good (but not 100%) indication that the compiler has | |
4135 | +done its job correctly. | |
4136 | +@item | |
4137 | +If @code{bzip2} crashes randomly, and the crashes are not | |
4138 | +repeatable, you may have a flaky memory subsystem. @code{bzip2} | |
4139 | +really hammers your memory hierarchy, and if it's a bit marginal, | |
4140 | +you may get these problems. Ditto if your disk or I/O subsystem | |
4141 | +is slowly failing. Yup, this really does happen. | |
4142 | + | |
4143 | +Try using a different machine of the same type, and see if | |
4144 | +you can repeat the problem. | |
4145 | +@item This isn't really a bug, but ... If @code{bzip2} tells | |
4146 | +you your file is corrupted on decompression, and you | |
4147 | +obtained the file via FTP, there is a possibility that you | |
4148 | +forgot to tell FTP to do a binary mode transfer. That absolutely | |
4149 | +will cause the file to be non-decompressible. You'll have to transfer | |
4150 | +it again. | |
4151 | +@end itemize | |
4152 | + | |
4153 | +If you've incorporated @code{libbzip2} into your own program | |
4154 | +and are getting problems, please, please, please check that the | |
4155 | +parameters you are passing in calls to the library are | |
4156 | +correct, and in accordance with what the documentation says | |
4157 | +is allowable. I have tried to make the library robust against | |
4158 | +such problems, but I'm sure I haven't succeeded. | |
4159 | + | |
4160 | +Finally, if the above comments don't help, you'll have to send | |
4161 | +me a bug report. Now, it's just amazing how many people will | |
4162 | +send me a bug report saying something like | |
4163 | +@display | |
4164 | + bzip2 crashed with segmentation fault on my machine | |
4165 | +@end display | |
4166 | +and absolutely nothing else. Needless to say, such a report | |
4167 | +is @emph{totally, utterly, completely and comprehensively 100% useless; | |
4168 | +a waste of your time, my time, and net bandwidth}. | |
4169 | +With no details at all, there's no way I can possibly begin | |
4170 | +to figure out what the problem is. | |
4171 | + | |
4172 | +The rules of the game are: facts, facts, facts. Don't omit | |
4173 | +them because "oh, they won't be relevant". At the bare | |
4174 | +minimum: | |
4175 | +@display | |
4176 | + Machine type. Operating system version. | |
4177 | + Exact version of @code{bzip2} (do @code{bzip2 -V}). | |
4178 | + Exact version of the compiler used. | |
4179 | + Flags passed to the compiler. | |
4180 | +@end display | |
4181 | +However, the most important single thing that will help me is | |
4182 | +the file that you were trying to compress or decompress at the | |
4183 | +time the problem happened. Without that, my ability to do anything | |
4184 | +more than speculate about the cause is limited. | |
4185 | + | |
4186 | +Please remember that I connect to the Internet with a modem, so | |
4187 | +you should contact me before mailing me huge files. | |
4188 | + | |
4189 | + | |
4190 | +@section Did you get the right package? | |
4191 | + | |
4192 | +@code{bzip2} is a resource hog. It soaks up large amounts of CPU cycles | |
4193 | +and memory. Also, it gives very large latencies. In the worst case, you | |
4194 | +can feed many megabytes of uncompressed data into the library before | |
4195 | +getting any compressed output, so this probably rules out applications | |
4196 | +requiring interactive behaviour. | |
4197 | + | |
4198 | +These aren't faults of my implementation, I hope, but more | |
4199 | +an intrinsic property of the Burrows-Wheeler transform (unfortunately). | |
4200 | +Maybe this isn't what you want. | |
4201 | + | |
4202 | +If you want a compressor and/or library which is faster, uses less | |
4203 | +memory but gets pretty good compression, and has minimal latency, | |
4204 | +consider Jean-loup | |
4205 | +Gailly's and Mark Adler's work, @code{zlib-1.1.2} and | |
4206 | +@code{gzip-1.2.4}. Look for them at | |
4208 | +@code{http://www.cdrom.com/pub/infozip/zlib} and | |
4209 | +@code{http://www.gzip.org} respectively. | |
4210 | + | |
4211 | +For something faster and lighter still, you might try Markus F X J | |
4212 | +Oberhumer's @code{LZO} real-time compression/decompression library, at | |
4213 | +@* @code{http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html}. | |
4214 | + | |
4215 | +If you want to use the @code{bzip2} algorithms to compress small blocks | |
4216 | +of data, 64k bytes or smaller, for example on an on-the-fly disk | |
4217 | +compressor, you'd be well advised not to use this library. Instead, | |
4218 | +I've made a special library tuned for that kind of use. It's part of | |
4219 | +@code{e2compr-0.40}, an on-the-fly disk compressor for the Linux | |
4220 | +@code{ext2} filesystem. Look at | |
4221 | +@code{http://www.netspace.net.au/~reiter/e2compr}. | |
4222 | + | |
4223 | + | |
4224 | + | |
4225 | +@section Testing | |
4226 | + | |
4227 | +A record of the tests I've done. | |
4228 | + | |
4229 | +First, some data sets: | |
4230 | +@itemize @bullet | |
4231 | +@item B: a directory containing 6001 files, one for every length in the | |
4232 | + range 0 to 6000 bytes. The files contain random lowercase | |
4233 | + letters. 18.7 megabytes. | |
4234 | +@item H: my home directory tree. Documents, source code, mail files, | |
4235 | + compressed data. H contains B, and also a directory of | |
4236 | + files designed as boundary cases for the sorting; mostly very | |
4237 | + repetitive, nasty files. 565 megabytes. | |
4238 | +@item A: directory tree holding various applications built from source: | |
4239 | + @code{egcs}, @code{gcc-2.8.1}, KDE, GTK, Octave, etc. | |
4240 | + 2200 megabytes. | |
4241 | +@end itemize | |
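To reproduce something like data set B, the sketch below writes one file of random lowercase letters for each length from 0 to 6000 bytes. The manual does not say how the original files were generated. The xorshift generator, the per-file seeds and the @code{%04u} file naming are therefore all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical PRNG (xorshift32); the manual does not say what was used. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fills buf[0..len-1] with pseudo-random letters 'a'..'z'. */
void fill_lowercase(unsigned char *buf, size_t len, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;   /* xorshift must not start at 0 */
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = (unsigned char)('a' + xorshift32(&state) % 26u);
}

/* Writes dir/0000 .. dir/6000, file N holding N bytes. The directory must
   already exist. Returns 0 on success, -1 on the first I/O error. */
int write_dataset_b(const char *dir)
{
    static unsigned char buf[6000];
    char path[1024];
    unsigned len;
    for (len = 0; len <= 6000; len++) {
        FILE *f;
        fill_lowercase(buf, len, 12345u + len);
        snprintf(path, sizeof path, "%s/%04u", dir, len);
        f = fopen(path, "wb");
        if (f == NULL) return -1;
        if (len > 0 && fwrite(buf, 1, len, f) != len) {
            fclose(f);
            return -1;
        }
        if (fclose(f) != 0) return -1;
    }
    return 0;
}
```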
4242 | +The tests conducted are as follows. Each test means compressing | |
4243 | +(a copy of) each file in the data set, decompressing it and | |
4244 | +comparing it against the original. | |
4245 | + | |
4246 | +First, a bunch of tests with block sizes and internal buffer | |
4247 | +sizes set very small, | |
4248 | +to detect any problems with the | |
4249 | +blocking and buffering mechanisms. | |
4250 | +This required modifying the source code so as to try to | |
4251 | +break it. | |
4252 | +@enumerate | |
4253 | +@item Data set H, with | |
4254 | + buffer size of 1 byte, and block size of 23 bytes. | |
4255 | +@item Data set B, buffer sizes 1 byte, block size 1 byte. | |
4256 | +@item As (2) but small-mode decompression. | |
4257 | +@item As (2) with block size 2 bytes. | |
4258 | +@item As (2) with block size 3 bytes. | |
4259 | +@item As (2) with block size 4 bytes. | |
4260 | +@item As (2) with block size 5 bytes. | |
4261 | +@item As (2) with block size 6 bytes and small-mode decompression. | |
4262 | +@item H with buffer size of 1 byte, but normal block | |
4263 | + size (up to 900000 bytes). | |
4264 | +@end enumerate | |
4265 | +Then some tests with unmodified source code. | |
4266 | +@enumerate | |
4267 | +@item H, all settings normal. | |
4268 | +@item As (1), with small-mode decompress. | |
4269 | +@item H, compress with flag @code{-1}. | |
4270 | +@item H, compress with flag @code{-s}, decompress with flag @code{-s}. | |
4271 | +@item Forwards compatibility: H, @code{bzip2-0.1pl2} compressing, | |
4272 | + @code{bzip2-0.9.5} decompressing, all settings normal. | |
4273 | +@item Backwards compatibility: H, @code{bzip2-0.9.5} compressing, | |
4274 | + @code{bzip2-0.1pl2} decompressing, all settings normal. | |
4275 | +@item Bigger tests: A, all settings normal. | |
4276 | +@item As (7), using the fallback (Sadakane-like) sorting algorithm. | |
4277 | +@item As (8), compress with flag @code{-1}, decompress with flag | |
4278 | + @code{-s}. | |
4279 | +@item H, using the fallback sorting algorithm. | |
4280 | +@item Forwards compatibility: A, @code{bzip2-0.1pl2} compressing, | |
4281 | + @code{bzip2-0.9.5} decompressing, all settings normal. | |
4282 | +@item Backwards compatibility: A, @code{bzip2-0.9.5} compressing, | |
4283 | + @code{bzip2-0.1pl2} decompressing, all settings normal. | |
4284 | +@item Misc test: about 400 megabytes of @code{.tar} files with | |
4285 | + @code{bzip2} compiled with Checker (a memory access error | |
4286 | + detector, like Purify). | |
4287 | +@item Misc tests to make sure it builds and runs ok on non-Linux/x86 | |
4288 | + platforms. | |
4289 | +@end enumerate | |
4290 | +These tests were conducted on a 225 MHz IDT WinChip machine, running | |
4291 | +Linux 2.0.36. They represent nearly a week of continuous computation. | |
4292 | +All tests completed successfully. | |
4293 | + | |
4294 | + | |
4295 | +@section Further reading | |
4296 | +@code{bzip2} is not research work, in the sense that it doesn't present | |
4297 | +any new ideas. Rather, it's an engineering exercise based on existing | |
4298 | +ideas. | |
4299 | + | |
4300 | +Four documents describe essentially all the ideas behind @code{bzip2}: | |
4301 | +@example | |
4302 | +Michael Burrows and D. J. Wheeler: | |
4303 | + "A block-sorting lossless data compression algorithm" | |
4304 | + 10th May 1994. | |
4305 | + Digital SRC Research Report 124. | |
4306 | + ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz | |
4307 | + If you have trouble finding it, try searching at the | |
4308 | + New Zealand Digital Library, http://www.nzdl.org. | |
4309 | + | |
4310 | +Daniel S. Hirschberg and Debra A. Lelewer: | |
4311 | + "Efficient Decoding of Prefix Codes" | |
4312 | + Communications of the ACM, April 1990, Vol 33, Number 4. | |
4313 | + You might be able to get an electronic copy of this | |
4314 | + from the ACM Digital Library. | |
4315 | + | |
4316 | +David J. Wheeler: | |
4317 | + Program bred3.c and accompanying document bred3.ps. | |
4318 | + This contains the idea behind the multi-table Huffman | |
4319 | + coding scheme. | |
4320 | + ftp://ftp.cl.cam.ac.uk/users/djw3/ | |
4321 | + | |
4322 | +Jon L. Bentley and Robert Sedgewick: | |
4323 | + "Fast Algorithms for Sorting and Searching Strings" | |
4324 | + Available from Sedgewick's web page, | |
4325 | + www.cs.princeton.edu/~rs | |
4326 | +@end example | |
4327 | +The following paper gives valuable additional insights into the | |
4328 | +algorithm, but is not immediately the basis of any code | |
4329 | +used in bzip2. | |
4330 | +@example | |
4331 | +Peter Fenwick: | |
4332 | + Block Sorting Text Compression | |
4333 | + Proceedings of the 19th Australasian Computer Science Conference, | |
4334 | + Melbourne, Australia. Jan 31 - Feb 2, 1996. | |
4335 | + ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps | |
4336 | +@end example | |
4337 | +Kunihiko Sadakane's sorting algorithm, mentioned above, | |
4338 | +is available from: | |
4339 | +@example | |
4340 | +http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz | |
4341 | +@end example | |
4342 | +The Manber-Myers suffix array construction | |
4343 | +algorithm is described in a paper | |
4344 | +available from: | |
4345 | +@example | |
4346 | +http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps | |
4347 | +@end example | |
4348 | +Finally, the following paper documents some recent investigations | |
4349 | +I made into the performance of sorting algorithms: | |
4350 | +@example | |
4351 | +Julian Seward: | |
4352 | + On the Performance of BWT Sorting Algorithms | |
4353 | + Proceedings of the IEEE Data Compression Conference 2000 | |
4354 | + Snowbird, Utah. 28-30 March 2000. | |
4355 | +@end example | |
4356 | + | |
4357 | + | |
4358 | +@contents | |
4359 | + | |
4360 | +@bye | |
4361 | + | |
4362 | diff -Nru bzip2-1.0.1/doc/bzip2recover.1 bzip2-1.0.1.new/doc/bzip2recover.1 | |
4363 | --- bzip2-1.0.1/doc/bzip2recover.1 Thu Jan 1 01:00:00 1970 | |
4364 | +++ bzip2-1.0.1.new/doc/bzip2recover.1 Sat Jun 24 20:13:06 2000 | |
4365 | @@ -0,0 +1 @@ | |
4366 | +.so bzip2.1 | |
4367 | \ No newline at end of file | |
4368 | diff -Nru bzip2-1.0.1/doc/pl/Makefile.am bzip2-1.0.1.new/doc/pl/Makefile.am | |
4369 | --- bzip2-1.0.1/doc/pl/Makefile.am Thu Jan 1 01:00:00 1970 | |
4370 | +++ bzip2-1.0.1.new/doc/pl/Makefile.am Sat Jun 24 20:13:06 2000 | |
4371 | @@ -0,0 +1,4 @@ | |
4372 | + | |
4373 | +mandir = @mandir@/pl | |
4374 | +man_MANS = bzip2.1 bunzip2.1 bzcat.1 bzip2recover.1 | |
4375 | + | |
4376 | diff -Nru bzip2-1.0.1/doc/pl/bunzip2.1 bzip2-1.0.1.new/doc/pl/bunzip2.1 | |
4377 | --- bzip2-1.0.1/doc/pl/bunzip2.1 Thu Jan 1 01:00:00 1970 | |
4378 | +++ bzip2-1.0.1.new/doc/pl/bunzip2.1 Sat Jun 24 20:13:06 2000 | |
4379 | @@ -0,0 +1 @@ | |
4380 | +.so bzip2.1 | |
4381 | \ No newline at end of file | |
4382 | diff -Nru bzip2-1.0.1/doc/pl/bzcat.1 bzip2-1.0.1.new/doc/pl/bzcat.1 | |
4383 | --- bzip2-1.0.1/doc/pl/bzcat.1 Thu Jan 1 01:00:00 1970 | |
4384 | +++ bzip2-1.0.1.new/doc/pl/bzcat.1 Sat Jun 24 20:13:06 2000 | |
4385 | @@ -0,0 +1 @@ | |
4386 | +.so bzip2.1 | |
4387 | \ No newline at end of file | |
4388 | diff -Nru bzip2-1.0.1/doc/pl/bzip2.1 bzip2-1.0.1.new/doc/pl/bzip2.1 | |
4389 | --- bzip2-1.0.1/doc/pl/bzip2.1 Thu Jan 1 01:00:00 1970 | |
4390 | +++ bzip2-1.0.1.new/doc/pl/bzip2.1 Sat Jun 24 20:13:06 2000 | |
4391 | @@ -0,0 +1,384 @@ | |
4392 |