git.pld-linux.org Git - packages/bzip2.git/blame - bzip2-libtoolizeautoconf.patch
- based on rawhide libtoolizeautoconf patch prepare much beret
Commit Line Data
d967e3ec 1diff -Nru bzip2-1.0.1/AUTHORS bzip2-1.0.1.new/AUTHORS
2--- bzip2-1.0.1/AUTHORS Thu Jan 1 01:00:00 1970
3+++ bzip2-1.0.1.new/AUTHORS Sat Jun 24 20:13:05 2000
4@@ -0,0 +1 @@
5+Julian Seward <jseward@acm.org>
6diff -Nru bzip2-1.0.1/CHANGES bzip2-1.0.1.new/CHANGES
7--- bzip2-1.0.1/CHANGES Sat Jun 24 20:13:27 2000
8+++ bzip2-1.0.1.new/CHANGES Thu Jan 1 01:00:00 1970
9@@ -1,167 +0,0 @@
10-
11-
12-0.9.0
13-~~~~~
14-First version.
15-
16-
17-0.9.0a
18-~~~~~~
19-Removed 'ranlib' from Makefile, since most modern Unix-es
20-don't need it, or even know about it.
21-
22-
23-0.9.0b
24-~~~~~~
25-Fixed a problem with error reporting in bzip2.c. This does not effect
26-the library in any way. Problem is: versions 0.9.0 and 0.9.0a (of the
27-program proper) compress and decompress correctly, but give misleading
28-error messages (internal panics) when an I/O error occurs, instead of
29-reporting the problem correctly. This shouldn't give any data loss
30-(as far as I can see), but is confusing.
31-
32-Made the inline declarations disappear for non-GCC compilers.
33-
34-
35-0.9.0c
36-~~~~~~
37-Fixed some problems in the library pertaining to some boundary cases.
38-This makes the library behave more correctly in those situations. The
39-fixes apply only to features (calls and parameters) not used by
40-bzip2.c, so the non-fixedness of them in previous versions has no
41-effect on reliability of bzip2.c.
42-
43-In bzlib.c:
44- * made zero-length BZ_FLUSH work correctly in bzCompress().
45- * fixed bzWrite/bzRead to ignore zero-length requests.
46- * fixed bzread to correctly handle read requests after EOF.
47- * wrong parameter order in call to bzDecompressInit in
48- bzBuffToBuffDecompress. Fixed.
49-
50-In compress.c:
51- * changed setting of nGroups in sendMTFValues() so as to
52- do a bit better on small files. This _does_ effect
53- bzip2.c.
54-
55-
56-0.9.5a
57-~~~~~~
58-Major change: add a fallback sorting algorithm (blocksort.c)
59-to give reasonable behaviour even for very repetitive inputs.
60-Nuked --repetitive-best and --repetitive-fast since they are
61-no longer useful.
62-
63-Minor changes: mostly a whole bunch of small changes/
64-bugfixes in the driver (bzip2.c). Changes pertaining to the
65-user interface are:
66-
67- allow decompression of symlink'd files to stdout
68- decompress/test files even without .bz2 extension
69- give more accurate error messages for I/O errors
70- when compressing/decompressing to stdout, don't catch control-C
71- read flags from BZIP2 and BZIP environment variables
72- decline to break hard links to a file unless forced with -f
73- allow -c flag even with no filenames
74- preserve file ownerships as far as possible
75- make -s -1 give the expected block size (100k)
76- add a flag -q --quiet to suppress nonessential warnings
77- stop decoding flags after --, so files beginning in - can be handled
78- resolved inconsistent naming: bzcat or bz2cat ?
79- bzip2 --help now returns 0
80-
81-Programming-level changes are:
82-
83- fixed syntax error in GET_LL4 for Borland C++ 5.02
84- let bzBuffToBuffDecompress return BZ_DATA_ERROR{_MAGIC}
85- fix overshoot of mode-string end in bzopen_or_bzdopen
86- wrapped bzlib.h in #ifdef __cplusplus ... extern "C" { ... }
87- close file handles under all error conditions
88- added minor mods so it compiles with DJGPP out of the box
89- fixed Makefile so it doesn't give problems with BSD make
90- fix uninitialised memory reads in dlltest.c
91-
92-0.9.5b
93-~~~~~~
94-Open stdin/stdout in binary mode for DJGPP.
95-
96-0.9.5c
97-~~~~~~
98-Changed BZ_N_OVERSHOOT to be ... + 2 instead of ... + 1. The + 1
99-version could cause the sorted order to be wrong in some extremely
100-obscure cases. Also changed setting of quadrant in blocksort.c.
101-
102-0.9.5d
103-~~~~~~
104-The only functional change is to make bzlibVersion() in the library
105-return the correct string. This has no effect whatsoever on the
106-functioning of the bzip2 program or library. Added a couple of casts
107-so the library compiles without warnings at level 3 in MS Visual
108-Studio 6.0. Included a Y2K statement in the file Y2K_INFO. All other
109-changes are minor documentation changes.
110-
111-1.0
112-~~~
113-Several minor bugfixes and enhancements:
114-
115-* Large file support. The library uses 64-bit counters to
116- count the volume of data passing through it. bzip2.c
117- is now compiled with -D_FILE_OFFSET_BITS=64 to get large
118- file support from the C library. -v correctly prints out
119- file sizes greater than 4 gigabytes. All these changes have
120- been made without assuming a 64-bit platform or a C compiler
121- which supports 64-bit ints, so, except for the C library
122- aspect, they are fully portable.
123-
124-* Decompression robustness. The library/program should be
125- robust to any corruption of compressed data, detecting and
126- handling _all_ corruption, instead of merely relying on
127- the CRCs. What this means is that the program should
128- never crash, given corrupted data, and the library should
129- always return BZ_DATA_ERROR.
130-
131-* Fixed an obscure race-condition bug only ever observed on
132- Solaris, in which, if you were very unlucky and issued
133- control-C at exactly the wrong time, both input and output
134- files would be deleted.
135-
136-* Don't run out of file handles on test/decompression when
137- large numbers of files have invalid magic numbers.
138-
139-* Avoid library namespace pollution. Prefix all exported
140- symbols with BZ2_.
141-
142-* Minor sorting enhancements from my DCC2000 paper.
143-
144-* Advance the version number to 1.0, so as to counteract the
145- (false-in-this-case) impression some people have that programs
146- with version numbers less than 1.0 are in someway, experimental,
147- pre-release versions.
148-
149-* Create an initial Makefile-libbz2_so to build a shared library.
150- Yes, I know I should really use libtool et al ...
151-
152-* Make the program exit with 2 instead of 0 when decompression
153- fails due to a bad magic number (ie, an invalid bzip2 header).
154- Also exit with 1 (as the manual claims :-) whenever a diagnostic
155- message would have been printed AND the corresponding operation
156- is aborted, for example
157- bzip2: Output file xx already exists.
158- When a diagnostic message is printed but the operation is not
159- aborted, for example
160- bzip2: Can't guess original name for wurble -- using wurble.out
161- then the exit value 0 is returned, unless some other problem is
162- also detected.
163-
164- I think it corresponds more closely to what the manual claims now.
165-
166-
167-1.0.1
168-~~~~~
169-* Modified dlltest.c so it uses the new BZ2_ naming scheme.
170-* Modified makefile-msc to fix minor build probs on Win2k.
171-* Updated README.COMPILATION.PROBLEMS.
172-
173-There are no functionality changes or bug fixes relative to version
174-1.0.0. This is just a documentation update + a fix for minor Win32
175-build problems. For almost everyone, upgrading from 1.0.0 to 1.0.1 is
176-utterly pointless. Don't bother.
177diff -Nru bzip2-1.0.1/COPYING bzip2-1.0.1.new/COPYING
178--- bzip2-1.0.1/COPYING Thu Jan 1 01:00:00 1970
179+++ bzip2-1.0.1.new/COPYING Sat Jun 24 20:13:05 2000
180@@ -0,0 +1,39 @@
181+
182+This program, "bzip2" and associated library "libbzip2", are
183+copyright (C) 1996-2000 Julian R Seward. All rights reserved.
184+
185+Redistribution and use in source and binary forms, with or without
186+modification, are permitted provided that the following conditions
187+are met:
188+
189+1. Redistributions of source code must retain the above copyright
190+ notice, this list of conditions and the following disclaimer.
191+
192+2. The origin of this software must not be misrepresented; you must
193+ not claim that you wrote the original software. If you use this
194+ software in a product, an acknowledgment in the product
195+ documentation would be appreciated but is not required.
196+
197+3. Altered source versions must be plainly marked as such, and must
198+ not be misrepresented as being the original software.
199+
200+4. The name of the author may not be used to endorse or promote
201+ products derived from this software without specific prior written
202+ permission.
203+
204+THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
205+OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
206+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
207+ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
208+DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
209+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
210+GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
211+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
212+WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
213+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
214+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
215+
216+Julian Seward, Cambridge, UK.
217+jseward@acm.org
218+bzip2/libbzip2 version 1.0 of 21 March 2000
219+
220diff -Nru bzip2-1.0.1/ChangeLog bzip2-1.0.1.new/ChangeLog
221--- bzip2-1.0.1/ChangeLog Thu Jan 1 01:00:00 1970
222+++ bzip2-1.0.1.new/ChangeLog Sat Jun 24 20:13:05 2000
223@@ -0,0 +1 @@
224+
225diff -Nru bzip2-1.0.1/INSTALL bzip2-1.0.1.new/INSTALL
226--- bzip2-1.0.1/INSTALL Thu Jan 1 01:00:00 1970
227+++ bzip2-1.0.1.new/INSTALL Sat Jun 24 20:13:06 2000
228@@ -0,0 +1,182 @@
229+Basic Installation
230+==================
231+
232+ These are generic installation instructions.
233+
234+ The `configure' shell script attempts to guess correct values for
235+various system-dependent variables used during compilation. It uses
236+those values to create a `Makefile' in each directory of the package.
237+It may also create one or more `.h' files containing system-dependent
238+definitions. Finally, it creates a shell script `config.status' that
239+you can run in the future to recreate the current configuration, a file
240+`config.cache' that saves the results of its tests to speed up
241+reconfiguring, and a file `config.log' containing compiler output
242+(useful mainly for debugging `configure').
243+
244+ If you need to do unusual things to compile the package, please try
245+to figure out how `configure' could check whether to do them, and mail
246+diffs or instructions to the address given in the `README' so they can
247+be considered for the next release. If at some point `config.cache'
248+contains results you don't want to keep, you may remove or edit it.
249+
250+ The file `configure.in' is used to create `configure' by a program
251+called `autoconf'. You only need `configure.in' if you want to change
252+it or regenerate `configure' using a newer version of `autoconf'.
253+
254+The simplest way to compile this package is:
255+
256+ 1. `cd' to the directory containing the package's source code and type
257+ `./configure' to configure the package for your system. If you're
258+ using `csh' on an old version of System V, you might need to type
259+ `sh ./configure' instead to prevent `csh' from trying to execute
260+ `configure' itself.
261+
262+ Running `configure' takes awhile. While running, it prints some
263+ messages telling which features it is checking for.
264+
265+ 2. Type `make' to compile the package.
266+
267+ 3. Optionally, type `make check' to run any self-tests that come with
268+ the package.
269+
270+ 4. Type `make install' to install the programs and any data files and
271+ documentation.
272+
273+ 5. You can remove the program binaries and object files from the
274+ source code directory by typing `make clean'. To also remove the
275+ files that `configure' created (so you can compile the package for
276+ a different kind of computer), type `make distclean'. There is
277+ also a `make maintainer-clean' target, but that is intended mainly
278+ for the package's developers. If you use it, you may have to get
279+ all sorts of other programs in order to regenerate files that came
280+ with the distribution.
281+
282+Compilers and Options
283+=====================
284+
285+ Some systems require unusual options for compilation or linking that
286+the `configure' script does not know about. You can give `configure'
287+initial values for variables by setting them in the environment. Using
288+a Bourne-compatible shell, you can do that on the command line like
289+this:
290+ CC=c89 CFLAGS=-O2 LIBS=-lposix ./configure
291+
292+Or on systems that have the `env' program, you can do it like this:
293+ env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure
294+
295+Compiling For Multiple Architectures
296+====================================
297+
298+ You can compile the package for more than one kind of computer at the
299+same time, by placing the object files for each architecture in their
300+own directory. To do this, you must use a version of `make' that
301+supports the `VPATH' variable, such as GNU `make'. `cd' to the
302+directory where you want the object files and executables to go and run
303+the `configure' script. `configure' automatically checks for the
304+source code in the directory that `configure' is in and in `..'.
305+
306+ If you have to use a `make' that does not supports the `VPATH'
307+variable, you have to compile the package for one architecture at a time
308+in the source code directory. After you have installed the package for
309+one architecture, use `make distclean' before reconfiguring for another
310+architecture.
311+
312+Installation Names
313+==================
314+
315+ By default, `make install' will install the package's files in
316+`/usr/local/bin', `/usr/local/man', etc. You can specify an
317+installation prefix other than `/usr/local' by giving `configure' the
318+option `--prefix=PATH'.
319+
320+ You can specify separate installation prefixes for
321+architecture-specific files and architecture-independent files. If you
322+give `configure' the option `--exec-prefix=PATH', the package will use
323+PATH as the prefix for installing programs and libraries.
324+Documentation and other data files will still use the regular prefix.
325+
326+ In addition, if you use an unusual directory layout you can give
327+options like `--bindir=PATH' to specify different values for particular
328+kinds of files. Run `configure --help' for a list of the directories
329+you can set and what kinds of files go in them.
330+
331+ If the package supports it, you can cause programs to be installed
332+with an extra prefix or suffix on their names by giving `configure' the
333+option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
334+
335+Optional Features
336+=================
337+
338+ Some packages pay attention to `--enable-FEATURE' options to
339+`configure', where FEATURE indicates an optional part of the package.
340+They may also pay attention to `--with-PACKAGE' options, where PACKAGE
341+is something like `gnu-as' or `x' (for the X Window System). The
342+`README' should mention any `--enable-' and `--with-' options that the
343+package recognizes.
344+
345+ For packages that use the X Window System, `configure' can usually
346+find the X include and library files automatically, but if it doesn't,
347+you can use the `configure' options `--x-includes=DIR' and
348+`--x-libraries=DIR' to specify their locations.
349+
350+Specifying the System Type
351+==========================
352+
353+ There may be some features `configure' can not figure out
354+automatically, but needs to determine by the type of host the package
355+will run on. Usually `configure' can figure that out, but if it prints
356+a message saying it can not guess the host type, give it the
357+`--host=TYPE' option. TYPE can either be a short name for the system
358+type, such as `sun4', or a canonical name with three fields:
359+ CPU-COMPANY-SYSTEM
360+
361+See the file `config.sub' for the possible values of each field. If
362+`config.sub' isn't included in this package, then this package doesn't
363+need to know the host type.
364+
365+ If you are building compiler tools for cross-compiling, you can also
366+use the `--target=TYPE' option to select the type of system they will
367+produce code for and the `--build=TYPE' option to select the type of
368+system on which you are compiling the package.
369+
370+Sharing Defaults
371+================
372+
373+ If you want to set default values for `configure' scripts to share,
374+you can create a site shell script called `config.site' that gives
375+default values for variables like `CC', `cache_file', and `prefix'.
376+`configure' looks for `PREFIX/share/config.site' if it exists, then
377+`PREFIX/etc/config.site' if it exists. Or, you can set the
378+`CONFIG_SITE' environment variable to the location of the site script.
379+A warning: not all `configure' scripts look for a site script.
380+
381+Operation Controls
382+==================
383+
384+ `configure' recognizes the following options to control how it
385+operates.
386+
387+`--cache-file=FILE'
388+ Use and save the results of the tests in FILE instead of
389+ `./config.cache'. Set FILE to `/dev/null' to disable caching, for
390+ debugging `configure'.
391+
392+`--help'
393+ Print a summary of the options to `configure', and exit.
394+
395+`--quiet'
396+`--silent'
397+`-q'
398+ Do not print messages saying which checks are being made. To
399+ suppress all normal output, redirect it to `/dev/null' (any error
400+ messages will still be shown).
401+
402+`--srcdir=DIR'
403+ Look for the package's source code in directory DIR. Usually
404+ `configure' can determine that directory automatically.
405+
406+`--version'
407+ Print the version of Autoconf used to generate the `configure'
408+ script, and exit.
409+
410+`configure' also accepts some other, not widely useful, options.
411diff -Nru bzip2-1.0.1/LICENSE bzip2-1.0.1.new/LICENSE
412--- bzip2-1.0.1/LICENSE Sat Jun 24 20:13:27 2000
413+++ bzip2-1.0.1.new/LICENSE Thu Jan 1 01:00:00 1970
414@@ -1,39 +0,0 @@
415-
416-This program, "bzip2" and associated library "libbzip2", are
417-copyright (C) 1996-2000 Julian R Seward. All rights reserved.
418-
419-Redistribution and use in source and binary forms, with or without
420-modification, are permitted provided that the following conditions
421-are met:
422-
423-1. Redistributions of source code must retain the above copyright
424- notice, this list of conditions and the following disclaimer.
425-
426-2. The origin of this software must not be misrepresented; you must
427- not claim that you wrote the original software. If you use this
428- software in a product, an acknowledgment in the product
429- documentation would be appreciated but is not required.
430-
431-3. Altered source versions must be plainly marked as such, and must
432- not be misrepresented as being the original software.
433-
434-4. The name of the author may not be used to endorse or promote
435- products derived from this software without specific prior written
436- permission.
437-
438-THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
439-OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
440-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
441-ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
442-DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
443-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
444-GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
445-INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
446-WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
447-NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
448-SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
449-
450-Julian Seward, Cambridge, UK.
451-jseward@acm.org
452-bzip2/libbzip2 version 1.0 of 21 March 2000
453-
454diff -Nru bzip2-1.0.1/Makefile-libbz2_so bzip2-1.0.1.new/Makefile-libbz2_so
455--- bzip2-1.0.1/Makefile-libbz2_so Sat Jun 24 20:13:27 2000
456+++ bzip2-1.0.1.new/Makefile-libbz2_so Thu Jan 1 01:00:00 1970
457@@ -1,43 +0,0 @@
458-
459-# This Makefile builds a shared version of the library,
460-# libbz2.so.1.0.1, with soname libbz2.so.1.0,
461-# at least on x86-Linux (RedHat 5.2),
462-# with gcc-2.7.2.3. Please see the README file for some
463-# important info about building the library like this.
464-
465-SHELL=/bin/sh
466-CC=gcc
467-BIGFILES=-D_FILE_OFFSET_BITS=64
468-CFLAGS=-fpic -fPIC -Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce $(BIGFILES)
469-
470-OBJS= blocksort.o \
471- huffman.o \
472- crctable.o \
473- randtable.o \
474- compress.o \
475- decompress.o \
476- bzlib.o
477-
478-all: $(OBJS)
479- $(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.1 $(OBJS)
480- $(CC) $(CFLAGS) -o bzip2-shared bzip2.c libbz2.so.1.0.1
481- rm -f libbz2.so.1.0
482- ln -s libbz2.so.1.0.1 libbz2.so.1.0
483-
484-clean:
485- rm -f $(OBJS) bzip2.o libbz2.so.1.0.1 libbz2.so.1.0 bzip2-shared
486-
487-blocksort.o: blocksort.c
488- $(CC) $(CFLAGS) -c blocksort.c
489-huffman.o: huffman.c
490- $(CC) $(CFLAGS) -c huffman.c
491-crctable.o: crctable.c
492- $(CC) $(CFLAGS) -c crctable.c
493-randtable.o: randtable.c
494- $(CC) $(CFLAGS) -c randtable.c
495-compress.o: compress.c
496- $(CC) $(CFLAGS) -c compress.c
497-decompress.o: decompress.c
498- $(CC) $(CFLAGS) -c decompress.c
499-bzlib.o: bzlib.c
500- $(CC) $(CFLAGS) -c bzlib.c
501diff -Nru bzip2-1.0.1/Makefile.am bzip2-1.0.1.new/Makefile.am
502--- bzip2-1.0.1/Makefile.am Thu Jan 1 01:00:00 1970
503+++ bzip2-1.0.1.new/Makefile.am Sat Jun 24 20:17:47 2000
504@@ -0,0 +1,31 @@
505+SUBDIRS = doc
506+
507+bin_PROGRAMS = bzip2 bzip2recover
508+bzip2_SOURCES = bzip2.c
509+
510+bzip2_LDADD = libbz2.la
511+bzip2recover_SOURCES = bzip2recover.c
512+lib_LTLIBRARIES = libbz2.la
513+libbz2_la_SOURCES = \
514+ blocksort.c \
515+ huffman.c \
516+ crctable.c \
517+ randtable.c \
518+ compress.c \
519+ decompress.c \
520+ bzlib.c \
521+ bzlib.h \
522+ bzlib_private.h
523+
524+libbz2_la_LDFLAGS = -version-info 1:0:0
525+include_HEADERS = bzlib.h bzlib_private.h
526+
527+bzip2SCRIPTS = bzless
528+
529+EXTRA_DIST = README README.COMPILATION.PROBLEMS \
530+ Y2K_INFO libbz2.def libbz2.dsp \
531+ sample1.bz2 sample1.ref sample2.bz2 sample2.ref sample3.bz2 sample3.ref
532+
533+install-exec-hook:
534+ $(LN_S) -f bzip2 $(DESTDIR)$(bindir)/bunzip2
535+ $(LN_S) -f bzip2 $(DESTDIR)$(bindir)/bzcat
536diff -Nru bzip2-1.0.1/NEWS bzip2-1.0.1.new/NEWS
537--- bzip2-1.0.1/NEWS Thu Jan 1 01:00:00 1970
538+++ bzip2-1.0.1.new/NEWS Sat Jun 24 20:13:06 2000
539@@ -0,0 +1,12 @@
540+
541+
542+1.0.1
543+~~~~~
544+* Modified dlltest.c so it uses the new BZ2_ naming scheme.
545+* Modified makefile-msc to fix minor build probs on Win2k.
546+* Updated README.COMPILATION.PROBLEMS.
547+
548+There are no functionality changes or bug fixes relative to version
549+1.0.0. This is just a documentation update + a fix for minor Win32
550+build problems. For almost everyone, upgrading from 1.0.0 to 1.0.1 is
551+utterly pointless. Don't bother.
552diff -Nru bzip2-1.0.1/acinclude.m4 bzip2-1.0.1.new/acinclude.m4
553--- bzip2-1.0.1/acinclude.m4 Thu Jan 1 01:00:00 1970
554+++ bzip2-1.0.1.new/acinclude.m4 Sat Jun 24 20:13:06 2000
555@@ -0,0 +1,129 @@
556+#serial 7
557+
558+dnl By default, many hosts won't let programs access large files;
559+dnl one must use special compiler options to get large-file access to work.
560+dnl For more details about this brain damage please see:
561+dnl http://www.sas.com/standards/large.file/x_open.20Mar96.html
562+
563+dnl Written by Paul Eggert <eggert@twinsun.com>.
564+
565+dnl Internal subroutine of AC_SYS_LARGEFILE.
566+dnl AC_SYS_LARGEFILE_FLAGS(FLAGSNAME)
567+AC_DEFUN(AC_SYS_LARGEFILE_FLAGS,
568+ [AC_CACHE_CHECK([for $1 value to request large file support],
569+ ac_cv_sys_largefile_$1,
570+ [if ($GETCONF LFS_$1) >conftest.1 2>conftest.2 && test ! -s conftest.2
571+ then
572+ ac_cv_sys_largefile_$1=`cat conftest.1`
573+ else
574+ ac_cv_sys_largefile_$1=no
575+ ifelse($1, CFLAGS,
576+ [case "$host_os" in
577+ # HP-UX 10.20 requires -D__STDC_EXT__ with gcc 2.95.1.
578+changequote(, )dnl
579+ hpux10.[2-9][0-9]* | hpux1[1-9]* | hpux[2-9][0-9]*)
580+changequote([, ])dnl
581+ if test "$GCC" = yes; then
582+ ac_cv_sys_largefile_CFLAGS=-D__STDC_EXT__
583+ fi
584+ ;;
585+ # IRIX 6.2 and later require cc -n32.
586+changequote(, )dnl
587+ irix6.[2-9]* | irix6.1[0-9]* | irix[7-9].* | irix[1-9][0-9]*)
588+changequote([, ])dnl
589+ if test "$GCC" != yes; then
590+ ac_cv_sys_largefile_CFLAGS=-n32
591+ fi
592+ esac
593+ if test "$ac_cv_sys_largefile_CFLAGS" != no; then
594+ ac_save_CC="$CC"
595+ CC="$CC $ac_cv_sys_largefile_CFLAGS"
596+ AC_TRY_LINK(, , , ac_cv_sys_largefile_CFLAGS=no)
597+ CC="$ac_save_CC"
598+ fi])
599+ fi
600+ rm -f conftest*])])
601+
602+dnl Internal subroutine of AC_SYS_LARGEFILE.
603+dnl AC_SYS_LARGEFILE_SPACE_APPEND(VAR, VAL)
604+AC_DEFUN(AC_SYS_LARGEFILE_SPACE_APPEND,
605+ [case $2 in
606+ no) ;;
607+ ?*)
608+ case "[$]$1" in
609+ '') $1=$2 ;;
610+ *) $1=[$]$1' '$2 ;;
611+ esac ;;
612+ esac])
613+
614+dnl Internal subroutine of AC_SYS_LARGEFILE.
615+dnl AC_SYS_LARGEFILE_MACRO_VALUE(C-MACRO, CACHE-VAR, COMMENT, CODE-TO-SET-DEFAULT)
616+AC_DEFUN(AC_SYS_LARGEFILE_MACRO_VALUE,
617+ [AC_CACHE_CHECK([for $1], $2,
618+ [$2=no
619+changequote(, )dnl
620+ $4
621+ for ac_flag in $ac_cv_sys_largefile_CFLAGS no; do
622+ case "$ac_flag" in
623+ -D$1)
624+ $2=1 ;;
625+ -D$1=*)
626+ $2=`expr " $ac_flag" : '[^=]*=\(.*\)'` ;;
627+ esac
628+ done
629+changequote([, ])dnl
630+ ])
631+ if test "[$]$2" != no; then
632+ AC_DEFINE_UNQUOTED([$1], [$]$2, [$3])
633+ fi])
634+
635+AC_DEFUN(AC_SYS_LARGEFILE,
636+ [AC_REQUIRE([AC_CANONICAL_HOST])
637+ AC_ARG_ENABLE(largefile,
638+ [ --disable-largefile omit support for large files])
639+ if test "$enable_largefile" != no; then
640+ AC_CHECK_TOOL(GETCONF, getconf)
641+ AC_SYS_LARGEFILE_FLAGS(CFLAGS)
642+ AC_SYS_LARGEFILE_FLAGS(LDFLAGS)
643+ AC_SYS_LARGEFILE_FLAGS(LIBS)
644+
645+ for ac_flag in $ac_cv_sys_largefile_CFLAGS no; do
646+ case "$ac_flag" in
647+ no) ;;
648+ -D_FILE_OFFSET_BITS=*) ;;
649+ -D_LARGEFILE_SOURCE | -D_LARGEFILE_SOURCE=*) ;;
650+ -D_LARGE_FILES | -D_LARGE_FILES=*) ;;
651+ -D?* | -I?*)
652+ AC_SYS_LARGEFILE_SPACE_APPEND(CPPFLAGS, "$ac_flag") ;;
653+ *)
654+ AC_SYS_LARGEFILE_SPACE_APPEND(CFLAGS, "$ac_flag") ;;
655+ esac
656+ done
657+ AC_SYS_LARGEFILE_SPACE_APPEND(LDFLAGS, "$ac_cv_sys_largefile_LDFLAGS")
658+ AC_SYS_LARGEFILE_SPACE_APPEND(LIBS, "$ac_cv_sys_largefile_LIBS")
659+ AC_SYS_LARGEFILE_MACRO_VALUE(_FILE_OFFSET_BITS,
660+ ac_cv_sys_file_offset_bits,
661+ [Number of bits in a file offset, on hosts where this is settable.],
662+ [case "$host_os" in
663+ # HP-UX 10.20 and later
664+ hpux10.[2-9][0-9]* | hpux1[1-9]* | hpux[2-9][0-9]*)
665+ ac_cv_sys_file_offset_bits=64 ;;
666+ esac])
667+ AC_SYS_LARGEFILE_MACRO_VALUE(_LARGEFILE_SOURCE,
668+ ac_cv_sys_largefile_source,
669+ [Define to make fseeko etc. visible, on some hosts.],
670+ [case "$host_os" in
671+ # HP-UX 10.20 and later
672+ hpux10.[2-9][0-9]* | hpux1[1-9]* | hpux[2-9][0-9]*)
673+ ac_cv_sys_largefile_source=1 ;;
674+ esac])
675+ AC_SYS_LARGEFILE_MACRO_VALUE(_LARGE_FILES,
676+ ac_cv_sys_large_files,
677+ [Define for large files, on AIX-style hosts.],
678+ [case "$host_os" in
679+ # AIX 4.2 and later
680+ aix4.[2-9]* | aix4.1[0-9]* | aix[5-9].* | aix[1-9][0-9]*)
681+ ac_cv_sys_large_files=1 ;;
682+ esac])
683+ fi
684+ ])
685diff -Nru bzip2-1.0.1/bzip2.1 bzip2-1.0.1.new/bzip2.1
686--- bzip2-1.0.1/bzip2.1 Sat Jun 24 20:13:27 2000
687+++ bzip2-1.0.1.new/bzip2.1 Thu Jan 1 01:00:00 1970
688@@ -1,439 +0,0 @@
689-.PU
690-.TH bzip2 1
691-.SH NAME
692-bzip2, bunzip2 \- a block-sorting file compressor, v1.0
693-.br
694-bzcat \- decompresses files to stdout
695-.br
696-bzip2recover \- recovers data from damaged bzip2 files
697-
698-.SH SYNOPSIS
699-.ll +8
700-.B bzip2
701-.RB [ " \-cdfkqstvzVL123456789 " ]
702-[
703-.I "filenames \&..."
704-]
705-.ll -8
706-.br
707-.B bunzip2
708-.RB [ " \-fkvsVL " ]
709-[
710-.I "filenames \&..."
711-]
712-.br
713-.B bzcat
714-.RB [ " \-s " ]
715-[
716-.I "filenames \&..."
717-]
718-.br
719-.B bzip2recover
720-.I "filename"
721-
722-.SH DESCRIPTION
723-.I bzip2
724-compresses files using the Burrows-Wheeler block sorting
725-text compression algorithm, and Huffman coding. Compression is
726-generally considerably better than that achieved by more conventional
727-LZ77/LZ78-based compressors, and approaches the performance of the PPM
728-family of statistical compressors.
729-
730-The command-line options are deliberately very similar to
731-those of
732-.I GNU gzip,
733-but they are not identical.
734-
735-.I bzip2
736-expects a list of file names to accompany the
737-command-line flags. Each file is replaced by a compressed version of
738-itself, with the name "original_name.bz2".
739-Each compressed file
740-has the same modification date, permissions, and, when possible,
741-ownership as the corresponding original, so that these properties can
742-be correctly restored at decompression time. File name handling is
743-naive in the sense that there is no mechanism for preserving original
744-file names, permissions, ownerships or dates in filesystems which lack
745-these concepts, or have serious file name length restrictions, such as
746-MS-DOS.
747-
748-.I bzip2
749-and
750-.I bunzip2
751-will by default not overwrite existing
752-files. If you want this to happen, specify the \-f flag.
753-
754-If no file names are specified,
755-.I bzip2
756-compresses from standard
757-input to standard output. In this case,
758-.I bzip2
759-will decline to
760-write compressed output to a terminal, as this would be entirely
761-incomprehensible and therefore pointless.
762-
763-.I bunzip2
764-(or
765-.I bzip2 \-d)
766-decompresses all
767-specified files. Files which were not created by
768-.I bzip2
769-will be detected and ignored, and a warning issued.
770-.I bzip2
771-attempts to guess the filename for the decompressed file
772-from that of the compressed file as follows:
773-
774- filename.bz2 becomes filename
775- filename.bz becomes filename
776- filename.tbz2 becomes filename.tar
777- filename.tbz becomes filename.tar
778- anyothername becomes anyothername.out
779-
780-If the file does not end in one of the recognised endings,
781-.I .bz2,
782-.I .bz,
783-.I .tbz2
784-or
785-.I .tbz,
786-.I bzip2
787-complains that it cannot
788-guess the name of the original file, and uses the original name
789-with
790-.I .out
791-appended.
792-
793-As with compression, supplying no
794-filenames causes decompression from
795-standard input to standard output.
796-
797-.I bunzip2
798-will correctly decompress a file which is the
799-concatenation of two or more compressed files. The result is the
800-concatenation of the corresponding uncompressed files. Integrity
801-testing (\-t)
802-of concatenated
803-compressed files is also supported.
804-
805-You can also compress or decompress files to the standard output by
806-giving the \-c flag. Multiple files may be compressed and
807-decompressed like this. The resulting outputs are fed sequentially to
808-stdout. Compression of multiple files
809-in this manner generates a stream
810-containing multiple compressed file representations. Such a stream
811-can be decompressed correctly only by
812-.I bzip2
813-version 0.9.0 or
814-later. Earlier versions of
815-.I bzip2
816-will stop after decompressing
817-the first file in the stream.
818-
819-.I bzcat
820-(or
821-.I bzip2 -dc)
822-decompresses all specified files to
823-the standard output.
824-
825-.I bzip2
826-will read arguments from the environment variables
827-.I BZIP2
828-and
829-.I BZIP,
830-in that order, and will process them
831-before any arguments read from the command line. This gives a
832-convenient way to supply default arguments.
833-
834-Compression is always performed, even if the compressed
835-file is slightly
836-larger than the original. Files of less than about one hundred bytes
837-tend to get larger, since the compression mechanism has a constant
838-overhead in the region of 50 bytes. Random data (including the output
839-of most file compressors) is coded at about 8.05 bits per byte, giving
840-an expansion of around 0.5%.
841-
842-As a self-check for your protection,
843-.I
844-bzip2
845-uses 32-bit CRCs to
846-make sure that the decompressed version of a file is identical to the
847-original. This guards against corruption of the compressed data, and
848-against undetected bugs in
849-.I bzip2
850-(hopefully very unlikely). The
851-chances of data corruption going undetected is microscopic, about one
852-chance in four billion for each file processed. Be aware, though, that
853-the check occurs upon decompression, so it can only tell you that
854-something is wrong. It can't help you
855-recover the original uncompressed
856-data. You can use
857-.I bzip2recover
858-to try to recover data from
859-damaged files.
860-
861-Return values: 0 for a normal exit, 1 for environmental problems (file
862-not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt
863-compressed file, 3 for an internal consistency error (eg, bug) which
864-caused
865-.I bzip2
866-to panic.
867-
868-.SH OPTIONS
869-.TP
870-.B \-c --stdout
871-Compress or decompress to standard output.
872-.TP
873-.B \-d --decompress
874-Force decompression.
875-.I bzip2,
876-.I bunzip2
877-and
878-.I bzcat
879-are
880-really the same program, and the decision about what actions to take is
881-done on the basis of which name is used. This flag overrides that
882-mechanism, and forces
883-.I bzip2
884-to decompress.
885-.TP
886-.B \-z --compress
887-The complement to \-d: forces compression, regardless of the
888-invokation name.
889-.TP
890-.B \-t --test
891-Check integrity of the specified file(s), but don't decompress them.
892-This really performs a trial decompression and throws away the result.
893-.TP
894-.B \-f --force
895-Force overwrite of output files. Normally,
896-.I bzip2
897-will not overwrite
898-existing output files. Also forces
899-.I bzip2
900-to break hard links
901-to files, which it otherwise wouldn't do.
902-.TP
903-.B \-k --keep
904-Keep (don't delete) input files during compression
905-or decompression.
906-.TP
907-.B \-s --small
908-Reduce memory usage, for compression, decompression and testing. Files
909-are decompressed and tested using a modified algorithm which only
910-requires 2.5 bytes per block byte. This means any file can be
911-decompressed in 2300k of memory, albeit at about half the normal speed.
912-
913-During compression, \-s selects a block size of 200k, which limits
914-memory use to around the same figure, at the expense of your compression
915-ratio. In short, if your machine is low on memory (8 megabytes or
916-less), use \-s for everything. See MEMORY MANAGEMENT below.
917-.TP
918-.B \-q --quiet
919-Suppress non-essential warning messages. Messages pertaining to
920-I/O errors and other critical events will not be suppressed.
921-.TP
922-.B \-v --verbose
923-Verbose mode -- show the compression ratio for each file processed.
924-Further \-v's increase the verbosity level, spewing out lots of
925-information which is primarily of interest for diagnostic purposes.
926-.TP
927-.B \-L --license -V --version
928-Display the software version, license terms and conditions.
929-.TP
930-.B \-1 to \-9
931-Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
932-effect when decompressing. See MEMORY MANAGEMENT below.
933-.TP
934-.B \--
935-Treats all subsequent arguments as file names, even if they start
936-with a dash. This is so you can handle files with names beginning
937-with a dash, for example: bzip2 \-- \-myfilename.
938-.TP
939-.B \--repetitive-fast --repetitive-best
940-These flags are redundant in versions 0.9.5 and above. They provided
941-some coarse control over the behaviour of the sorting algorithm in
942-earlier versions, which was sometimes useful. 0.9.5 and above have an
943-improved algorithm which renders these flags irrelevant.
944-
945-.SH MEMORY MANAGEMENT
946-.I bzip2
947-compresses large files in blocks. The block size affects
948-both the compression ratio achieved, and the amount of memory needed for
949-compression and decompression. The flags \-1 through \-9
950-specify the block size to be 100,000 bytes through 900,000 bytes (the
951-default) respectively. At decompression time, the block size used for
952-compression is read from the header of the compressed file, and
953-.I bunzip2
954-then allocates itself just enough memory to decompress
955-the file. Since block sizes are stored in compressed files, it follows
956-that the flags \-1 to \-9 are irrelevant to and so ignored
957-during decompression.
958-
959-Compression and decompression requirements,
960-in bytes, can be estimated as:
961-
962- Compression: 400k + ( 8 x block size )
963-
964- Decompression: 100k + ( 4 x block size ), or
965- 100k + ( 2.5 x block size )
966-
967-Larger block sizes give rapidly diminishing marginal returns. Most of
968-the compression comes from the first two or three hundred k of block
969-size, a fact worth bearing in mind when using
970-.I bzip2
971-on small machines.
972-It is also important to appreciate that the decompression memory
973-requirement is set at compression time by the choice of block size.
974-
975-For files compressed with the default 900k block size,
976-.I bunzip2
977-will require about 3700 kbytes to decompress. To support decompression
978-of any file on a 4 megabyte machine,
979-.I bunzip2
980-has an option to
981-decompress using approximately half this amount of memory, about 2300
982-kbytes. Decompression speed is also halved, so you should use this
983-option only where necessary. The relevant flag is -s.
984-
985-In general, try and use the largest block size memory constraints allow,
986-since that maximises the compression achieved. Compression and
987-decompression speed are virtually unaffected by block size.
988-
989-Another significant point applies to files which fit in a single block
990--- that means most files you'd encounter using a large block size. The
991-amount of real memory touched is proportional to the size of the file,
992-since the file is smaller than a block. For example, compressing a file
993-20,000 bytes long with the flag -9 will cause the compressor to
994-allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
995-kbytes of it. Similarly, the decompressor will allocate 3700k but only
996-touch 100k + 20000 * 4 = 180 kbytes.
997-
998-Here is a table which summarises the maximum memory usage for different
999-block sizes. Also recorded is the total compressed size for 14 files of
1000-the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
1001-column gives some feel for how compression varies with block size.
1002-These figures tend to understate the advantage of larger block sizes for
1003-larger files, since the Corpus is dominated by smaller files.
1004-
1005- Compress Decompress Decompress Corpus
1006- Flag usage usage -s usage Size
1007-
1008- -1 1200k 500k 350k 914704
1009- -2 2000k 900k 600k 877703
1010- -3 2800k 1300k 850k 860338
1011- -4 3600k 1700k 1100k 846899
1012- -5 4400k 2100k 1350k 845160
1013- -6 5200k 2500k 1600k 838626
1014- -7 6100k 2900k 1850k 834096
1015- -8 6800k 3300k 2100k 828642
1016- -9 7600k 3700k 2350k 828642
1017-
1018-.SH RECOVERING DATA FROM DAMAGED FILES
1019-.I bzip2
1020-compresses files in blocks, usually 900kbytes long. Each
1021-block is handled independently. If a media or transmission error causes
1022-a multi-block .bz2
1023-file to become damaged, it may be possible to
1024-recover data from the undamaged blocks in the file.
1025-
1026-The compressed representation of each block is delimited by a 48-bit
1027-pattern, which makes it possible to find the block boundaries with
1028-reasonable certainty. Each block also carries its own 32-bit CRC, so
1029-damaged blocks can be distinguished from undamaged ones.
1030-
1031-.I bzip2recover
1032-is a simple program whose purpose is to search for
1033-blocks in .bz2 files, and write each block out into its own .bz2
1034-file. You can then use
1035-.I bzip2
1036-\-t
1037-to test the
1038-integrity of the resulting files, and decompress those which are
1039-undamaged.
1040-
1041-.I bzip2recover
1042-takes a single argument, the name of the damaged file,
1043-and writes a number of files "rec0001file.bz2",
1044-"rec0002file.bz2", etc, containing the extracted blocks.
1045-The output filenames are designed so that the use of
1046-wildcards in subsequent processing -- for example,
1047-"bzip2 -dc rec*file.bz2 > recovered_data" -- lists the files in
1048-the correct order.
1049-
1050-.I bzip2recover
1051-should be of most use dealing with large .bz2
1052-files, as these will contain many blocks. It is clearly
1053-futile to use it on damaged single-block files, since a
1054-damaged block cannot be recovered. If you wish to minimise
1055-any potential data loss through media or transmission errors,
1056-you might consider compressing with a smaller
1057-block size.
1058-
1059-.SH PERFORMANCE NOTES
1060-The sorting phase of compression gathers together similar strings in the
1061-file. Because of this, files containing very long runs of repeated
1062-symbols, like "aabaabaabaab ..." (repeated several hundred times) may
1063-compress more slowly than normal. Versions 0.9.5 and above fare much
1064-better than previous versions in this respect. The ratio between
1065-worst-case and average-case compression time is in the region of 10:1.
1066-For previous versions, this figure was more like 100:1. You can use the
1067-\-vvvv option to monitor progress in great detail, if you want.
1068-
1069-Decompression speed is unaffected by these phenomena.
1070-
1071-.I bzip2
1072-usually allocates several megabytes of memory to operate
1073-in, and then charges all over it in a fairly random fashion. This means
1074-that performance, both for compressing and decompressing, is largely
1075-determined by the speed at which your machine can service cache misses.
1076-Because of this, small changes to the code to reduce the miss rate have
1077-been observed to give disproportionately large performance improvements.
1078-I imagine
1079-.I bzip2
1080-will perform best on machines with very large caches.
1081-
1082-.SH CAVEATS
1083-I/O error messages are not as helpful as they could be.
1084-.I bzip2
1085-tries hard to detect I/O errors and exit cleanly, but the details of
1086-what the problem is sometimes seem rather misleading.
1087-
1088-This manual page pertains to version 1.0 of
1089-.I bzip2.
1090-Compressed
1091-data created by this version is entirely forwards and backwards
1092-compatible with the previous public releases, versions 0.1pl2, 0.9.0
1093-and 0.9.5,
1094-but with the following exception: 0.9.0 and above can correctly
1095-decompress multiple concatenated compressed files. 0.1pl2 cannot do
1096-this; it will stop after decompressing just the first file in the
1097-stream.
1098-
1099-.I bzip2recover
1100-uses 32-bit integers to represent bit positions in
1101-compressed files, so it cannot handle compressed files more than 512
1102-megabytes long. This could easily be fixed.
1103-
1104-.SH AUTHOR
1105-Julian Seward, jseward@acm.org.
1106-
1107-http://sourceware.cygnus.com/bzip2
1108-http://www.muraroa.demon.co.uk
1109-
1110-The ideas embodied in
1111-.I bzip2
1112-are due to (at least) the following
1113-people: Michael Burrows and David Wheeler (for the block sorting
1114-transformation), David Wheeler (again, for the Huffman coder), Peter
1115-Fenwick (for the structured coding model in the original
1116-.I bzip,
1117-and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
1118-(for the arithmetic coder in the original
1119-.I bzip).
1120-I am much
1121-indebted for their help, support and advice. See the manual in the
1122-source distribution for pointers to sources of documentation. Christian
1123-von Roques encouraged me to look for faster sorting algorithms, so as to
1124-speed up compression. Bela Lubkin encouraged me to improve the
1125-worst-case compression performance. Many people sent patches, helped
1126-with portability problems, lent machines, gave advice and were generally
1127-helpful.
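Note on the MEMORY MANAGEMENT figures in the man page removed above: the estimates can be checked with a few lines of shell. This is a minimal sketch, not part of the patch. The function names are hypothetical, results are in kbytes, and the block size is assumed to be 100k times the digit of the flag (-1 to -9), as the man page states.

```shell
#!/bin/sh
# Hypothetical helpers (not from bzip2 or this patch): estimate memory use
# in kbytes for a block-size flag digit 1..9, using the man page formulas.

# Compression: 400k + ( 8 x block size )
bz_compress_kb() {
    echo $(( 400 + 8 * $1 * 100 ))
}

# Decompression: 100k + ( 4 x block size )
bz_decompress_kb() {
    echo $(( 100 + 4 * $1 * 100 ))
}

# Decompression with -s: 100k + ( 2.5 x block size ).
# Shell arithmetic is integer-only, so 2.5 is written as 5/2.
bz_decompress_small_kb() {
    echo $(( 100 + 5 * $1 * 100 / 2 ))
}

# Print flag, compress, decompress and decompress -s estimates.
for level in 1 5 9; do
    printf '%s %sk %sk %sk\n' "-$level" \
        "$(bz_compress_kb "$level")" \
        "$(bz_decompress_kb "$level")" \
        "$(bz_decompress_small_kb "$level")"
done
```

The formulas reproduce the table in the man page for every flag except the compression figure for -7, where the formula gives 6000k and the table lists 6100k.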
1128diff -Nru bzip2-1.0.1/bzip2.1.preformatted bzip2-1.0.1.new/bzip2.1.preformatted
1129--- bzip2-1.0.1/bzip2.1.preformatted Sat Jun 24 20:13:27 2000
1130+++ bzip2-1.0.1.new/bzip2.1.preformatted Thu Jan 1 01:00:00 1970
1131@@ -1,462 +0,0 @@
1132-
1133-
1134-
1135-bzip2(1) bzip2(1)
1136-
1137-
1138-N\bNA\bAM\bME\bE
1139- bzip2, bunzip2 - a block-sorting file compressor, v1.0
1140- bzcat - decompresses files to stdout
1141- bzip2recover - recovers data from damaged bzip2 files
1142-
1143-
1144-S\bSY\bYN\bNO\bOP\bPS\bSI\bIS\bS
1145- b\bbz\bzi\bip\bp2\b2 [ -\b-c\bcd\bdf\bfk\bkq\bqs\bst\btv\bvz\bzV\bVL\bL1\b12\b23\b34\b45\b56\b67\b78\b89\b9 ] [ _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be_\bs _\b._\b._\b. ]
1146- b\bbu\bun\bnz\bzi\bip\bp2\b2 [ -\b-f\bfk\bkv\bvs\bsV\bVL\bL ] [ _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be_\bs _\b._\b._\b. ]
1147- b\bbz\bzc\bca\bat\bt [ -\b-s\bs ] [ _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be_\bs _\b._\b._\b. ]
1148- b\bbz\bzi\bip\bp2\b2r\bre\bec\bco\bov\bve\ber\br _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be
1149-
1150-
1151-D\bDE\bES\bSC\bCR\bRI\bIP\bPT\bTI\bIO\bON\bN
1152- _\bb_\bz_\bi_\bp_\b2 compresses files using the Burrows-Wheeler block
1153- sorting text compression algorithm, and Huffman coding.
1154- Compression is generally considerably better than that
1155- achieved by more conventional LZ77/LZ78-based compressors,
1156- and approaches the performance of the PPM family of sta-
1157- tistical compressors.
1158-
1159- The command-line options are deliberately very similar to
1160- those of _\bG_\bN_\bU _\bg_\bz_\bi_\bp_\b, but they are not identical.
1161-
1162- _\bb_\bz_\bi_\bp_\b2 expects a list of file names to accompany the com-
1163- mand-line flags. Each file is replaced by a compressed
1164- version of itself, with the name "original_name.bz2".
1165- Each compressed file has the same modification date, per-
1166- missions, and, when possible, ownership as the correspond-
1167- ing original, so that these properties can be correctly
1168- restored at decompression time. File name handling is
1169- naive in the sense that there is no mechanism for preserv-
1170- ing original file names, permissions, ownerships or dates
1171- in filesystems which lack these concepts, or have serious
1172- file name length restrictions, such as MS-DOS.
1173-
1174- _\bb_\bz_\bi_\bp_\b2 and _\bb_\bu_\bn_\bz_\bi_\bp_\b2 will by default not overwrite existing
1175- files. If you want this to happen, specify the -f flag.
1176-
1177- If no file names are specified, _\bb_\bz_\bi_\bp_\b2 compresses from
1178- standard input to standard output. In this case, _\bb_\bz_\bi_\bp_\b2
1179- will decline to write compressed output to a terminal, as
1180- this would be entirely incomprehensible and therefore
1181- pointless.
1182-
1183- _\bb_\bu_\bn_\bz_\bi_\bp_\b2 (or _\bb_\bz_\bi_\bp_\b2 _\b-_\bd_\b) decompresses all specified files.
1184- Files which were not created by _\bb_\bz_\bi_\bp_\b2 will be detected and
1185- ignored, and a warning issued. _\bb_\bz_\bi_\bp_\b2 attempts to guess
1186- the filename for the decompressed file from that of the
1187- compressed file as follows:
1188-
1189- filename.bz2 becomes filename
1190- filename.bz becomes filename
1191- filename.tbz2 becomes filename.tar
1192-
1193-
1194-
1195- 1
1196-
1197-
1198-
1199-
1200-
1201-bzip2(1) bzip2(1)
1202-
1203-
1204- filename.tbz becomes filename.tar
1205- anyothername becomes anyothername.out
1206-
1207- If the file does not end in one of the recognised endings,
1208- _\b._\bb_\bz_\b2_\b, _\b._\bb_\bz_\b, _\b._\bt_\bb_\bz_\b2 or _\b._\bt_\bb_\bz_\b, _\bb_\bz_\bi_\bp_\b2 complains that it cannot
1209- guess the name of the original file, and uses the original
1210- name with _\b._\bo_\bu_\bt appended.
1211-
1212- As with compression, supplying no filenames causes decom-
1213- pression from standard input to standard output.
1214-
1215- _\bb_\bu_\bn_\bz_\bi_\bp_\b2 will correctly decompress a file which is the con-
1216- catenation of two or more compressed files. The result is
1217- the concatenation of the corresponding uncompressed files.
1218- Integrity testing (-t) of concatenated compressed files is
1219- also supported.
1220-
1221- You can also compress or decompress files to the standard
1222- output by giving the -c flag. Multiple files may be com-
1223- pressed and decompressed like this. The resulting outputs
1224- are fed sequentially to stdout. Compression of multiple
1225- files in this manner generates a stream containing multi-
1226- ple compressed file representations. Such a stream can be
1227- decompressed correctly only by _\bb_\bz_\bi_\bp_\b2 version 0.9.0 or
1228- later. Earlier versions of _\bb_\bz_\bi_\bp_\b2 will stop after decom-
1229- pressing the first file in the stream.
1230-
1231- _\bb_\bz_\bc_\ba_\bt (or _\bb_\bz_\bi_\bp_\b2 _\b-_\bd_\bc_\b) decompresses all specified files to
1232- the standard output.
1233-
1234- _\bb_\bz_\bi_\bp_\b2 will read arguments from the environment variables
1235- _\bB_\bZ_\bI_\bP_\b2 and _\bB_\bZ_\bI_\bP_\b, in that order, and will process them
1236- before any arguments read from the command line. This
1237- gives a convenient way to supply default arguments.
1238-
1239- Compression is always performed, even if the compressed
1240- file is slightly larger than the original. Files of less
1241- than about one hundred bytes tend to get larger, since the
1242- compression mechanism has a constant overhead in the
1243- region of 50 bytes. Random data (including the output of
1244- most file compressors) is coded at about 8.05 bits per
1245- byte, giving an expansion of around 0.5%.
1246-
1247- As a self-check for your protection, _\bb_\bz_\bi_\bp_\b2 uses 32-bit
1248- CRCs to make sure that the decompressed version of a file
1249- is identical to the original. This guards against corrup-
1250- tion of the compressed data, and against undetected bugs
1251- in _\bb_\bz_\bi_\bp_\b2 (hopefully very unlikely). The chances of data
1252- corruption going undetected is microscopic, about one
1253- chance in four billion for each file processed. Be aware,
1254- though, that the check occurs upon decompression, so it
1255- can only tell you that something is wrong. It can't help
1256- you recover the original uncompressed data. You can use
1257- _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br to try to recover data from damaged files.
1258-
1259-
1260-
1261- 2
1262-
1263-
1264-
1265-
1266-
1267-bzip2(1) bzip2(1)
1268-
1269-
1270- Return values: 0 for a normal exit, 1 for environmental
1271- problems (file not found, invalid flags, I/O errors, &c),
1272- 2 to indicate a corrupt compressed file, 3 for an internal
1273- consistency error (eg, bug) which caused _\bb_\bz_\bi_\bp_\b2 to panic.
1274-
1275-
1276-O\bOP\bPT\bTI\bIO\bON\bNS\bS
1277- -\b-c\bc -\b--\b-s\bst\btd\bdo\bou\but\bt
1278- Compress or decompress to standard output.
1279-
1280- -\b-d\bd -\b--\b-d\bde\bec\bco\bom\bmp\bpr\bre\bes\bss\bs
1281- Force decompression. _\bb_\bz_\bi_\bp_\b2_\b, _\bb_\bu_\bn_\bz_\bi_\bp_\b2 and _\bb_\bz_\bc_\ba_\bt are
1282- really the same program, and the decision about
1283- what actions to take is done on the basis of which
1284- name is used. This flag overrides that mechanism,
1285- and forces _\bb_\bz_\bi_\bp_\b2 to decompress.
1286-
1287- -\b-z\bz -\b--\b-c\bco\bom\bmp\bpr\bre\bes\bss\bs
1288- The complement to -d: forces compression, regard-
1289- less of the invokation name.
1290-
1291- -\b-t\bt -\b--\b-t\bte\bes\bst\bt
1292- Check integrity of the specified file(s), but don't
1293- decompress them. This really performs a trial
1294- decompression and throws away the result.
1295-
1296- -\b-f\bf -\b--\b-f\bfo\bor\brc\bce\be
1297- Force overwrite of output files. Normally, _\bb_\bz_\bi_\bp_\b2
1298- will not overwrite existing output files. Also
1299- forces _\bb_\bz_\bi_\bp_\b2 to break hard links to files, which it
1300- otherwise wouldn't do.
1301-
1302- -\b-k\bk -\b--\b-k\bke\bee\bep\bp
1303- Keep (don't delete) input files during compression
1304- or decompression.
1305-
1306- -\b-s\bs -\b--\b-s\bsm\bma\bal\bll\bl
1307- Reduce memory usage, for compression, decompression
1308- and testing. Files are decompressed and tested
1309- using a modified algorithm which only requires 2.5
1310- bytes per block byte. This means any file can be
1311- decompressed in 2300k of memory, albeit at about
1312- half the normal speed.
1313-
1314- During compression, -s selects a block size of
1315- 200k, which limits memory use to around the same
1316- figure, at the expense of your compression ratio.
1317- In short, if your machine is low on memory (8
1318- megabytes or less), use -s for everything. See
1319- MEMORY MANAGEMENT below.
1320-
1321- -\b-q\bq -\b--\b-q\bqu\bui\bie\bet\bt
1322- Suppress non-essential warning messages. Messages
1323- pertaining to I/O errors and other critical events
1324-
1325-
1326-
1327- 3
1328-
1329-
1330-
1331-
1332-
1333-bzip2(1) bzip2(1)
1334-
1335-
1336- will not be suppressed.
1337-
1338- -\b-v\bv -\b--\b-v\bve\ber\brb\bbo\bos\bse\be
1339- Verbose mode -- show the compression ratio for each
1340- file processed. Further -v's increase the ver-
1341- bosity level, spewing out lots of information which
1342- is primarily of interest for diagnostic purposes.
1343-
1344- -\b-L\bL -\b--\b-l\bli\bic\bce\ben\bns\bse\be -\b-V\bV -\b--\b-v\bve\ber\brs\bsi\bio\bon\bn
1345- Display the software version, license terms and
1346- conditions.
1347-
1348- -\b-1\b1 t\bto\bo -\b-9\b9
1349- Set the block size to 100 k, 200 k .. 900 k when
1350- compressing. Has no effect when decompressing.
1351- See MEMORY MANAGEMENT below.
1352-
1353- -\b--\b- Treats all subsequent arguments as file names, even
1354- if they start with a dash. This is so you can han-
1355- dle files with names beginning with a dash, for
1356- example: bzip2 -- -myfilename.
1357-
1358- -\b--\b-r\bre\bep\bpe\bet\bti\bit\bti\biv\bve\be-\b-f\bfa\bas\bst\bt -\b--\b-r\bre\bep\bpe\bet\bti\bit\bti\biv\bve\be-\b-b\bbe\bes\bst\bt
1359- These flags are redundant in versions 0.9.5 and
1360- above. They provided some coarse control over the
1361- behaviour of the sorting algorithm in earlier ver-
1362- sions, which was sometimes useful. 0.9.5 and above
1363- have an improved algorithm which renders these
1364- flags irrelevant.
1365-
1366-
1367-M\bME\bEM\bMO\bOR\bRY\bY M\bMA\bAN\bNA\bAG\bGE\bEM\bME\bEN\bNT\bT
1368- _\bb_\bz_\bi_\bp_\b2 compresses large files in blocks. The block size
1369- affects both the compression ratio achieved, and the
1370- amount of memory needed for compression and decompression.
1371- The flags -1 through -9 specify the block size to be
1372- 100,000 bytes through 900,000 bytes (the default) respec-
1373- tively. At decompression time, the block size used for
1374- compression is read from the header of the compressed
1375- file, and _\bb_\bu_\bn_\bz_\bi_\bp_\b2 then allocates itself just enough memory
1376- to decompress the file. Since block sizes are stored in
1377- compressed files, it follows that the flags -1 to -9 are
1378- irrelevant to and so ignored during decompression.
1379-
1380- Compression and decompression requirements, in bytes, can
1381- be estimated as:
1382-
1383- Compression: 400k + ( 8 x block size )
1384-
1385- Decompression: 100k + ( 4 x block size ), or
1386- 100k + ( 2.5 x block size )
1387-
1388- Larger block sizes give rapidly diminishing marginal
1389- returns. Most of the compression comes from the first two
1390-
1391-
1392-
1393- 4
1394-
1395-
1396-
1397-
1398-
1399-bzip2(1) bzip2(1)
1400-
1401-
1402- or three hundred k of block size, a fact worth bearing in
1403- mind when using _\bb_\bz_\bi_\bp_\b2 on small machines. It is also
1404- important to appreciate that the decompression memory
1405- requirement is set at compression time by the choice of
1406- block size.
1407-
1408- For files compressed with the default 900k block size,
1409- _\bb_\bu_\bn_\bz_\bi_\bp_\b2 will require about 3700 kbytes to decompress. To
1410- support decompression of any file on a 4 megabyte machine,
1411- _\bb_\bu_\bn_\bz_\bi_\bp_\b2 has an option to decompress using approximately
1412- half this amount of memory, about 2300 kbytes. Decompres-
1413- sion speed is also halved, so you should use this option
1414- only where necessary. The relevant flag is -s.
1415-
1416- In general, try and use the largest block size memory con-
1417- straints allow, since that maximises the compression
1418- achieved. Compression and decompression speed are virtu-
1419- ally unaffected by block size.
1420-
1421- Another significant point applies to files which fit in a
1422- single block -- that means most files you'd encounter
1423- using a large block size. The amount of real memory
1424- touched is proportional to the size of the file, since the
1425- file is smaller than a block. For example, compressing a
1426- file 20,000 bytes long with the flag -9 will cause the
1427- compressor to allocate around 7600k of memory, but only
1428- touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the
1429- decompressor will allocate 3700k but only touch 100k +
1430- 20000 * 4 = 180 kbytes.
1431-
1432- Here is a table which summarises the maximum memory usage
1433- for different block sizes. Also recorded is the total
1434- compressed size for 14 files of the Calgary Text Compres-
1435- sion Corpus totalling 3,141,622 bytes. This column gives
1436- some feel for how compression varies with block size.
1437- These figures tend to understate the advantage of larger
1438- block sizes for larger files, since the Corpus is domi-
1439- nated by smaller files.
1440-
1441- Compress Decompress Decompress Corpus
1442- Flag usage usage -s usage Size
1443-
1444- -1 1200k 500k 350k 914704
1445- -2 2000k 900k 600k 877703
1446- -3 2800k 1300k 850k 860338
1447- -4 3600k 1700k 1100k 846899
1448- -5 4400k 2100k 1350k 845160
1449- -6 5200k 2500k 1600k 838626
1450- -7 6100k 2900k 1850k 834096
1451- -8 6800k 3300k 2100k 828642
1452- -9 7600k 3700k 2350k 828642
1453-
1454-
1455-
1456-
1457-
1458-
1459- 5
1460-
1461-
1462-
1463-
1464-
1465-bzip2(1) bzip2(1)
1466-
1467-
1468-R\bRE\bEC\bCO\bOV\bVE\bER\bRI\bIN\bNG\bG D\bDA\bAT\bTA\bA F\bFR\bRO\bOM\bM D\bDA\bAM\bMA\bAG\bGE\bED\bD F\bFI\bIL\bLE\bES\bS
1469- _\bb_\bz_\bi_\bp_\b2 compresses files in blocks, usually 900kbytes long.
1470- Each block is handled independently. If a media or trans-
1471- mission error causes a multi-block .bz2 file to become
1472- damaged, it may be possible to recover data from the
1473- undamaged blocks in the file.
1474-
1475- The compressed representation of each block is delimited
1476- by a 48-bit pattern, which makes it possible to find the
1477- block boundaries with reasonable certainty. Each block
1478- also carries its own 32-bit CRC, so damaged blocks can be
1479- distinguished from undamaged ones.
1480-
1481- _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br is a simple program whose purpose is to
1482- search for blocks in .bz2 files, and write each block out
1483- into its own .bz2 file. You can then use _\bb_\bz_\bi_\bp_\b2 -t to test
1484- the integrity of the resulting files, and decompress those
1485- which are undamaged.
1486-
1487- _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br takes a single argument, the name of the dam-
1488- aged file, and writes a number of files "rec0001file.bz2",
1489- "rec0002file.bz2", etc, containing the extracted blocks.
1490- The output filenames are designed so that the use of
1491- wildcards in subsequent processing -- for example, "bzip2
1492- -dc rec*file.bz2 > recovered_data" -- lists the files in
1493- the correct order.
1494-
1495- _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br should be of most use dealing with large .bz2
1496- files, as these will contain many blocks. It is clearly
1497- futile to use it on damaged single-block files, since a
1498- damaged block cannot be recovered. If you wish to min-
1499- imise any potential data loss through media or transmis-
1500- sion errors, you might consider compressing with a smaller
1501- block size.
1502-
1503-
1504-P\bPE\bER\bRF\bFO\bOR\bRM\bMA\bAN\bNC\bCE\bE N\bNO\bOT\bTE\bES\bS
1505- The sorting phase of compression gathers together similar
1506- strings in the file. Because of this, files containing
1507- very long runs of repeated symbols, like "aabaabaabaab
1508- ..." (repeated several hundred times) may compress more
1509- slowly than normal. Versions 0.9.5 and above fare much
1510- better than previous versions in this respect. The ratio
1511- between worst-case and average-case compression time is in
1512- the region of 10:1. For previous versions, this figure
1513- was more like 100:1. You can use the -vvvv option to mon-
1514- itor progress in great detail, if you want.
1515-
1516- Decompression speed is unaffected by these phenomena.
1517-
1518- _\bb_\bz_\bi_\bp_\b2 usually allocates several megabytes of memory to
1519- operate in, and then charges all over it in a fairly ran-
1520- dom fashion. This means that performance, both for com-
1521- pressing and decompressing, is largely determined by the
1522-
1523-
1524-
1525- 6
1526-
1527-
1528-
1529-
1530-
1531-bzip2(1) bzip2(1)
1532-
1533-
1534- speed at which your machine can service cache misses.
1535- Because of this, small changes to the code to reduce the
1536- miss rate have been observed to give disproportionately
1537- large performance improvements. I imagine _\bb_\bz_\bi_\bp_\b2 will per-
1538- form best on machines with very large caches.
1539-
1540-
1541-C\bCA\bAV\bVE\bEA\bAT\bTS\bS
1542- I/O error messages are not as helpful as they could be.
1543- _\bb_\bz_\bi_\bp_\b2 tries hard to detect I/O errors and exit cleanly,
1544- but the details of what the problem is sometimes seem
1545- rather misleading.
1546-
1547- This manual page pertains to version 1.0 of _\bb_\bz_\bi_\bp_\b2_\b. Com-
1548- pressed data created by this version is entirely forwards
1549- and backwards compatible with the previous public
1550- releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the
1551- following exception: 0.9.0 and above can correctly decom-
1552- press multiple concatenated compressed files. 0.1pl2 can-
1553- not do this; it will stop after decompressing just the
1554- first file in the stream.
1555-
1556- _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br uses 32-bit integers to represent bit posi-
1557- tions in compressed files, so it cannot handle compressed
1558- files more than 512 megabytes long. This could easily be
1559- fixed.
1560-
1561-
1562-A\bAU\bUT\bTH\bHO\bOR\bR
1563- Julian Seward, jseward@acm.org.
1564-
1565- http://sourceware.cygnus.com/bzip2
1566- http://www.muraroa.demon.co.uk
1567-
1568- The ideas embodied in _\bb_\bz_\bi_\bp_\b2 are due to (at least) the fol-
1569- lowing people: Michael Burrows and David Wheeler (for the
1570- block sorting transformation), David Wheeler (again, for
1571- the Huffman coder), Peter Fenwick (for the structured cod-
1572- ing model in the original _\bb_\bz_\bi_\bp_\b, and many refinements), and
1573- Alistair Moffat, Radford Neal and Ian Witten (for the
1574- arithmetic coder in the original _\bb_\bz_\bi_\bp_\b)_\b. I am much
1575- indebted for their help, support and advice. See the man-
1576- ual in the source distribution for pointers to sources of
1577- documentation. Christian von Roques encouraged me to look
1578- for faster sorting algorithms, so as to speed up compres-
1579- sion. Bela Lubkin encouraged me to improve the worst-case
1580- compression performance. Many people sent patches, helped
1581- with portability problems, lent machines, gave advice and
1582- were generally helpful.
1583-
1584-
1585-
1586-
1587-
1588-
1589-
1590-
1591- 7
1592-
1593-
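The CAVEATS text in the removed preformatted man page above states that bzip2recover stores bit positions in 32-bit integers and therefore cannot handle compressed files longer than 512 megabytes. The arithmetic behind that figure can be sketched as follows. This assumes an unsigned 32-bit counter (the man page does not say whether the integers are signed), and the function name is purely illustrative.

```python
def max_addressable_bytes(counter_bits: int = 32) -> int:
    """Largest file size, in bytes, whose every bit position fits in an unsigned counter."""
    distinct_positions = 2 ** counter_bits  # how many bit positions the counter can name
    return distinct_positions // 8          # eight bits per byte


limit = max_addressable_bytes()
print(limit, "bytes =", limit // 2**20, "megabytes")
```

Under that assumption the limit is 2^32 bits, that is 2^29 bytes, which is where the 512 megabyte figure comes from.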
1594diff -Nru bzip2-1.0.1/bzless bzip2-1.0.1.new/bzless
1595--- bzip2-1.0.1/bzless Thu Jan 1 01:00:00 1970
1596+++ bzip2-1.0.1.new/bzless Sat Jun 24 20:16:09 2000
1597@@ -0,0 +1,2 @@
1598+#!/bin/sh
1599+%{_bindir}/bunzip2 -c "\$@" | /usr/bin/less
1600diff -Nru bzip2-1.0.1/config.h.in bzip2-1.0.1.new/config.h.in
1601--- bzip2-1.0.1/config.h.in Thu Jan 1 01:00:00 1970
1602+++ bzip2-1.0.1.new/config.h.in Sat Jun 24 20:13:06 2000
1603@@ -0,0 +1,17 @@
1604+/* config.h.in. Generated automatically from configure.in by autoheader. */
1605+
1606+/* Name of package */
1607+#undef PACKAGE
1608+
1609+/* Version number of package */
1610+#undef VERSION
1611+
1612+/* Number of bits in a file offset, on hosts where this is settable. */
1613+#undef _FILE_OFFSET_BITS
1614+
1615+/* Define to make fseeko etc. visible, on some hosts. */
1616+#undef _LARGEFILE_SOURCE
1617+
1618+/* Define for large files, on AIX-style hosts. */
1619+#undef _LARGE_FILES
1620+
1621diff -Nru bzip2-1.0.1/configure.in bzip2-1.0.1.new/configure.in
1622--- bzip2-1.0.1/configure.in Thu Jan 1 01:00:00 1970
1623+++ bzip2-1.0.1.new/configure.in Sat Jun 24 20:13:06 2000
1624@@ -0,0 +1,10 @@
1625+AC_INIT(bzip2.c)
1626+AM_INIT_AUTOMAKE(bzip2,1.0.1)
1627+AM_CONFIG_HEADER(config.h)
1628+AC_PROG_CC
1629+AM_PROG_LIBTOOL
1630+AC_PROG_LN_S
1631+AC_SYS_LARGEFILE
1632+AC_OUTPUT(Makefile
1633+ doc/Makefile
1634+ doc/pl/Makefile)
1635diff -Nru bzip2-1.0.1/crctable.c bzip2-1.0.1.new/crctable.c
1636--- bzip2-1.0.1/crctable.c Sat Jun 24 20:13:27 2000
1637+++ bzip2-1.0.1.new/crctable.c Sat Jun 24 20:13:06 2000
1638@@ -58,6 +58,10 @@
1639 For more information on these sources, see the manual.
1640 --*/
1641
1642+#ifdef HAVE_CONFIG_H
1643+#include <config.h>
1644+#endif
1645+
1646
1647 #include "bzlib_private.h"
1648
1649diff -Nru bzip2-1.0.1/decompress.c bzip2-1.0.1.new/decompress.c
1650--- bzip2-1.0.1/decompress.c Sat Jun 24 20:13:27 2000
1651+++ bzip2-1.0.1.new/decompress.c Sat Jun 24 20:13:06 2000
1652@@ -58,6 +58,10 @@
1653 For more information on these sources, see the manual.
1654 --*/
1655
1656+#ifdef HAVE_CONFIG_H
1657+#include <config.h>
1658+#endif
1659+
1660
1661 #include "bzlib_private.h"
1662
1663diff -Nru bzip2-1.0.1/dlltest.c bzip2-1.0.1.new/dlltest.c
1664--- bzip2-1.0.1/dlltest.c Sat Jun 24 20:13:27 2000
1665+++ bzip2-1.0.1.new/dlltest.c Sat Jun 24 20:13:06 2000
1666@@ -8,6 +8,10 @@
1667 usage: minibz2 [-d] [-{1,2,..9}] [[srcfilename] destfilename]\r
1668 */\r
1669 \r
1670+#ifdef HAVE_CONFIG_H
1671+#include <config.h>
1672+#endif
1673+
1674 #define BZ_IMPORT\r
1675 #include <stdio.h>\r
1676 #include <stdlib.h>\r
1677diff -Nru bzip2-1.0.1/doc/Makefile.am bzip2-1.0.1.new/doc/Makefile.am
1678--- bzip2-1.0.1/doc/Makefile.am Thu Jan 1 01:00:00 1970
1679+++ bzip2-1.0.1.new/doc/Makefile.am Sat Jun 24 20:14:43 2000
1680@@ -0,0 +1,5 @@
1681+
1682+SUBDIRS = pl
1683+
1684+man_MANS = bzip2.1 bunzip2.1 bzcat.1 bzip2recover.1
1685+#info_TEXINFOS = bzip2.texi
1686diff -Nru bzip2-1.0.1/doc/bunzip2.1 bzip2-1.0.1.new/doc/bunzip2.1
1687--- bzip2-1.0.1/doc/bunzip2.1 Thu Jan 1 01:00:00 1970
1688+++ bzip2-1.0.1.new/doc/bunzip2.1 Sat Jun 24 20:13:06 2000
1689@@ -0,0 +1 @@
1690+.so bzip2.1
1691\ No newline at end of file
1692diff -Nru bzip2-1.0.1/doc/bzcat.1 bzip2-1.0.1.new/doc/bzcat.1
1693--- bzip2-1.0.1/doc/bzcat.1 Thu Jan 1 01:00:00 1970
1694+++ bzip2-1.0.1.new/doc/bzcat.1 Sat Jun 24 20:13:06 2000
1695@@ -0,0 +1 @@
1696+.so bzip2.1
1697\ No newline at end of file
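The two stub pages above simply source bzip2.1, which documents that a file made by concatenating two or more compressed files decompresses to the concatenation of the originals (bzip2 0.9.0 and later). A minimal sketch of that property follows. It uses Python's standard bz2 module rather than the bzip2 binary built by this patch, and the sample payloads are made up for illustration.

```python
import bz2

# Two independent, complete .bz2 streams (invented sample data).
first = bz2.compress(b"first file\n")
second = bz2.compress(b"second file\n")

# Byte-wise concatenation, as `cat a.bz2 b.bz2 > both.bz2` would produce.
combined = first + second

# bz2.decompress() accepts multi-stream input (Python 3.3 and later)
# and returns the concatenated plaintext of every stream.
restored = bz2.decompress(combined)
print(restored)
```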
1698diff -Nru bzip2-1.0.1/doc/bzip2.1 bzip2-1.0.1.new/doc/bzip2.1
1699--- bzip2-1.0.1/doc/bzip2.1 Thu Jan 1 01:00:00 1970
1700+++ bzip2-1.0.1.new/doc/bzip2.1 Sat Jun 24 20:13:06 2000
1701@@ -0,0 +1,439 @@
1702+.PU
1703+.TH bzip2 1
1704+.SH NAME
1705+bzip2, bunzip2 \- a block-sorting file compressor, v1.0
1706+.br
1707+bzcat \- decompresses files to stdout
1708+.br
1709+bzip2recover \- recovers data from damaged bzip2 files
1710+
1711+.SH SYNOPSIS
1712+.ll +8
1713+.B bzip2
1714+.RB [ " \-cdfkqstvzVL123456789 " ]
1715+[
1716+.I "filenames \&..."
1717+]
1718+.ll -8
1719+.br
1720+.B bunzip2
1721+.RB [ " \-fkvsVL " ]
1722+[
1723+.I "filenames \&..."
1724+]
1725+.br
1726+.B bzcat
1727+.RB [ " \-s " ]
1728+[
1729+.I "filenames \&..."
1730+]
1731+.br
1732+.B bzip2recover
1733+.I "filename"
1734+
1735+.SH DESCRIPTION
1736+.I bzip2
1737+compresses files using the Burrows-Wheeler block sorting
1738+text compression algorithm, and Huffman coding. Compression is
1739+generally considerably better than that achieved by more conventional
1740+LZ77/LZ78-based compressors, and approaches the performance of the PPM
1741+family of statistical compressors.
1742+
1743+The command-line options are deliberately very similar to
1744+those of
1745+.I GNU gzip,
1746+but they are not identical.
1747+
1748+.I bzip2
1749+expects a list of file names to accompany the
1750+command-line flags. Each file is replaced by a compressed version of
1751+itself, with the name "original_name.bz2".
1752+Each compressed file
1753+has the same modification date, permissions, and, when possible,
1754+ownership as the corresponding original, so that these properties can
1755+be correctly restored at decompression time. File name handling is
1756+naive in the sense that there is no mechanism for preserving original
1757+file names, permissions, ownerships or dates in filesystems which lack
1758+these concepts, or have serious file name length restrictions, such as
1759+MS-DOS.
1760+
1761+.I bzip2
1762+and
1763+.I bunzip2
1764+will by default not overwrite existing
1765+files. If you want this to happen, specify the \-f flag.
1766+
1767+If no file names are specified,
1768+.I bzip2
1769+compresses from standard
1770+input to standard output. In this case,
1771+.I bzip2
1772+will decline to
1773+write compressed output to a terminal, as this would be entirely
1774+incomprehensible and therefore pointless.
1775+
1776+.I bunzip2
1777+(or
1778+.I bzip2 \-d)
1779+decompresses all
1780+specified files. Files which were not created by
1781+.I bzip2
1782+will be detected and ignored, and a warning issued.
1783+.I bzip2
1784+attempts to guess the filename for the decompressed file
1785+from that of the compressed file as follows:
1786+
1787+ filename.bz2 becomes filename
1788+ filename.bz becomes filename
1789+ filename.tbz2 becomes filename.tar
1790+ filename.tbz becomes filename.tar
1791+ anyothername becomes anyothername.out
1792+
1793+If the file does not end in one of the recognised endings,
1794+.I .bz2,
1795+.I .bz,
1796+.I .tbz2
1797+or
1798+.I .tbz,
1799+.I bzip2
1800+complains that it cannot
1801+guess the name of the original file, and uses the original name
1802+with
1803+.I .out
1804+appended.
1805+
1806+As with compression, supplying no
1807+filenames causes decompression from
1808+standard input to standard output.
1809+
1810+.I bunzip2
1811+will correctly decompress a file which is the
1812+concatenation of two or more compressed files. The result is the
1813+concatenation of the corresponding uncompressed files. Integrity
1814+testing (\-t)
1815+of concatenated
1816+compressed files is also supported.
1817+
1818+You can also compress or decompress files to the standard output by
1819+giving the \-c flag. Multiple files may be compressed and
1820+decompressed like this. The resulting outputs are fed sequentially to
1821+stdout. Compression of multiple files
1822+in this manner generates a stream
1823+containing multiple compressed file representations. Such a stream
1824+can be decompressed correctly only by
1825+.I bzip2
1826+version 0.9.0 or
1827+later. Earlier versions of
1828+.I bzip2
1829+will stop after decompressing
1830+the first file in the stream.
1831+
1832+.I bzcat
1833+(or
1834+.I bzip2 -dc)
1835+decompresses all specified files to
1836+the standard output.
1837+
1838+.I bzip2
1839+will read arguments from the environment variables
1840+.I BZIP2
1841+and
1842+.I BZIP,
1843+in that order, and will process them
1844+before any arguments read from the command line. This gives a
1845+convenient way to supply default arguments.
1846+
1847+Compression is always performed, even if the compressed
1848+file is slightly
1849+larger than the original. Files of less than about one hundred bytes
1850+tend to get larger, since the compression mechanism has a constant
1851+overhead in the region of 50 bytes. Random data (including the output
1852+of most file compressors) is coded at about 8.05 bits per byte, giving
1853+an expansion of around 0.5%.
1854+
1855+As a self-check for your protection,
1856+.I
1857+bzip2
1858+uses 32-bit CRCs to
1859+make sure that the decompressed version of a file is identical to the
1860+original. This guards against corruption of the compressed data, and
1861+against undetected bugs in
1862+.I bzip2
1863+(hopefully very unlikely). The
1864+chance of data corruption going undetected is microscopic, about one
1865+chance in four billion for each file processed. Be aware, though, that
1866+the check occurs upon decompression, so it can only tell you that
1867+something is wrong. It can't help you
1868+recover the original uncompressed
1869+data. You can use
1870+.I bzip2recover
1871+to try to recover data from
1872+damaged files.
1873+
1874+Return values: 0 for a normal exit, 1 for environmental problems (file
1875+not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt
1876+compressed file, 3 for an internal consistency error (eg, bug) which
1877+caused
1878+.I bzip2
1879+to panic.
1880+
1881+.SH OPTIONS
1882+.TP
1883+.B \-c --stdout
1884+Compress or decompress to standard output.
1885+.TP
1886+.B \-d --decompress
1887+Force decompression.
1888+.I bzip2,
1889+.I bunzip2
1890+and
1891+.I bzcat
1892+are
1893+really the same program, and the decision about what actions to take is
1894+done on the basis of which name is used. This flag overrides that
1895+mechanism, and forces
1896+.I bzip2
1897+to decompress.
1898+.TP
1899+.B \-z --compress
1900+The complement to \-d: forces compression, regardless of the
1901+invocation name.
1902+.TP
1903+.B \-t --test
1904+Check integrity of the specified file(s), but don't decompress them.
1905+This really performs a trial decompression and throws away the result.
1906+.TP
1907+.B \-f --force
1908+Force overwrite of output files. Normally,
1909+.I bzip2
1910+will not overwrite
1911+existing output files. Also forces
1912+.I bzip2
1913+to break hard links
1914+to files, which it otherwise wouldn't do.
1915+.TP
1916+.B \-k --keep
1917+Keep (don't delete) input files during compression
1918+or decompression.
1919+.TP
1920+.B \-s --small
1921+Reduce memory usage, for compression, decompression and testing. Files
1922+are decompressed and tested using a modified algorithm which only
1923+requires 2.5 bytes per block byte. This means any file can be
1924+decompressed in 2300k of memory, albeit at about half the normal speed.
1925+
1926+During compression, \-s selects a block size of 200k, which limits
1927+memory use to around the same figure, at the expense of your compression
1928+ratio. In short, if your machine is low on memory (8 megabytes or
1929+less), use \-s for everything. See MEMORY MANAGEMENT below.
1930+.TP
1931+.B \-q --quiet
1932+Suppress non-essential warning messages. Messages pertaining to
1933+I/O errors and other critical events will not be suppressed.
1934+.TP
1935+.B \-v --verbose
1936+Verbose mode -- show the compression ratio for each file processed.
1937+Further \-v's increase the verbosity level, spewing out lots of
1938+information which is primarily of interest for diagnostic purposes.
1939+.TP
1940+.B \-L --license -V --version
1941+Display the software version, license terms and conditions.
1942+.TP
1943+.B \-1 to \-9
1944+Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
1945+effect when decompressing. See MEMORY MANAGEMENT below.
1946+.TP
1947+.B \--
1948+Treats all subsequent arguments as file names, even if they start
1949+with a dash. This is so you can handle files with names beginning
1950+with a dash, for example: bzip2 \-- \-myfilename.
1951+.TP
1952+.B \--repetitive-fast --repetitive-best
1953+These flags are redundant in versions 0.9.5 and above. They provided
1954+some coarse control over the behaviour of the sorting algorithm in
1955+earlier versions, which was sometimes useful. 0.9.5 and above have an
1956+improved algorithm which renders these flags irrelevant.
1957+
1958+.SH MEMORY MANAGEMENT
1959+.I bzip2
1960+compresses large files in blocks. The block size affects
1961+both the compression ratio achieved, and the amount of memory needed for
1962+compression and decompression. The flags \-1 through \-9
1963+specify the block size to be 100,000 bytes through 900,000 bytes (the
1964+default) respectively. At decompression time, the block size used for
1965+compression is read from the header of the compressed file, and
1966+.I bunzip2
1967+then allocates itself just enough memory to decompress
1968+the file. Since block sizes are stored in compressed files, it follows
1969+that the flags \-1 to \-9 are irrelevant to and so ignored
1970+during decompression.
1971+
1972+Compression and decompression requirements,
1973+in bytes, can be estimated as:
1974+
1975+ Compression: 400k + ( 8 x block size )
1976+
1977+ Decompression: 100k + ( 4 x block size ), or
1978+ 100k + ( 2.5 x block size )
1979+
1980+Larger block sizes give rapidly diminishing marginal returns. Most of
1981+the compression comes from the first two or three hundred k of block
1982+size, a fact worth bearing in mind when using
1983+.I bzip2
1984+on small machines.
1985+It is also important to appreciate that the decompression memory
1986+requirement is set at compression time by the choice of block size.
1987+
1988+For files compressed with the default 900k block size,
1989+.I bunzip2
1990+will require about 3700 kbytes to decompress. To support decompression
1991+of any file on a 4 megabyte machine,
1992+.I bunzip2
1993+has an option to
1994+decompress using approximately half this amount of memory, about 2300
1995+kbytes. Decompression speed is also halved, so you should use this
1996+option only where necessary. The relevant flag is -s.
1997+
1998+In general, try and use the largest block size memory constraints allow,
1999+since that maximises the compression achieved. Compression and
2000+decompression speed are virtually unaffected by block size.
2001+
2002+Another significant point applies to files which fit in a single block
2003+-- that means most files you'd encounter using a large block size. The
2004+amount of real memory touched is proportional to the size of the file,
2005+since the file is smaller than a block. For example, compressing a file
2006+20,000 bytes long with the flag -9 will cause the compressor to
2007+allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
2008+kbytes of it. Similarly, the decompressor will allocate 3700k but only
2009+touch 100k + 20000 * 4 = 180 kbytes.
2010+
2011+Here is a table which summarises the maximum memory usage for different
2012+block sizes. Also recorded is the total compressed size for 14 files of
2013+the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
2014+column gives some feel for how compression varies with block size.
2015+These figures tend to understate the advantage of larger block sizes for
2016+larger files, since the Corpus is dominated by smaller files.
2017+
2018+ Compress Decompress Decompress Corpus
2019+ Flag usage usage -s usage Size
2020+
2021+ -1 1200k 500k 350k 914704
2022+ -2 2000k 900k 600k 877703
2023+ -3 2800k 1300k 850k 860338
2024+ -4 3600k 1700k 1100k 846899
2025+ -5 4400k 2100k 1350k 845160
2026+ -6 5200k 2500k 1600k 838626
2027+ -7 6100k 2900k 1850k 834096
2028+ -8 6800k 3300k 2100k 828642
2029+ -9 7600k 3700k 2350k 828642
2030+
2031+.SH RECOVERING DATA FROM DAMAGED FILES
2032+.I bzip2
2033+compresses files in blocks, usually 900kbytes long. Each
2034+block is handled independently. If a media or transmission error causes
2035+a multi-block .bz2
2036+file to become damaged, it may be possible to
2037+recover data from the undamaged blocks in the file.
2038+
2039+The compressed representation of each block is delimited by a 48-bit
2040+pattern, which makes it possible to find the block boundaries with
2041+reasonable certainty. Each block also carries its own 32-bit CRC, so
2042+damaged blocks can be distinguished from undamaged ones.
2043+
2044+.I bzip2recover
2045+is a simple program whose purpose is to search for
2046+blocks in .bz2 files, and write each block out into its own .bz2
2047+file. You can then use
2048+.I bzip2
2049+\-t
2050+to test the
2051+integrity of the resulting files, and decompress those which are
2052+undamaged.
2053+
2054+.I bzip2recover
2055+takes a single argument, the name of the damaged file,
2056+and writes a number of files "rec0001file.bz2",
2057+"rec0002file.bz2", etc, containing the extracted blocks.
2058+The output filenames are designed so that the use of
2059+wildcards in subsequent processing -- for example,
2060+"bzip2 -dc rec*file.bz2 > recovered_data" -- lists the files in
2061+the correct order.
2062+
2063+.I bzip2recover
2064+should be of most use dealing with large .bz2
2065+files, as these will contain many blocks. It is clearly
2066+futile to use it on damaged single-block files, since a
2067+damaged block cannot be recovered. If you wish to minimise
2068+any potential data loss through media or transmission errors,
2069+you might consider compressing with a smaller
2070+block size.
2071+
2072+.SH PERFORMANCE NOTES
2073+The sorting phase of compression gathers together similar strings in the
2074+file. Because of this, files containing very long runs of repeated
2075+symbols, like "aabaabaabaab ..." (repeated several hundred times) may
2076+compress more slowly than normal. Versions 0.9.5 and above fare much
2077+better than previous versions in this respect. The ratio between
2078+worst-case and average-case compression time is in the region of 10:1.
2079+For previous versions, this figure was more like 100:1. You can use the
2080+\-vvvv option to monitor progress in great detail, if you want.
2081+
2082+Decompression speed is unaffected by these phenomena.
2083+
2084+.I bzip2
2085+usually allocates several megabytes of memory to operate
2086+in, and then charges all over it in a fairly random fashion. This means
2087+that performance, both for compressing and decompressing, is largely
2088+determined by the speed at which your machine can service cache misses.
2089+Because of this, small changes to the code to reduce the miss rate have
2090+been observed to give disproportionately large performance improvements.
2091+I imagine
2092+.I bzip2
2093+will perform best on machines with very large caches.
2094+
2095+.SH CAVEATS
2096+I/O error messages are not as helpful as they could be.
2097+.I bzip2
2098+tries hard to detect I/O errors and exit cleanly, but the details of
2099+what the problem is sometimes seem rather misleading.
2100+
2101+This manual page pertains to version 1.0 of
2102+.I bzip2.
2103+Compressed
2104+data created by this version is entirely forwards and backwards
2105+compatible with the previous public releases, versions 0.1pl2, 0.9.0
2106+and 0.9.5,
2107+but with the following exception: 0.9.0 and above can correctly
2108+decompress multiple concatenated compressed files. 0.1pl2 cannot do
2109+this; it will stop after decompressing just the first file in the
2110+stream.
2111+
2112+.I bzip2recover
2113+uses 32-bit integers to represent bit positions in
2114+compressed files, so it cannot handle compressed files more than 512
2115+megabytes long. This could easily be fixed.
2116+
2117+.SH AUTHOR
2118+Julian Seward, jseward@acm.org.
2119+
2120+http://sourceware.cygnus.com/bzip2
2121+http://www.muraroa.demon.co.uk
2122+
2123+The ideas embodied in
2124+.I bzip2
2125+are due to (at least) the following
2126+people: Michael Burrows and David Wheeler (for the block sorting
2127+transformation), David Wheeler (again, for the Huffman coder), Peter
2128+Fenwick (for the structured coding model in the original
2129+.I bzip,
2130+and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
2131+(for the arithmetic coder in the original
2132+.I bzip).
2133+I am much
2134+indebted for their help, support and advice. See the manual in the
2135+source distribution for pointers to sources of documentation. Christian
2136+von Roques encouraged me to look for faster sorting algorithms, so as to
2137+speed up compression. Bela Lubkin encouraged me to improve the
2138+worst-case compression performance. Many people sent patches, helped
2139+with portability problems, lent machines, gave advice and were generally
2140+helpful.
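The MEMORY MANAGEMENT section of the man page added above gives rough formulas for memory use: 400k + 8 x block size for compression, and 100k + 4 x block size (or 100k + 2.5 x block size with -s) for decompression. The sketch below only evaluates those formulas for each of the flags -1 to -9. The function name and the returned dictionary layout are assumptions made for illustration, not part of bzip2.

```python
def memory_estimate_k(flag: int) -> dict:
    """Estimated memory use, in units of k, for bzip2 flags -1 .. -9 (man page formulas)."""
    if not 1 <= flag <= 9:
        raise ValueError("flag must be between 1 and 9")
    block_k = 100 * flag  # -1 .. -9 select block sizes of 100k .. 900k
    return {
        "compress": 400 + 8 * block_k,
        "decompress": 100 + 4 * block_k,
        "decompress_small": 100 + 2.5 * block_k,  # decompression with -s
    }


for flag in range(1, 10):
    print(f"-{flag}", memory_estimate_k(flag))
```

The results agree with the table in the man page except for the -7 compression entry, where the formula gives 6000k and the table lists 6100k.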
2141diff -Nru bzip2-1.0.1/doc/bzip2.texi bzip2-1.0.1.new/doc/bzip2.texi
2142--- bzip2-1.0.1/doc/bzip2.texi Thu Jan 1 01:00:00 1970
2143+++ bzip2-1.0.1.new/doc/bzip2.texi Sat Jun 24 20:13:06 2000
2144@@ -0,0 +1,2217 @@
2145+\input texinfo @c -*- Texinfo -*-
2146+@setfilename bzip2.info
2147+
2148+@ignore
2149+This file documents bzip2 version 1.0, and associated library
2150+libbzip2, written by Julian Seward (jseward@acm.org).
2151+
2152+Copyright (C) 1996-2000 Julian R Seward
2153+
2154+Permission is granted to make and distribute verbatim copies of
2155+this manual provided the copyright notice and this permission notice
2156+are preserved on all copies.
2157+
2158+Permission is granted to copy and distribute translations of this manual
2159+into another language, under the above conditions for verbatim copies.
2160+@end ignore
2161+
2162+@ifinfo
2163+@format
2164+@dircategory File utilities:
2165+* Bzip2: (bzip2). A program and library for data
2166+ compression
2167+@end direntry
2168+@end format
2169+@end ifinfo
2170+
2171+@iftex
2172+@c @finalout
2173+@settitle bzip2 and libbzip2
2174+@titlepage
2175+@title bzip2 and libbzip2
2176+@subtitle a program and library for data compression
2177+@subtitle copyright (C) 1996-2000 Julian Seward
2178+@subtitle version 1.0 of 21 March 2000
2179+@author Julian Seward
2180+
2181+@end titlepage
2182+
2183+@parindent 0mm
2184+@parskip 2mm
2185+
2186+@end iftex
2187+@node Top, Overview, (dir), (dir)
2188+
2189+@top bzip2
2190+
2191+This program, @code{bzip2},
2192+and associated library @code{libbzip2}, are
2193+Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
2194+
2195+Redistribution and use in source and binary forms, with or without
2196+modification, are permitted provided that the following conditions
2197+are met:
2198+@itemize @bullet
2199+@item
2200+ Redistributions of source code must retain the above copyright
2201+ notice, this list of conditions and the following disclaimer.
2202+@item
2203+ The origin of this software must not be misrepresented; you must
2204+ not claim that you wrote the original software. If you use this
2205+ software in a product, an acknowledgment in the product
2206+ documentation would be appreciated but is not required.
2207+@item
2208+ Altered source versions must be plainly marked as such, and must
2209+ not be misrepresented as being the original software.
2210+@item
2211+ The name of the author may not be used to endorse or promote
2212+ products derived from this software without specific prior written
2213+ permission.
2214+@end itemize
2215+THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
2216+OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
2217+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
2218+ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
2219+DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
2220+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
2221+GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
2222+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
2223+WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
2224+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
2225+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
2226+
2227+Julian Seward, Cambridge, UK.
2228+
2229+@code{jseward@@acm.org}
2230+
2231+@code{http://sourceware.cygnus.com/bzip2}
2232+
2233+@code{http://www.cacheprof.org}
2234+
2235+@code{http://www.muraroa.demon.co.uk}
2236+
2237+@code{bzip2}/@code{libbzip2} version 1.0 of 21 March 2000.
2238+
2239+PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented
2240+algorithms. However, I do not have the resources available to carry out
2241+a full patent search. Therefore I cannot give any guarantee of the
2242+above statement.
2243+
2244+
2245+
2246+
2247+
2248+
2249+
2250+@node Overview, Implementation, Top, Top
2251+@chapter Introduction
2252+
2253+@code{bzip2} compresses files using the Burrows-Wheeler
2254+block-sorting text compression algorithm, and Huffman coding.
2255+Compression is generally considerably better than that
2256+achieved by more conventional LZ77/LZ78-based compressors,
2257+and approaches the performance of the PPM family of statistical compressors.
2258+
2259+@code{bzip2} is built on top of @code{libbzip2}, a flexible library
2260+for handling compressed data in the @code{bzip2} format. This manual
2261+describes both how to use the program and
2262+how to work with the library interface. Most of the
2263+manual is devoted to this library, not the program,
2264+which is good news if your interest is only in the program.
2265+
2266+Chapter 2 describes how to use @code{bzip2}; this is the only part
2267+you need to read if you just want to know how to operate the program.
2268+Chapter 3 describes the programming interfaces in detail, and
2269+Chapter 4 records some miscellaneous notes which I thought
2270+ought to be recorded somewhere.
2271+
2272+
2273+@chapter How to use @code{bzip2}
2274+
2275+This chapter contains a copy of the @code{bzip2} man page,
2276+and nothing else.
2277+
2278+@quotation
2279+
2280+@unnumberedsubsubsec NAME
2281+@itemize
2282+@item @code{bzip2}, @code{bunzip2}
2283+- a block-sorting file compressor, v1.0
2284+@item @code{bzcat}
2285+- decompresses files to stdout
2286+@item @code{bzip2recover}
2287+- recovers data from damaged bzip2 files
2288+@end itemize
2289+
2290+@unnumberedsubsubsec SYNOPSIS
2291+@itemize
2292+@item @code{bzip2} [ -cdfkqstvzVL123456789 ] [ filenames ... ]
2293+@item @code{bunzip2} [ -fkvsVL ] [ filenames ... ]
2294+@item @code{bzcat} [ -s ] [ filenames ... ]
2295+@item @code{bzip2recover} filename
2296+@end itemize
2297+
2298+@unnumberedsubsubsec DESCRIPTION
2299+
2300+@code{bzip2} compresses files using the Burrows-Wheeler block sorting
2301+text compression algorithm, and Huffman coding. Compression is
2302+generally considerably better than that achieved by more conventional
2303+LZ77/LZ78-based compressors, and approaches the performance of the PPM
2304+family of statistical compressors.
2305+
2306+The command-line options are deliberately very similar to those of GNU
2307+@code{gzip}, but they are not identical.
2308+
2309+@code{bzip2} expects a list of file names to accompany the command-line
2310+flags. Each file is replaced by a compressed version of itself, with
2311+the name @code{original_name.bz2}. Each compressed file has the same
2312+modification date, permissions, and, when possible, ownership as the
2313+corresponding original, so that these properties can be correctly
2314+restored at decompression time. File name handling is naive in the
2315+sense that there is no mechanism for preserving original file names,
2316+permissions, ownerships or dates in filesystems which lack these
2317+concepts, or have serious file name length restrictions, such as MS-DOS.
2318+
2319+@code{bzip2} and @code{bunzip2} will by default not overwrite existing
2320+files. If you want this to happen, specify the @code{-f} flag.
2321+
2322+If no file names are specified, @code{bzip2} compresses from standard
2323+input to standard output. In this case, @code{bzip2} will decline to
2324+write compressed output to a terminal, as this would be entirely
2325+incomprehensible and therefore pointless.
2326+
2327+@code{bunzip2} (or @code{bzip2 -d}) decompresses all
2328+specified files. Files which were not created by @code{bzip2}
2329+will be detected and ignored, and a warning issued.
2330+@code{bzip2} attempts to guess the filename for the decompressed file
2331+from that of the compressed file as follows:
2332+@itemize
2333+@item @code{filename.bz2 } becomes @code{filename}
2334+@item @code{filename.bz } becomes @code{filename}
2335+@item @code{filename.tbz2} becomes @code{filename.tar}
2336+@item @code{filename.tbz } becomes @code{filename.tar}
2337+@item @code{anyothername } becomes @code{anyothername.out}
2338+@end itemize
2339+If the file does not end in one of the recognised endings,
2340+@code{.bz2}, @code{.bz},
2341+@code{.tbz2} or @code{.tbz}, @code{bzip2} complains that it cannot
2342+guess the name of the original file, and uses the original name
2343+with @code{.out} appended.
2344+
2345+As with compression, supplying no
2346+filenames causes decompression from standard input to standard output.
2347+
2348+@code{bunzip2} will correctly decompress a file which is the
2349+concatenation of two or more compressed files. The result is the
2350+concatenation of the corresponding uncompressed files. Integrity
2351+testing (@code{-t}) of concatenated compressed files is also supported.
2352+
2353+You can also compress or decompress files to the standard output by
2354+giving the @code{-c} flag. Multiple files may be compressed and
2355+decompressed like this. The resulting outputs are fed sequentially to
2356+stdout. Compression of multiple files in this manner generates a stream
2357+containing multiple compressed file representations. Such a stream
2358+can be decompressed correctly only by @code{bzip2} version 0.9.0 or
2359+later. Earlier versions of @code{bzip2} will stop after decompressing
2360+the first file in the stream.
2361+
2362+@code{bzcat} (or @code{bzip2 -dc}) decompresses all specified files to
2363+the standard output.
2364+
2365+@code{bzip2} will read arguments from the environment variables
2366+@code{BZIP2} and @code{BZIP}, in that order, and will process them
2367+before any arguments read from the command line. This gives a
2368+convenient way to supply default arguments.
2369+
2370+Compression is always performed, even if the compressed file is slightly
2371+larger than the original. Files of less than about one hundred bytes
2372+tend to get larger, since the compression mechanism has a constant
2373+overhead in the region of 50 bytes. Random data (including the output
2374+of most file compressors) is coded at about 8.05 bits per byte, giving
2375+an expansion of around 0.5%.
2376+
2377+As a self-check for your protection, @code{bzip2} uses 32-bit CRCs to
2378+make sure that the decompressed version of a file is identical to the
2379+original. This guards against corruption of the compressed data, and
2380+against undetected bugs in @code{bzip2} (hopefully very unlikely). The
2381+chance of data corruption going undetected is microscopic, about one
2382+chance in four billion for each file processed. Be aware, though, that
2383+the check occurs upon decompression, so it can only tell you that
2384+something is wrong. It can't help you recover the original uncompressed
2385+data. You can use @code{bzip2recover} to try to recover data from
2386+damaged files.
2387+
2388+Return values: 0 for a normal exit, 1 for environmental problems (file
2389+not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt
2390+compressed file, 3 for an internal consistency error (eg, bug) which
2391+caused @code{bzip2} to panic.
2392+
2393+
2394+@unnumberedsubsubsec OPTIONS
2395+@table @code
2396+@item -c --stdout
2397+Compress or decompress to standard output.
2398+@item -d --decompress
2399+Force decompression. @code{bzip2}, @code{bunzip2} and @code{bzcat} are
2400+really the same program, and the decision about what actions to take is
2401+done on the basis of which name is used. This flag overrides that
2402+mechanism, and forces bzip2 to decompress.
2403+@item -z --compress
2404+The complement to @code{-d}: forces compression, regardless of the
2405+invocation name.
2406+@item -t --test
2407+Check integrity of the specified file(s), but don't decompress them.
2408+This really performs a trial decompression and throws away the result.
2409+@item -f --force
2410+Force overwrite of output files. Normally, @code{bzip2} will not overwrite
2411+existing output files. Also forces @code{bzip2} to break hard links
2412+to files, which it otherwise wouldn't do.
2413+@item -k --keep
2414+Keep (don't delete) input files during compression
2415+or decompression.
2416+@item -s --small
2417+Reduce memory usage, for compression, decompression and testing. Files
2418+are decompressed and tested using a modified algorithm which only
2419+requires 2.5 bytes per block byte. This means any file can be
2420+decompressed in 2300k of memory, albeit at about half the normal speed.
2421+
2422+During compression, @code{-s} selects a block size of 200k, which limits
2423+memory use to around the same figure, at the expense of your compression
2424+ratio. In short, if your machine is low on memory (8 megabytes or
2425+less), use @code{-s} for everything. See MEMORY MANAGEMENT below.
2426+@item -q --quiet
2427+Suppress non-essential warning messages. Messages pertaining to
2428+I/O errors and other critical events will not be suppressed.
2429+@item -v --verbose
2430+Verbose mode -- show the compression ratio for each file processed.
2431+Further @code{-v}'s increase the verbosity level, spewing out lots of
2432+information which is primarily of interest for diagnostic purposes.
2433+@item -L --license -V --version
2434+Display the software version, license terms and conditions.
2435+@item -1 to -9
2436+Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
2437+effect when decompressing. See MEMORY MANAGEMENT below.
2438+@item --
2439+Treats all subsequent arguments as file names, even if they start
2440+with a dash. This is so you can handle files with names beginning
2441+with a dash, for example: @code{bzip2 -- -myfilename}.
2442+@item --repetitive-fast
2443+@item --repetitive-best
2444+These flags are redundant in versions 0.9.5 and above. They provided
2445+some coarse control over the behaviour of the sorting algorithm in
2446+earlier versions, which was sometimes useful. 0.9.5 and above have an
2447+improved algorithm which renders these flags irrelevant.
2448+@end table
2449+
2450+
2451+@unnumberedsubsubsec MEMORY MANAGEMENT
2452+
2453+@code{bzip2} compresses large files in blocks. The block size affects
2454+both the compression ratio achieved, and the amount of memory needed for
2455+compression and decompression. The flags @code{-1} through @code{-9}
2456+specify the block size to be 100,000 bytes through 900,000 bytes (the
2457+default) respectively. At decompression time, the block size used for
2458+compression is read from the header of the compressed file, and
2459+@code{bunzip2} then allocates itself just enough memory to decompress
2460+the file. Since block sizes are stored in compressed files, it follows
2461+that the flags @code{-1} to @code{-9} are irrelevant to and so ignored
2462+during decompression.
2463+
2464+Compression and decompression requirements, in bytes, can be estimated
2465+as:
2466+@example
2467+ Compression: 400k + ( 8 x block size )
2468+
2469+ Decompression: 100k + ( 4 x block size ), or
2470+ 100k + ( 2.5 x block size )
2471+@end example
2472+Larger block sizes give rapidly diminishing marginal returns. Most of
2473+the compression comes from the first two or three hundred k of block
2474+size, a fact worth bearing in mind when using @code{bzip2} on small machines.
2475+It is also important to appreciate that the decompression memory
2476+requirement is set at compression time by the choice of block size.
2477+
2478+For files compressed with the default 900k block size, @code{bunzip2}
2479+will require about 3700 kbytes to decompress. To support decompression
2480+of any file on a 4 megabyte machine, @code{bunzip2} has an option to
2481+decompress using approximately half this amount of memory, about 2300
2482+kbytes. Decompression speed is also halved, so you should use this
2483+option only where necessary. The relevant flag is @code{-s}.
2484+
2485+In general, try to use the largest block size memory constraints allow,
2486+since that maximises the compression achieved. Compression and
2487+decompression speed are virtually unaffected by block size.
2488+
2489+Another significant point applies to files which fit in a single block
2490+-- that means most files you'd encounter using a large block size. The
2491+amount of real memory touched is proportional to the size of the file,
2492+since the file is smaller than a block. For example, compressing a file
2493+20,000 bytes long with the flag @code{-9} will cause the compressor to
2494+allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
2495+kbytes of it. Similarly, the decompressor will allocate 3700k but only
2496+touch 100k + 20000 * 4 = 180 kbytes.
2497+
2498+Here is a table which summarises the maximum memory usage for different
2499+block sizes. Also recorded is the total compressed size for 14 files of
2500+the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
2501+column gives some feel for how compression varies with block size.
2502+These figures tend to understate the advantage of larger block sizes for
2503+larger files, since the Corpus is dominated by smaller files.
2504+@example
2505+ Compress Decompress Decompress Corpus
2506+ Flag usage usage -s usage Size
2507+
2508+ -1 1200k 500k 350k 914704
2509+ -2 2000k 900k 600k 877703
2510+ -3 2800k 1300k 850k 860338
2511+ -4 3600k 1700k 1100k 846899
2512+ -5 4400k 2100k 1350k 845160
2513+ -6 5200k 2500k 1600k 838626
2514+ -7 6000k 2900k 1850k 834096
2515+ -8 6800k 3300k 2100k 828642
2516+ -9 7600k 3700k 2350k 828642
2517+@end example
2518+
2519+@unnumberedsubsubsec RECOVERING DATA FROM DAMAGED FILES
2520+
2521+@code{bzip2} compresses files in blocks, usually 900kbytes long. Each
2522+block is handled independently. If a media or transmission error causes
2523+a multi-block @code{.bz2} file to become damaged, it may be possible to
2524+recover data from the undamaged blocks in the file.
2525+
2526+The compressed representation of each block is delimited by a 48-bit
2527+pattern, which makes it possible to find the block boundaries with
2528+reasonable certainty. Each block also carries its own 32-bit CRC, so
2529+damaged blocks can be distinguished from undamaged ones.
2530+
2531+@code{bzip2recover} is a simple program whose purpose is to search for
2532+blocks in @code{.bz2} files, and write each block out into its own
2533+@code{.bz2} file. You can then use @code{bzip2 -t} to test the
2534+integrity of the resulting files, and decompress those which are
2535+undamaged.
2536+
2537+@code{bzip2recover}
2538+takes a single argument, the name of the damaged file,
2539+and writes a number of files @code{rec0001file.bz2},
2540+@code{rec0002file.bz2}, etc, containing the extracted blocks.
2541+The output filenames are designed so that the use of
2542+wildcards in subsequent processing -- for example,
2543+@code{bzip2 -dc rec*file.bz2 > recovered_data} -- lists the files in
2544+the correct order.
2545+
2546+@code{bzip2recover} should be of most use dealing with large @code{.bz2}
2547+files, as these will contain many blocks. It is clearly
2548+futile to use it on damaged single-block files, since a
2549+damaged block cannot be recovered. If you wish to minimise
2550+any potential data loss through media or transmission errors,
2551+you might consider compressing with a smaller
2552+block size.
2553+
2554+
2555+@unnumberedsubsubsec PERFORMANCE NOTES
2556+
2557+The sorting phase of compression gathers together similar strings in the
2558+file. Because of this, files containing very long runs of repeated
2559+symbols, like "aabaabaabaab ..." (repeated several hundred times) may
2560+compress more slowly than normal. Versions 0.9.5 and above fare much
2561+better than previous versions in this respect. The ratio between
2562+worst-case and average-case compression time is in the region of 10:1.
2563+For previous versions, this figure was more like 100:1. You can use the
2564+@code{-vvvv} option to monitor progress in great detail, if you want.
2565+
2566+Decompression speed is unaffected by these phenomena.
2567+
2568+@code{bzip2} usually allocates several megabytes of memory to operate
2569+in, and then charges all over it in a fairly random fashion. This means
2570+that performance, both for compressing and decompressing, is largely
2571+determined by the speed at which your machine can service cache misses.
2572+Because of this, small changes to the code to reduce the miss rate have
2573+been observed to give disproportionately large performance improvements.
2574+I imagine @code{bzip2} will perform best on machines with very large
2575+caches.
2576+
2577+
2578+@unnumberedsubsubsec CAVEATS
2579+
2580+I/O error messages are not as helpful as they could be. @code{bzip2}
2581+tries hard to detect I/O errors and exit cleanly, but the details of
2582+what the problem is sometimes seem rather misleading.
2583+
2584+This manual page pertains to version 1.0 of @code{bzip2}. Compressed
2585+data created by this version is entirely forwards and backwards
2586+compatible with the previous public releases, versions 0.1pl2, 0.9.0 and
2587+0.9.5, but with the following exception: 0.9.0 and above can correctly
2588+decompress multiple concatenated compressed files. 0.1pl2 cannot do
2589+this; it will stop after decompressing just the first file in the
2590+stream.
2591+
2592+@code{bzip2recover} uses 32-bit integers to represent bit positions in
2593+compressed files, so it cannot handle compressed files more than 512
2594+megabytes long. This could easily be fixed.
2595+
2596+
2597+@unnumberedsubsubsec AUTHOR
2598+Julian Seward, @code{jseward@@acm.org}.
2599+
2600+The ideas embodied in @code{bzip2} are due to (at least) the following
2601+people: Michael Burrows and David Wheeler (for the block sorting
2602+transformation), David Wheeler (again, for the Huffman coder), Peter
2603+Fenwick (for the structured coding model in the original @code{bzip},
2604+and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
2605+(for the arithmetic coder in the original @code{bzip}). I am much
2606+indebted for their help, support and advice. See the manual in the
2607+source distribution for pointers to sources of documentation. Christian
2608+von Roques encouraged me to look for faster sorting algorithms, so as to
2609+speed up compression. Bela Lubkin encouraged me to improve the
2610+worst-case compression performance. Many people sent patches, helped
2611+with portability problems, lent machines, gave advice and were generally
2612+helpful.
2613+
2614+@end quotation
2615+
2616+
2617+
2618+
2619+@chapter Programming with @code{libbzip2}
2620+
2621+This chapter describes the programming interface to @code{libbzip2}.
2622+
2623+For general background information, particularly about memory
2624+use and performance aspects, you'd be well advised to read Chapter 2
2625+as well.
2626+
2627+@section Top-level structure
2628+
2629+@code{libbzip2} is a flexible library for compressing and decompressing
2630+data in the @code{bzip2} data format. Although packaged as a single
2631+entity, it helps to regard the library as three separate parts: the
2632+low-level interface, the high-level interface, and some utility
2633+functions.
2634+
2635+The structure of @code{libbzip2}'s interfaces is similar to
2636+that of Jean-loup Gailly's and Mark Adler's excellent @code{zlib}
2637+library.
2638+
2639+All externally visible symbols have names beginning @code{BZ2_}.
2640+This is new in version 1.0. The intention is to minimise pollution
2641+of the namespaces of library clients.
2642+
2643+@subsection Low-level summary
2644+
2645+This interface provides services for compressing and decompressing
2646+data in memory. There's no provision for dealing with files, streams
2647+or any other I/O mechanisms, just straight memory-to-memory work.
2648+In fact, this part of the library can be compiled without inclusion
2649+of @code{stdio.h}, which may be helpful for embedded applications.
2650+
2651+The low-level part of the library has no global variables and
2652+is therefore thread-safe.
2653+
2654+Six routines make up the low-level interface:
2655+@code{BZ2_bzCompressInit}, @code{BZ2_bzCompress}, and @* @code{BZ2_bzCompressEnd}
2656+for compression,
2657+and a corresponding trio @code{BZ2_bzDecompressInit}, @* @code{BZ2_bzDecompress}
2658+and @code{BZ2_bzDecompressEnd} for decompression.
2659+The @code{*Init} functions allocate
2660+memory for compression/decompression and do other
2661+initialisations, whilst the @code{*End} functions close down operations
2662+and release memory.
2663+
2664+The real work is done by @code{BZ2_bzCompress} and @code{BZ2_bzDecompress}.
2665+These compress and decompress data from a user-supplied input buffer
2666+to a user-supplied output buffer. These buffers can be any size;
2667+arbitrary quantities of data are handled by making repeated calls
2668+to these functions. This is a flexible mechanism allowing a
2669+consumer-pull style of activity, or producer-push, or a mixture of
2670+both.
2671+
2672+
2673+
2674+@subsection High-level summary
2675+
2676+This interface provides some handy wrappers around the low-level
2677+interface to facilitate reading and writing @code{bzip2} format
2678+files (@code{.bz2} files). The routines provide hooks to facilitate
2679+reading files in which the @code{bzip2} data stream is embedded
2680+within some larger-scale file structure, or where there are
2681+multiple @code{bzip2} data streams concatenated end-to-end.
2682+
2683+For reading files, @code{BZ2_bzReadOpen}, @code{BZ2_bzRead},
2684+@code{BZ2_bzReadClose} and @* @code{BZ2_bzReadGetUnused} are supplied. For
2685+writing files, @code{BZ2_bzWriteOpen}, @code{BZ2_bzWrite} and
2686+@code{BZ2_bzWriteClose} are available.
2687+
2688+As with the low-level library, no global variables are used
2689+so the library is per se thread-safe. However, if I/O errors
2690+occur whilst reading or writing the underlying compressed files,
2691+you may have to consult @code{errno} to determine the cause of
2692+the error. In that case, you'd need a C library which correctly
2693+supports @code{errno} in a multithreaded environment.
2694+
2695+To make the library a little simpler and more portable,
2696+@code{BZ2_bzReadOpen} and @code{BZ2_bzWriteOpen} require you to pass them file
2697+handles (@code{FILE*}s) which have previously been opened for reading or
2698+writing respectively. That avoids portability problems associated with
2699+file operations and file attributes, whilst not being much of an
2700+imposition on the programmer.
2701+
2702+
2703+
2704+@subsection Utility functions summary
2705+For very simple needs, @code{BZ2_bzBuffToBuffCompress} and
2706+@code{BZ2_bzBuffToBuffDecompress} are provided. These compress
2707+data in memory from one buffer to another buffer in a single
2708+function call. You should assess whether these functions
2709+fulfill your memory-to-memory compression/decompression
2710+requirements before investing effort in understanding the more
2711+general but more complex low-level interface.
2712+
2713+Yoshioka Tsuneo (@code{QWF00133@@niftyserve.or.jp} /
2714+@code{tsuneo-y@@is.aist-nara.ac.jp}) has contributed some functions to
2715+give better @code{zlib} compatibility. These functions are
2716+@code{BZ2_bzopen}, @code{BZ2_bzread}, @code{BZ2_bzwrite}, @code{BZ2_bzflush},
2717+@code{BZ2_bzclose},
2718+@code{BZ2_bzerror} and @code{BZ2_bzlibVersion}. You may find these functions
2719+more convenient for simple file reading and writing, than those in the
2720+high-level interface. These functions are not (yet) officially part of
2721+the library, and are minimally documented here. If they break, you
2722+get to keep all the pieces. I hope to document them properly when time
2723+permits.
2724+
2725+Yoshioka also contributed modifications to allow the library to be
2726+built as a Windows DLL.
2727+
2728+
2729+@section Error handling
2730+
2731+The library is designed to recover cleanly in all situations, including
2732+the worst-case situation of decompressing random data. I'm not
2733+100% sure that it can always do this, so you might want to add
2734+a signal handler to catch segmentation violations during decompression
2735+if you are feeling especially paranoid. I would be interested in
2736+hearing more about the robustness of the library to corrupted
2737+compressed data.
2738+
2739+Version 1.0 is much more robust in this respect than
2740+0.9.0 or 0.9.5. Investigations with Checker (a tool for
2741+detecting problems with memory management, similar to Purify)
2742+indicate that, at least for the few files I tested, all single-bit
2743+errors in the compressed data are caught properly, with no
2744+segmentation faults, no reads of uninitialised data and no
2745+out of range reads or writes. So it's certainly much improved,
2746+although I wouldn't claim it to be totally bombproof.
2747+
2748+The file @code{bzlib.h} contains all definitions needed to use
2749+the library. In particular, you should definitely not include
2750+@code{bzlib_private.h}.
2751+
2752+In @code{bzlib.h}, the various return values are defined. The following
2753+list is not intended as an exhaustive description of the circumstances
2754+in which a given value may be returned -- those descriptions are given
2755+later. Rather, it is intended to convey the rough meaning of each
2756+return value. The first five values are normal and not intended to
2757+denote an error situation.
2758+@table @code
2759+@item BZ_OK
2760+The requested action was completed successfully.
2761+@item BZ_RUN_OK
2762+@itemx BZ_FLUSH_OK
2763+@itemx BZ_FINISH_OK
2764+In @code{BZ2_bzCompress}, the requested flush/finish/nothing-special action
2765+was completed successfully.
2766+@item BZ_STREAM_END
2767+Compression of data was completed, or the logical stream end was
2768+detected during decompression.
2769+@end table
2770+
2771+The following return values indicate an error of some kind.
2772+@table @code
2773+@item BZ_CONFIG_ERROR
2774+Indicates that the library has been improperly compiled on your
2775+platform -- a major configuration error. Specifically, it means
2776+that @code{sizeof(char)}, @code{sizeof(short)} and @code{sizeof(int)}
2777+are not 1, 2 and 4 respectively, as they should be. Note that the
2778+library should still work properly on 64-bit platforms which follow
2779+the LP64 programming model -- that is, where @code{sizeof(long)}
2780+and @code{sizeof(void*)} are 8. Under LP64, @code{sizeof(int)} is
2781+still 4, so @code{libbzip2}, which doesn't use the @code{long} type,
2782+is OK.
2783+@item BZ_SEQUENCE_ERROR
2784+When using the library, it is important to call the functions in the
2785+correct sequence and with data structures (buffers etc) in the correct
2786+states. @code{libbzip2} checks as much as it can to ensure this is
2787+happening, and returns @code{BZ_SEQUENCE_ERROR} if not. Code which
2788+complies precisely with the function semantics, as detailed below,
2789+should never receive this value; such an event denotes buggy code
2790+which you should investigate.
2791+@item BZ_PARAM_ERROR
2792+Returned when a parameter to a function call is out of range
2793+or otherwise manifestly incorrect. As with @code{BZ_SEQUENCE_ERROR},
2794+this denotes a bug in the client code. The distinction between
2795+@code{BZ_PARAM_ERROR} and @code{BZ_SEQUENCE_ERROR} is a bit hazy, but still worth
2796+making.
2797+@item BZ_MEM_ERROR
2798+Returned when a request to allocate memory failed. Note that the
2799+quantity of memory needed to decompress a stream cannot be determined
2800+until the stream's header has been read. So @code{BZ2_bzDecompress} and
2801+@code{BZ2_bzRead} may return @code{BZ_MEM_ERROR} even though some of
2802+the compressed data has been read. The same is not true for
2803+compression; once @code{BZ2_bzCompressInit} or @code{BZ2_bzWriteOpen} has
2804+successfully completed, @code{BZ_MEM_ERROR} cannot occur.
2805+@item BZ_DATA_ERROR
2806+Returned when a data integrity error is detected during decompression.
2807+Most importantly, this means when stored and computed CRCs for the
2808+data do not match. This value is also returned upon detection of any
2809+other anomaly in the compressed data.
2810+@item BZ_DATA_ERROR_MAGIC
2811+As a special case of @code{BZ_DATA_ERROR}, it is sometimes useful to
2812+know when the compressed stream does not start with the correct
2813+magic bytes (@code{'B' 'Z' 'h'}).
2814+@item BZ_IO_ERROR
2815+Returned by @code{BZ2_bzRead} and @code{BZ2_bzWrite} when there is an error
2816+reading or writing in the compressed file, and by @code{BZ2_bzReadOpen}
2817+and @code{BZ2_bzWriteOpen} for attempts to use a file for which the
2818+error indicator (viz, @code{ferror(f)}) is set.
2819+On receipt of @code{BZ_IO_ERROR}, the caller should consult
2820+@code{errno} and/or @code{perror} to acquire operating-system
2821+specific information about the problem.
2822+@item BZ_UNEXPECTED_EOF
2823+Returned by @code{BZ2_bzRead} when the compressed file finishes
2824+before the logical end of stream is detected.
2825+@item BZ_OUTBUFF_FULL
2826+Returned by @code{BZ2_bzBuffToBuffCompress} and
2827+@code{BZ2_bzBuffToBuffDecompress} to indicate that the output data
2828+will not fit into the output buffer provided.
2829+@end table
2830+
2831+
2832+
2833+@section Low-level interface
2834+
2835+@subsection @code{BZ2_bzCompressInit}
2836+@example
2837+typedef
2838+ struct @{
2839+ char *next_in;
2840+ unsigned int avail_in;
2841+ unsigned int total_in_lo32;
2842+ unsigned int total_in_hi32;
2843+
2844+ char *next_out;
2845+ unsigned int avail_out;
2846+ unsigned int total_out_lo32;
2847+ unsigned int total_out_hi32;
2848+
2849+ void *state;
2850+
2851+ void *(*bzalloc)(void *,int,int);
2852+ void (*bzfree)(void *,void *);
2853+ void *opaque;
2854+ @}
2855+ bz_stream;
2856+
2857+int BZ2_bzCompressInit ( bz_stream *strm,
2858+ int blockSize100k,
2859+ int verbosity,
2860+ int workFactor );
2861+
2862+@end example
2863+
2864+Prepares for compression. The @code{bz_stream} structure
2865+holds all data pertaining to the compression activity.
2866+A @code{bz_stream} structure should be allocated and initialised
2867+prior to the call.
2868+The fields of @code{bz_stream}
2869+comprise the entirety of the user-visible data. @code{state}
2870+is a pointer to the private data structures required for compression.
2871+
2872+Custom memory allocators are supported, via fields @code{bzalloc},
2873+@code{bzfree},
2874+and @code{opaque}. The value
2875+@code{opaque} is passed as the first argument to
2876+all calls to @code{bzalloc} and @code{bzfree}, but is
2877+otherwise ignored by the library.
2878+The call @code{bzalloc ( opaque, n, m )} is expected to return a
2879+pointer @code{p} to
2880+@code{n * m} bytes of memory, and @code{bzfree ( opaque, p )}
2881+should free
2882+that memory.
2883+
2884+If you don't want to use a custom memory allocator, set @code{bzalloc},
2885+@code{bzfree} and
2886+@code{opaque} to @code{NULL},
2887+and the library will then use the standard @code{malloc}/@code{free}
2888+routines.
2889+
2890+Before calling @code{BZ2_bzCompressInit}, fields @code{bzalloc},
2891+@code{bzfree} and @code{opaque} should
2892+be filled appropriately, as just described. Upon return, the internal
2893+state will have been allocated and initialised, and @code{total_in_lo32},
2894+@code{total_in_hi32}, @code{total_out_lo32} and
2895+@code{total_out_hi32} will have been set to zero.
2896+These four fields are used by the library
2897+to inform the caller of the total amount of data passed into and out of
2898+the library, respectively. You should not try to change them.
2899+As of version 1.0, 64-bit counts are maintained, even on 32-bit
2900+platforms, using the @code{_hi32} fields to store the upper 32 bits
2901+of the count. So, for example, the total amount of data in
2902+is @code{(total_in_hi32 << 32) + total_in_lo32}.
2903+
2904+Parameter @code{blockSize100k} specifies the block size to be used for
2905+compression. It should be a value between 1 and 9 inclusive, and the
2906+actual block size used is 100000 x this figure. 9 gives the best
2907+compression but takes most memory.
2908+
2909+Parameter @code{verbosity} should be set to a number between 0 and 4
2910+inclusive. 0 is silent, and greater numbers give increasingly verbose
2911+monitoring/debugging output. If the library has been compiled with
2912+@code{-DBZ_NO_STDIO}, no such output will appear for any verbosity
2913+setting.
2914+
2915+Parameter @code{workFactor} controls how the compression phase behaves
2916+when presented with worst case, highly repetitive, input data. If
2917+compression runs into difficulties caused by repetitive data, the
2918+library switches from the standard sorting algorithm to a fallback
2919+algorithm. The fallback is slower than the standard algorithm by
2920+perhaps a factor of three, but always behaves reasonably, no matter how
2921+bad the input.
2922+
2923+Lower values of @code{workFactor} reduce the amount of effort the
2924+standard algorithm will expend before resorting to the fallback. You
2925+should set this parameter carefully; too low, and many inputs will be
2926+handled by the fallback algorithm and so compress rather slowly; too
2927+high, and your average-to-worst case compression times can become very
2928+large. The default value of 30 gives reasonable behaviour over a wide
2929+range of circumstances.
2930+
2931+Allowable values range from 0 to 250 inclusive. 0 is a special case,
2932+equivalent to using the default value of 30.
2933+
2934+Note that the compressed output generated is the same regardless of
2935+whether or not the fallback algorithm is used.
2936+
2937+Be aware also that this parameter may disappear entirely in future
2938+versions of the library. In principle it should be possible to devise a
2939+good way to automatically choose which algorithm to use. Such a
2940+mechanism would render the parameter obsolete.
2941+
2942+Possible return values:
2943+@display
2944+ @code{BZ_CONFIG_ERROR}
2945+ if the library has been mis-compiled
2946+ @code{BZ_PARAM_ERROR}
2947+ if @code{strm} is @code{NULL}
2948+ or @code{blockSize100k} < 1 or @code{blockSize100k} > 9
2949+ or @code{verbosity} < 0 or @code{verbosity} > 4
2950+ or @code{workFactor} < 0 or @code{workFactor} > 250
2951+ @code{BZ_MEM_ERROR}
2952+ if not enough memory is available
2953+ @code{BZ_OK}
2954+ otherwise
2955+@end display
2956+Allowable next actions:
2957+@display
2958+ @code{BZ2_bzCompress}
2959+ if @code{BZ_OK} is returned
2960+ no specific action needed in case of error
2961+@end display
2962+
2963+@subsection @code{BZ2_bzCompress}
2964+@example
2965+ int BZ2_bzCompress ( bz_stream *strm, int action );
2966+@end example
2967+Provides more input and/or output buffer space for the library. The
2968+caller maintains input and output buffers, and calls @code{BZ2_bzCompress} to
2969+transfer data between them.
2970+
2971+Before each call to @code{BZ2_bzCompress}, @code{next_in} should point at
2972+the data to be compressed, and @code{avail_in} should indicate how many
2973+bytes the library may read. @code{BZ2_bzCompress} updates @code{next_in},
2974+@code{avail_in} and @code{total_in_lo32}/@code{total_in_hi32} to reflect the number of bytes it
2975+has read.
2976+
2977+Similarly, @code{next_out} should point to a buffer in which the
2978+compressed data is to be placed, with @code{avail_out} indicating how
2979+much output space is available. @code{BZ2_bzCompress} updates
2980+@code{next_out}, @code{avail_out} and @code{total_out_lo32}/@code{total_out_hi32} to reflect the
2981+number of bytes output.
2982+
2983+You may provide and remove as little or as much data as you like on each
2984+call of @code{BZ2_bzCompress}. In the limit, it is acceptable to supply and
2985+remove data one byte at a time, although this would be terribly
2986+inefficient. You should always ensure that at least one byte of output
2987+space is available at each call.
2988+
2989+A second purpose of @code{BZ2_bzCompress} is to request a change of mode of the
2990+compressed stream.
2991+
2992+Conceptually, a compressed stream can be in one of four states: IDLE,
2993+RUNNING, FLUSHING and FINISHING. Before initialisation
2994+(@code{BZ2_bzCompressInit}) and after termination (@code{BZ2_bzCompressEnd}), a
2995+stream is regarded as IDLE.
2996+
2997+Upon initialisation (@code{BZ2_bzCompressInit}), the stream is placed in the
2998+RUNNING state. Subsequent calls to @code{BZ2_bzCompress} should pass
2999+@code{BZ_RUN} as the requested action; other actions are illegal and
3000+will result in @code{BZ_SEQUENCE_ERROR}.
3001+
3002+At some point, the calling program will have provided all the input data
3003+it wants to. It will then want to finish up -- in effect, asking the
3004+library to process any data it might have buffered internally. In this
3005+state, @code{BZ2_bzCompress} will no longer attempt to read data from
3006+@code{next_in}, but it will want to write data to @code{next_out}.
3007+Because the output buffer supplied by the user can be arbitrarily small,
3008+the finishing-up operation cannot necessarily be done with a single call
3009+of @code{BZ2_bzCompress}.
3010+
3011+Instead, the calling program passes @code{BZ_FINISH} as an action to
3012+@code{BZ2_bzCompress}. This changes the stream's state to FINISHING. Any
3013+remaining input (ie, @code{next_in[0 .. avail_in-1]}) is compressed and
3014+transferred to the output buffer. To do this, @code{BZ2_bzCompress} must be
3015+called repeatedly until all the output has been consumed. At that
3016+point, @code{BZ2_bzCompress} returns @code{BZ_STREAM_END}, and the stream's
3017+state is set back to IDLE. @code{BZ2_bzCompressEnd} should then be
3018+called.
3019+
3020+Just to make sure the calling program does not cheat, the library makes
3021+a note of @code{avail_in} at the time of the first call to
3022+@code{BZ2_bzCompress} which has @code{BZ_FINISH} as an action (ie, at the
3023+time the program has announced its intention to not supply any more
3024+input). By comparing this value with that of @code{avail_in} over
3025+subsequent calls to @code{BZ2_bzCompress}, the library can detect any
3026+attempts to slip in more data to compress. Any calls for which this is
3027+detected will return @code{BZ_SEQUENCE_ERROR}. This indicates a
3028+programming mistake which should be corrected.
3029+
3030+Instead of asking to finish, the calling program may ask
3031+@code{BZ2_bzCompress} to take all the remaining input, compress it and
3032+terminate the current (Burrows-Wheeler) compression block. This could
3033+be useful for error control purposes. The mechanism is analogous to
3034+that for finishing: call @code{BZ2_bzCompress} with an action of
3035+@code{BZ_FLUSH}, remove output data, and persist with the
3036+@code{BZ_FLUSH} action until the value @code{BZ_RUN_OK} is returned. As
3037+with finishing, @code{BZ2_bzCompress} detects any attempt to provide more
3038+input data once the flush has begun.
3039+
3040+Once the flush is complete, the stream returns to the normal RUNNING
3041+state.
3042+
3043+This all sounds pretty complex, but isn't really. Here's a table
3044+which shows which actions are allowable in each state, what action
3045+will be taken, what the next state is, and what the non-error return
3046+values are. Note that you can't explicitly ask what state the
3047+stream is in, but nor do you need to -- it can be inferred from the
3048+values returned by @code{BZ2_bzCompress}.
3049+@display
3050+IDLE/@code{any}
3051+ Illegal. IDLE state only exists after @code{BZ2_bzCompressEnd} or
3052+ before @code{BZ2_bzCompressInit}.
3053+ Return value = @code{BZ_SEQUENCE_ERROR}
3054+
3055+RUNNING/@code{BZ_RUN}
3056+ Compress from @code{next_in} to @code{next_out} as much as possible.
3057+ Next state = RUNNING
3058+ Return value = @code{BZ_RUN_OK}
3059+
3060+RUNNING/@code{BZ_FLUSH}
3061+ Remember current value of @code{avail_in}. Compress from @code{next_in}
3062+ to @code{next_out} as much as possible, but do not accept any more input.
3063+ Next state = FLUSHING
3064+ Return value = @code{BZ_FLUSH_OK}
3065+
3066+RUNNING/@code{BZ_FINISH}
3067+ Remember current value of @code{avail_in}. Compress from @code{next_in}
3068+ to @code{next_out} as much as possible, but do not accept any more input.
3069+ Next state = FINISHING
3070+ Return value = @code{BZ_FINISH_OK}
3071+
3072+FLUSHING/@code{BZ_FLUSH}
3073+ Compress from @code{next_in} to @code{next_out} as much as possible,
3074+ but do not accept any more input.
3075+ If all the existing input has been used up and all compressed
3076+ output has been removed
3077+ Next state = RUNNING; Return value = @code{BZ_RUN_OK}
3078+ else
3079+ Next state = FLUSHING; Return value = @code{BZ_FLUSH_OK}
3080+
3081+FLUSHING/other
3082+ Illegal.
3083+ Return value = @code{BZ_SEQUENCE_ERROR}
3084+
3085+FINISHING/@code{BZ_FINISH}
3086+ Compress from @code{next_in} to @code{next_out} as much as possible,
3087+ but do not accept any more input.
3088+ If all the existing input has been used up and all compressed
3089+ output has been removed
3090+ Next state = IDLE; Return value = @code{BZ_STREAM_END}
3091+ else
3092+ Next state = FINISHING; Return value = @code{BZ_FINISH_OK}
3093+
3094+FINISHING/other
3095+ Illegal.
3096+ Return value = @code{BZ_SEQUENCE_ERROR}
3097+@end display
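The table above can be read as a small state machine. The following is a minimal sketch of it as a toy model, not the library's actual code: every identifier here (@code{ModelState}, @code{ModelAction}, @code{ModelResult}, @code{model_step}) is hypothetical and exists only for illustration, and the @code{drained} flag stands in for "all existing input has been used up and all compressed output has been removed".

```c
/* Toy model of the BZ2_bzCompress state table. All names are hypothetical;
 * only their meaning mirrors the states, actions and return values above. */
typedef enum { ST_IDLE, ST_RUNNING, ST_FLUSHING, ST_FINISHING } ModelState;
typedef enum { ACT_RUN, ACT_FLUSH, ACT_FINISH } ModelAction;
typedef enum {
    RET_RUN_OK, RET_FLUSH_OK, RET_FINISH_OK, RET_STREAM_END, RET_SEQUENCE_ERROR
} ModelResult;

/*
 * Apply one action to the model. `drained` is nonzero when all pending
 * input has been used up and all compressed output has been removed.
 * An illegal state/action pair leaves the state unchanged.
 */
ModelResult model_step(ModelState *state, ModelAction action, int drained)
{
    switch (*state) {
    case ST_RUNNING:
        if (action == ACT_RUN)
            return RET_RUN_OK;
        if (action == ACT_FLUSH) {
            *state = ST_FLUSHING;
            return RET_FLUSH_OK;
        }
        /* ACT_FINISH is the only remaining action. */
        *state = ST_FINISHING;
        return RET_FINISH_OK;
    case ST_FLUSHING:
        if (action != ACT_FLUSH)
            return RET_SEQUENCE_ERROR;
        if (drained) {
            *state = ST_RUNNING;
            return RET_RUN_OK;
        }
        return RET_FLUSH_OK;
    case ST_FINISHING:
        if (action != ACT_FINISH)
            return RET_SEQUENCE_ERROR;
        if (drained) {
            *state = ST_IDLE;
            return RET_STREAM_END;
        }
        return RET_FINISH_OK;
    case ST_IDLE:
    default:
        /* IDLE accepts no action at all. */
        return RET_SEQUENCE_ERROR;
    }
}
```

The model only tracks which calls are legal and what they report; it does no compression and does not model the check on @code{avail_in} described earlier.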
3098+
3099+That still looks complicated? Well, fair enough. The usual sequence
3100+of calls for compressing a load of data is:
3101+@itemize @bullet
3102+@item Get started with @code{BZ2_bzCompressInit}.
3103+@item Shovel data in and shlurp out its compressed form using zero or more
3104+calls of @code{BZ2_bzCompress} with action = @code{BZ_RUN}.
3105+@item Finish up.
3106+Repeatedly call @code{BZ2_bzCompress} with action = @code{BZ_FINISH},
3107+copying out the compressed output, until @code{BZ_STREAM_END} is returned.
3108+@item Close up and go home. Call @code{BZ2_bzCompressEnd}.
3109+@end itemize
3110+If the data you want to compress fits into your input buffer all
3111+at once, you can skip the calls of @code{BZ2_bzCompress ( ..., BZ_RUN )} and
3112+just do the @code{BZ2_bzCompress ( ..., BZ_FINISH )} calls.
3113+
3114+All required memory is allocated by @code{BZ2_bzCompressInit}. The
3115+compression library can accept any data at all (obviously). So you
3116+shouldn't get any error return values from the @code{BZ2_bzCompress} calls.
3117+If you do, they will be @code{BZ_SEQUENCE_ERROR}, and indicate a bug in
3118+your programming.
3119+
3120+Trivial other possible return values:
3121+@display
3122+ @code{BZ_PARAM_ERROR}
3123+ if @code{strm} is @code{NULL}, or @code{strm->s} is @code{NULL}
3124+@end display
3125+
3126+@subsection @code{BZ2_bzCompressEnd}
3127+@example
3128+int BZ2_bzCompressEnd ( bz_stream *strm );
3129+@end example
3130+Releases all memory associated with a compression stream.
3131+
3132+Possible return values:
3133+@display
3134+ @code{BZ_PARAM_ERROR} if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL}
3135+ @code{BZ_OK} otherwise
3136+@end display
3137+
3138+
3139+@subsection @code{BZ2_bzDecompressInit}
3140+@example
3141+int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );
3142+@end example
3143+Prepares for decompression. As with @code{BZ2_bzCompressInit}, a
3144+@code{bz_stream} record should be allocated and initialised before the
3145+call. Fields @code{bzalloc}, @code{bzfree} and @code{opaque} should be
3146+set if a custom memory allocator is required, or made @code{NULL} for
3147+the normal @code{malloc}/@code{free} routines. Upon return, the internal
3148+state will have been initialised, and @code{total_in} and
3149+@code{total_out} will be zero.
3150+
3151+For the meaning of parameter @code{verbosity}, see @code{BZ2_bzCompressInit}.
3152+
3153+If @code{small} is nonzero, the library will use an alternative
3154+decompression algorithm which uses less memory but at the cost of
3155+decompressing more slowly (roughly speaking, half the speed, but the
3156+maximum memory requirement drops to around 2300k). See Chapter 2 for
3157+more information on memory management.
3158+
3159+Note that the amount of memory needed to decompress
3160+a stream cannot be determined until the stream's header has been read,
3161+so even if @code{BZ2_bzDecompressInit} succeeds, a subsequent
3162+@code{BZ2_bzDecompress} could fail with @code{BZ_MEM_ERROR}.
3163+
3164+Possible return values:
3165+@display
3166+ @code{BZ_CONFIG_ERROR}
3167+ if the library has been mis-compiled
3168+ @code{BZ_PARAM_ERROR}
3169+ if @code{(small != 0 && small != 1)}
3170+ or @code{(verbosity < 0 || verbosity > 4)}
3171+ @code{BZ_MEM_ERROR}
3172+ if insufficient memory is available
3173+@end display
3174+
3175+Allowable next actions:
3176+@display
3177+ @code{BZ2_bzDecompress}
3178+ if @code{BZ_OK} was returned
3179+ no specific action required in case of error
3180+@end display
3181+
3182+
3183+
3184+@subsection @code{BZ2_bzDecompress}
3185+@example
3186+int BZ2_bzDecompress ( bz_stream *strm );
3187+@end example
3188+Provides more input and/or output buffer space for the library. The
3189+caller maintains input and output buffers, and uses @code{BZ2_bzDecompress}
3190+to transfer data between them.
3191+
3192+Before each call to @code{BZ2_bzDecompress}, @code{next_in}
3193+should point at the compressed data,
3194+and @code{avail_in} should indicate how many bytes the library
3195+may read. @code{BZ2_bzDecompress} updates @code{next_in}, @code{avail_in}
3196+and @code{total_in}
3197+to reflect the number of bytes it has read.
3198+
3199+Similarly, @code{next_out} should point to a buffer in which the uncompressed
3200+output is to be placed, with @code{avail_out} indicating how much output space
3201+is available. @code{BZ2_bzDecompress} updates @code{next_out},
3202+@code{avail_out} and @code{total_out} to reflect
3203+the number of bytes output.
3204+
3205+You may provide and remove as little or as much data as you like on
3206+each call of @code{BZ2_bzDecompress}.
3207+In the limit, it is acceptable to
3208+supply and remove data one byte at a time, although this would be
3209+terribly inefficient. You should always ensure that at least one
3210+byte of output space is available at each call.
3211+
3212+Use of @code{BZ2_bzDecompress} is simpler than @code{BZ2_bzCompress}.
3213+
3214+You should provide input and remove output as described above, and
3215+repeatedly call @code{BZ2_bzDecompress} until @code{BZ_STREAM_END} is
3216+returned. Appearance of @code{BZ_STREAM_END} denotes that
3217+@code{BZ2_bzDecompress} has detected the logical end of the compressed
3218+stream. @code{BZ2_bzDecompress} will not produce @code{BZ_STREAM_END} until
3219+all output data has been placed into the output buffer, so once
3220+@code{BZ_STREAM_END} appears, you are guaranteed to have available all
3221+the decompressed output, and @code{BZ2_bzDecompressEnd} can safely be
3222+called.
3223+
3224+In case of an error return value, you should call @code{BZ2_bzDecompressEnd}
3225+to clean up and release memory.
3226+
3227+Possible return values:
3228+@display
3229+ @code{BZ_PARAM_ERROR}
3230+ if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL}
3231+ or @code{strm->avail_out < 1}
3232+ @code{BZ_DATA_ERROR}
3233+ if a data integrity error is detected in the compressed stream
3234+ @code{BZ_DATA_ERROR_MAGIC}
3235+ if the compressed stream doesn't begin with the right magic bytes
3236+ @code{BZ_MEM_ERROR}
3237+ if there wasn't enough memory available
3238+ @code{BZ_STREAM_END}
3239+ if the logical end of the data stream was detected and all
3240+ output has been consumed, eg @code{s->avail_out > 0}
3241+ @code{BZ_OK}
3242+ otherwise
3243+@end display
3244+Allowable next actions:
3245+@display
3246+ @code{BZ2_bzDecompress}
3247+ if @code{BZ_OK} was returned
3248+ @code{BZ2_bzDecompressEnd}
3249+ otherwise
3250+@end display
3251+
3252+
3253+@subsection @code{BZ2_bzDecompressEnd}
3254+@example
3255+int BZ2_bzDecompressEnd ( bz_stream *strm );
3256+@end example
3257+Releases all memory associated with a decompression stream.
3258+
3259+Possible return values:
3260+@display
3261+ @code{BZ_PARAM_ERROR}
3262+ if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL}
3263+ @code{BZ_OK}
3264+ otherwise
3265+@end display
3266+
3267+Allowable next actions:
3268+@display
3269+ None.
3270+@end display
3271+
3272+
3273+@section High-level interface
3274+
3275+This interface provides functions for reading and writing
3276+@code{bzip2} format files. First, some general points.
3277+
3278+@itemize @bullet
3279+@item All of the functions take an @code{int*} first argument,
3280+ @code{bzerror}.
3281+ After each call, @code{bzerror} should be consulted first to determine
3282+ the outcome of the call. If @code{bzerror} is @code{BZ_OK},
3283+ the call completed
3284+ successfully, and only then should the return value of the function
3285+ (if any) be consulted. If @code{bzerror} is @code{BZ_IO_ERROR},
3286+ there was an error
3287+ reading/writing the underlying compressed file, and you should
3288+ then consult @code{errno}/@code{perror} to determine the
3289+ cause of the difficulty.
3290+ @code{bzerror} may also be set to various other values; precise details are
3291+ given on a per-function basis below.
3292+@item If @code{bzerror} indicates an error
3293+ (ie, anything except @code{BZ_OK} and @code{BZ_STREAM_END}),
3294+ you should immediately call @code{BZ2_bzReadClose} (or @code{BZ2_bzWriteClose},
3295+ depending on whether you are attempting to read or to write)
3296+ to free up all resources associated
3297+ with the stream. Once an error has been indicated, behaviour of all calls
3298+ except @code{BZ2_bzReadClose} (@code{BZ2_bzWriteClose}) is undefined.
3299+ The implication is that (1) @code{bzerror} should
3300+ be checked after each call, and (2) if @code{bzerror} indicates an error,
3301+ @code{BZ2_bzReadClose} (@code{BZ2_bzWriteClose}) should then be called to clean up.
3302+@item The @code{FILE*} arguments passed to
3303+ @code{BZ2_bzReadOpen}/@code{BZ2_bzWriteOpen}
3304+ should be set to binary mode.
3305+ Most Unix systems will do this by default, but other platforms,
3306+ including Windows and Mac, will not. If you omit this, you may
3307+ encounter problems when moving code to new platforms.
3308+@item Memory allocation requests are handled by
3309+ @code{malloc}/@code{free}.
3310+ At present
3311+ there is no facility for user-defined memory allocators in the file I/O
3312+ functions (could easily be added, though).
3313+@end itemize
3314+
3315+
3316+
3317+@subsection @code{BZ2_bzReadOpen}
3318+@example
3319+ typedef void BZFILE;
3320+
3321+ BZFILE *BZ2_bzReadOpen ( int *bzerror, FILE *f,
3322+ int verbosity, int small,
3323+ void *unused, int nUnused );
3324+@end example
3325+Prepare to read compressed data from file handle @code{f}. @code{f}
3326+should refer to a file which has been opened for reading, and for which
3327+the error indicator (@code{ferror(f)}) is not set. If @code{small} is 1,
3328+the library will try to decompress using less memory, at the expense of
3329+speed.
3330+
3331+For reasons explained below, @code{BZ2_bzRead} will decompress the
3332+@code{nUnused} bytes starting at @code{unused}, before starting to read
3333+from the file @code{f}. At most @code{BZ_MAX_UNUSED} bytes may be
3334+supplied like this. If this facility is not required, you should pass
3335+@code{NULL} and @code{0} for @code{unused} and @code{nUnused}
3336+respectively.
3337+
3338+For the meaning of parameters @code{small} and @code{verbosity},
3339+see @code{BZ2_bzDecompressInit}.
3340+
3341+The amount of memory needed to decompress a file cannot be determined
3342+until the file's header has been read. So it is possible that
3343+@code{BZ2_bzReadOpen} sets @code{bzerror} to @code{BZ_OK} but a subsequent call of
3344+@code{BZ2_bzRead} will set it to @code{BZ_MEM_ERROR}.
3345+
3346+Possible assignments to @code{bzerror}:
3347+@display
3348+ @code{BZ_CONFIG_ERROR}
3349+ if the library has been mis-compiled
3350+ @code{BZ_PARAM_ERROR}
3351+ if @code{f} is @code{NULL}
3352+ or @code{small} is neither @code{0} nor @code{1}
3353+ or @code{(unused == NULL && nUnused != 0)}
3354+ or @code{(unused != NULL && !(0 <= nUnused <= BZ_MAX_UNUSED))}
3355+ @code{BZ_IO_ERROR}
3356+ if @code{ferror(f)} is nonzero
3357+ @code{BZ_MEM_ERROR}
3358+ if insufficient memory is available
3359+ @code{BZ_OK}
3360+ otherwise.
3361+@end display
3362+
3363+Possible return values:
3364+@display
3365+ Pointer to an abstract @code{BZFILE}
3366+ if @code{bzerror} is @code{BZ_OK}
3367+ @code{NULL}
3368+ otherwise
3369+@end display
3370+
3371+Allowable next actions:
3372+@display
3373+ @code{BZ2_bzRead}
3374+ if @code{bzerror} is @code{BZ_OK}
3375+ @code{BZ2_bzReadClose}
3376+ otherwise
3377+@end display
3378+
3379+
3380+@subsection @code{BZ2_bzRead}
3381+@example
3382+ int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );
3383+@end example
3384+Reads up to @code{len} (uncompressed) bytes from the compressed file
3385+@code{b} into
3386+the buffer @code{buf}. If the read was successful,
3387+@code{bzerror} is set to @code{BZ_OK}
3388+and the number of bytes read is returned. If the logical end-of-stream
3389+was detected, @code{bzerror} will be set to @code{BZ_STREAM_END},
3390+and the number
3391+of bytes read is returned. All other @code{bzerror} values denote an error.
3392+
3393+@code{BZ2_bzRead} will supply @code{len} bytes,
3394+unless the logical stream end is detected
3395+or an error occurs. Because of this, it is possible to detect the
3396+stream end by observing when the number of bytes returned is
3397+less than the number
3398+requested. Nevertheless, this is regarded as inadvisable; you should
3399+instead check @code{bzerror} after every call and watch out for
3400+@code{BZ_STREAM_END}.
3401+
3402+Internally, @code{BZ2_bzRead} copies data from the compressed file in chunks
3403+of size @code{BZ_MAX_UNUSED} bytes
3404+before decompressing it. If the file contains more bytes than strictly
3405+needed to reach the logical end-of-stream, @code{BZ2_bzRead} will almost certainly
3406+read some of the trailing data before signalling @code{BZ_STREAM_END}.
3407+To collect the read but unused data once @code{BZ_STREAM_END} has
3408+appeared, call @code{BZ2_bzReadGetUnused} immediately before @code{BZ2_bzReadClose}.
3409+
3410+Possible assignments to @code{bzerror}:
3411+@display
3412+ @code{BZ_PARAM_ERROR}
3413+ if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0}
3414+ @code{BZ_SEQUENCE_ERROR}
3415+ if @code{b} was opened with @code{BZ2_bzWriteOpen}
3416+ @code{BZ_IO_ERROR}
3417+ if there is an error reading from the compressed file
3418+ @code{BZ_UNEXPECTED_EOF}
3419+ if the compressed file ended before the logical end-of-stream was detected
3420+ @code{BZ_DATA_ERROR}
3421+ if a data integrity error was detected in the compressed stream
3422+ @code{BZ_DATA_ERROR_MAGIC}
3423+ if the stream does not begin with the requisite header bytes (ie, is not
3424+ a @code{bzip2} data file). This is really a special case of @code{BZ_DATA_ERROR}.
3425+ @code{BZ_MEM_ERROR}
3426+ if insufficient memory was available
3427+ @code{BZ_STREAM_END}
3428+ if the logical end of stream was detected.
3429+ @code{BZ_OK}
3430+ otherwise.
3431+@end display
3432+
3433+Possible return values:
3434+@display
3435+ number of bytes read
3436+ if @code{bzerror} is @code{BZ_OK} or @code{BZ_STREAM_END}
3437+ undefined
3438+ otherwise
3439+@end display
3440+
3441+Allowable next actions:
3442+@display
3443+ collect data from @code{buf}, then @code{BZ2_bzRead} or @code{BZ2_bzReadClose}
3444+ if @code{bzerror} is @code{BZ_OK}
3445+ collect data from @code{buf}, then @code{BZ2_bzReadClose} or @code{BZ2_bzReadGetUnused}
3446+ if @code{bzerror} is @code{BZ_STREAM_END}
3447+ @code{BZ2_bzReadClose}
3448+ otherwise
3449+@end display
3450+
3451+
3452+
3453+@subsection @code{BZ2_bzReadGetUnused}
3454+@example
3455+ void BZ2_bzReadGetUnused ( int* bzerror, BZFILE *b,
3456+ void** unused, int* nUnused );
3457+@end example
3458+Returns data which was read from the compressed file but was not needed
3459+to get to the logical end-of-stream. @code{*unused} is set to the address
3460+of the data, and @code{*nUnused} to the number of bytes. @code{*nUnused} will
3461+be set to a value between @code{0} and @code{BZ_MAX_UNUSED} inclusive.
3462+
3463+This function may only be called once @code{BZ2_bzRead} has signalled
3464+@code{BZ_STREAM_END} but before @code{BZ2_bzReadClose}.
3465+
3466+Possible assignments to @code{bzerror}:
3467+@display
3468+ @code{BZ_PARAM_ERROR}
3469+ if @code{b} is @code{NULL}
3470+ or @code{unused} is @code{NULL} or @code{nUnused} is @code{NULL}
3471+ @code{BZ_SEQUENCE_ERROR}
3472+ if @code{BZ_STREAM_END} has not been signalled
3473+ or if @code{b} was opened with @code{BZ2_bzWriteOpen}
3474+ @code{BZ_OK}
3475+ otherwise
3476+@end display
3477+
3478+Allowable next actions:
3479+@display
3480+ @code{BZ2_bzReadClose}
3481+@end display
3482+
3483+
3484+@subsection @code{BZ2_bzReadClose}
3485+@example
3486+ void BZ2_bzReadClose ( int *bzerror, BZFILE *b );
3487+@end example
3488+Releases all memory pertaining to the compressed file @code{b}.
3489+@code{BZ2_bzReadClose} does not call @code{fclose} on the underlying file
3490+handle, so you should do that yourself if appropriate.
3491+@code{BZ2_bzReadClose} should be called to clean up after all error
3492+situations.
3493+
3494+Possible assignments to @code{bzerror}:
3495+@display
3496+ @code{BZ_SEQUENCE_ERROR}
3497+ if @code{b} was opened with @code{BZ2_bzWriteOpen}
3498+ @code{BZ_OK}
3499+ otherwise
3500+@end display
3501+
3502+Allowable next actions:
3503+@display
3504+ none
3505+@end display
3506+
3507+
3508+
3509+@subsection @code{BZ2_bzWriteOpen}
3510+@example
3511+ BZFILE *BZ2_bzWriteOpen ( int *bzerror, FILE *f,
3512+ int blockSize100k, int verbosity,
3513+ int workFactor );
3514+@end example
3515+Prepare to write compressed data to file handle @code{f}.
3516+@code{f} should refer to
3517+a file which has been opened for writing, and for which the error
3518+indicator (@code{ferror(f)}) is not set.
3519+
3520+For the meaning of parameters @code{blockSize100k},
3521+@code{verbosity} and @code{workFactor}, see
3522+@* @code{BZ2_bzCompressInit}.
3523+
3524+All required memory is allocated at this stage, so if the call
3525+completes successfully, @code{BZ_MEM_ERROR} cannot be signalled by a
3526+subsequent call to @code{BZ2_bzWrite}.
3527+
3528+Possible assignments to @code{bzerror}:
3529+@display
3530+ @code{BZ_CONFIG_ERROR}
3531+ if the library has been mis-compiled
3532+ @code{BZ_PARAM_ERROR}
3533+ if @code{f} is @code{NULL}
3534+ or @code{blockSize100k < 1} or @code{blockSize100k > 9}
3535+ @code{BZ_IO_ERROR}
3536+ if @code{ferror(f)} is nonzero
3537+ @code{BZ_MEM_ERROR}
3538+ if insufficient memory is available
3539+ @code{BZ_OK}
3540+ otherwise
3541+@end display
3542+
3543+Possible return values:
3544+@display
3545+ Pointer to an abstract @code{BZFILE}
3546+ if @code{bzerror} is @code{BZ_OK}
3547+ @code{NULL}
3548+ otherwise
3549+@end display
3550+
3551+Allowable next actions:
3552+@display
3553+ @code{BZ2_bzWrite}
3554+ if @code{bzerror} is @code{BZ_OK}
3555+ (you could go directly to @code{BZ2_bzWriteClose}, but this would be pretty pointless)
3556+ @code{BZ2_bzWriteClose}
3557+ otherwise
3558+@end display
3559+
3560+
3561+
3562+@subsection @code{BZ2_bzWrite}
3563+@example
3564+ void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );
3565+@end example
3566+Absorbs @code{len} bytes from the buffer @code{buf}, eventually to be
3567+compressed and written to the file.
3568+
3569+Possible assignments to @code{bzerror}:
3570+@display
3571+ @code{BZ_PARAM_ERROR}
3572+ if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0}
3573+ @code{BZ_SEQUENCE_ERROR}
3574+ if @code{b} was opened with @code{BZ2_bzReadOpen}
3575+ @code{BZ_IO_ERROR}
3576+ if there is an error writing the compressed file.
3577+ @code{BZ_OK}
3578+ otherwise
3579+@end display
3580+
3581+
3582+
3583+
3584+@subsection @code{BZ2_bzWriteClose}
3585+@example
3586+ void BZ2_bzWriteClose ( int *bzerror, BZFILE* b,
3587+ int abandon,
3588+ unsigned int* nbytes_in,
3589+ unsigned int* nbytes_out );
3590+
3591+ void BZ2_bzWriteClose64 ( int *bzerror, BZFILE* b,
3592+ int abandon,
3593+ unsigned int* nbytes_in_lo32,
3594+ unsigned int* nbytes_in_hi32,
3595+ unsigned int* nbytes_out_lo32,
3596+ unsigned int* nbytes_out_hi32 );
3597+@end example
3598+
3599+Compresses and flushes to the compressed file all data so far supplied
3600+by @code{BZ2_bzWrite}. The logical end-of-stream markers are also written, so
3601+subsequent calls to @code{BZ2_bzWrite} are illegal. All memory associated
3602+with the compressed file @code{b} is released.
3603+@code{fflush} is called on the
3604+compressed file, but it is not @code{fclose}'d.
3605+
3606+If @code{BZ2_bzWriteClose} is called to clean up after an error, the only
3607+action is to release the memory. The library records the error codes
3608+issued by previous calls, so this situation will be detected
3609+automatically. There is no attempt to complete the compression
3610+operation, nor to @code{fflush} the compressed file. You can force this
3611+behaviour to happen even in the case of no error, by passing a nonzero
3612+value to @code{abandon}.
3613+
3614+If @code{nbytes_in} is non-null, @code{*nbytes_in} will be set to be the
3615+total volume of uncompressed data handled. Similarly, if @code{nbytes_out}
3616+is non-null, @code{*nbytes_out} will be set to the total volume of compressed data written. For
3617+compatibility with older versions of the library, @code{BZ2_bzWriteClose}
3618+only yields the lower 32 bits of these counts. Use
3619+@code{BZ2_bzWriteClose64} if you want the full 64 bit counts. These
3620+two functions are otherwise absolutely identical.
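The four outputs of @code{BZ2_bzWriteClose64} come as low and high 32-bit halves, so a caller who wants a single number has to join them. A minimal sketch of that step follows; the helper name @code{combine_count} is hypothetical and not part of the library, and it assumes the halves were filled in by a successful @code{BZ2_bzWriteClose64} call.

```c
#include <stdint.h>

/*
 * Join the low and high 32-bit halves of a byte count into one 64-bit value.
 * `combine_count` is a hypothetical helper, not a libbzip2 function.
 */
uint64_t combine_count(unsigned int lo32, unsigned int hi32)
{
    /* Mask the low half in case unsigned int is wider than 32 bits. */
    return ((uint64_t)hi32 << 32) | ((uint64_t)lo32 & 0xFFFFFFFFu);
}
```

For example, after the close call the total input size would be @code{combine_count(nbytes_in_lo32, nbytes_in_hi32)}, with both variables being the caller's own.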
3621+
3622+
3623+Possible assignments to @code{bzerror}:
3624+@display
3625+ @code{BZ_SEQUENCE_ERROR}
3626+ if @code{b} was opened with @code{BZ2_bzReadOpen}
3627+ @code{BZ_IO_ERROR}
3628+ if there is an error writing the compressed file
3629+ @code{BZ_OK}
3630+ otherwise
3631+@end display
3632+
3633+@subsection Handling embedded compressed data streams
3634+
3635+The high-level library facilitates use of
3636+@code{bzip2} data streams which form some part of a surrounding, larger
3637+data stream.
3638+@itemize @bullet
3639+@item For writing, the library takes an open file handle, writes
3640+compressed data to it, @code{fflush}es it but does not @code{fclose} it.
3641+The calling application can write its own data before and after the
3642+compressed data stream, using that same file handle.
3643+@item Reading is more complex, and the facilities are not as general
3644+as they could be since generality is hard to reconcile with efficiency.
3645+@code{BZ2_bzRead} reads from the compressed file in blocks of size
3646+@code{BZ_MAX_UNUSED} bytes, and in doing so probably will overshoot
3647+the logical end of compressed stream.
3648+To recover this data once decompression has
3649+ended, call @code{BZ2_bzReadGetUnused} after the last call of @code{BZ2_bzRead}
3650+(the one returning @code{BZ_STREAM_END}) but before calling
3651+@code{BZ2_bzReadClose}.
3652+@end itemize
3653+
3654+This mechanism makes it easy to decompress multiple @code{bzip2}
3655+streams placed end-to-end. At the end of one stream, when @code{BZ2_bzRead}
3656+returns @code{BZ_STREAM_END}, call @code{BZ2_bzReadGetUnused} to collect the
3657+unused data (copy it into your own buffer somewhere).
3658+That data forms the start of the next compressed stream.
3659+To start uncompressing that next stream, call @code{BZ2_bzReadOpen} again,
3660+feeding in the unused data via the @code{unused}/@code{nUnused}
3661+parameters.
3662+Keep doing this until @code{BZ_STREAM_END} return coincides with the
3663+physical end of file (@code{feof(f)}). In this situation
3664+@code{BZ2_bzReadGetUnused}
3665+will of course return no data.
3666+
3667+This should give some feel for how the high-level interface can be used.
3668+If you require extra flexibility, you'll have to bite the bullet and get
3669+to grips with the low-level interface.
3670+
3671+@subsection Standard file-reading/writing code
3672+Here's how you'd write data to a compressed file:
3673+@example
3674+FILE* f;
3675+BZFILE* b;
3676+int nBuf;
3677+char buf[ /* whatever size you like */ ];
3678+int bzerror;
3680+
3681+f = fopen ( "myfile.bz2", "wb" );   /* binary mode, as advised above */
3682+if (!f) @{
3683+   /* handle error */
3684+@}
3685+b = BZ2_bzWriteOpen ( &bzerror, f, 9, 0, 30 );   /* blockSize100k, verbosity, workFactor */
3686+if (bzerror != BZ_OK) @{
3687+   BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
3688+   /* handle error */
3689+@}
3690+
3691+while ( /* condition */ ) @{
3692+   /* get data to write into buf, and set nBuf appropriately */
3693+   BZ2_bzWrite ( &bzerror, b, buf, nBuf );   /* returns void; check bzerror */
3694+   if (bzerror == BZ_IO_ERROR) @{
3695+      BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
3696+      /* handle error */
3697+   @}
3698+@}
3699+
3700+BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
3701+if (bzerror == BZ_IO_ERROR) @{
3702+   /* handle error */
3703+@}
3704+@end example
3705+And to read from a compressed file:
3706+@example
3707+FILE* f;
3708+BZFILE* b;
3709+int nBuf;
3710+char buf[ /* whatever size you like */ ];
3711+int bzerror;
3713+
3714+f = fopen ( "myfile.bz2", "rb" );   /* binary mode, as advised above */
3715+if (!f) @{
3716+   /* handle error */
3717+@}
3718+b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 );
3719+if (bzerror != BZ_OK) @{
3720+   BZ2_bzReadClose ( &bzerror, b );
3721+   /* handle error */
3722+@}
3723+
3724+bzerror = BZ_OK;
3725+while (bzerror == BZ_OK && /* arbitrary other conditions */) @{
3726+   nBuf = BZ2_bzRead ( &bzerror, b, buf, /* size of buf */ );
3727+   if (bzerror == BZ_OK || bzerror == BZ_STREAM_END) @{
3728+      /* do something with buf[0 .. nBuf-1] */
3729+   @}
3730+@}
3731+if (bzerror != BZ_STREAM_END) @{
3732+   BZ2_bzReadClose ( &bzerror, b );
3733+   /* handle error */
3734+@} else @{
3735+   BZ2_bzReadClose ( &bzerror, b );
3736+@}
3737+@end example
3738+
3739+
3740+
3741+@section Utility functions
3742+@subsection @code{BZ2_bzBuffToBuffCompress}
3743+@example
3744+ int BZ2_bzBuffToBuffCompress( char* dest,
3745+ unsigned int* destLen,
3746+ char* source,
3747+ unsigned int sourceLen,
3748+ int blockSize100k,
3749+ int verbosity,
3750+ int workFactor );
3751+@end example
3752+Attempts to compress the data in @code{source[0 .. sourceLen-1]}
3753+into the destination buffer, @code{dest[0 .. *destLen-1]}.
3754+If the destination buffer is big enough, @code{*destLen} is
3755+set to the size of the compressed data, and @code{BZ_OK} is
3756+returned. If the compressed data won't fit, @code{*destLen}
3757+is unchanged, and @code{BZ_OUTBUFF_FULL} is returned.
3758+
3759+Compression in this manner is a one-shot event, done with a single call
3760+to this function. The resulting compressed data is a complete
3761+@code{bzip2} format data stream. There is no mechanism for making
3762+additional calls to provide extra input data. If you want that kind of
3763+mechanism, use the low-level interface.
3764+
3765+For the meaning of parameters @code{blockSize100k}, @code{verbosity}
3766+and @code{workFactor}, @* see @code{BZ2_bzCompressInit}.
3767+
3768+To guarantee that the compressed data will fit in its buffer, allocate
3769+an output buffer of size 1% larger than the uncompressed data, plus
3770+six hundred extra bytes.
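The sizing rule above is simple arithmetic. A minimal sketch follows; the helper name @code{worst_case_dest_len} is hypothetical and not part of the library, and rounding the 1% upwards is an assumption made here so that the bound errs on the large side.

```c
#include <stddef.h>

/*
 * Output buffer size per the rule above: the source length, plus 1%,
 * plus 600 bytes. `worst_case_dest_len` is a hypothetical helper, not a
 * libbzip2 function. Overflow for extremely large inputs is not handled.
 */
size_t worst_case_dest_len(size_t source_len)
{
    /* Round the 1% up so the bound is never under-estimated. */
    size_t one_percent = source_len / 100 + (source_len % 100 != 0);
    return source_len + one_percent + 600;
}
```

Since @code{destLen} is an @code{unsigned int*}, a caller would also need to check that the result fits in an @code{unsigned int} before passing it on.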
3771+
3772+@code{BZ2_bzBuffToBuffCompress} will not write data at or
3773+beyond @code{dest[*destLen]}, even in case of buffer overflow.
3774+
3775+Possible return values:
3776+@display
3777+ @code{BZ_CONFIG_ERROR}
3778+ if the library has been mis-compiled
3779+ @code{BZ_PARAM_ERROR}
3780+ if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL}
3781+ or @code{blockSize100k < 1} or @code{blockSize100k > 9}
3782+ or @code{verbosity < 0} or @code{verbosity > 4}
3783+ or @code{workFactor < 0} or @code{workFactor > 250}
3784+ @code{BZ_MEM_ERROR}
3785+ if insufficient memory is available
3786+ @code{BZ_OUTBUFF_FULL}
3787+ if the size of the compressed data exceeds @code{*destLen}
3788+ @code{BZ_OK}
3789+ otherwise
3790+@end display
3791+
3792+
3793+
3794+@subsection @code{BZ2_bzBuffToBuffDecompress}
3795+@example
3796+ int BZ2_bzBuffToBuffDecompress ( char* dest,
3797+ unsigned int* destLen,
3798+ char* source,
3799+ unsigned int sourceLen,
3800+ int small,
3801+ int verbosity );
3802+@end example
3803+Attempts to decompress the data in @code{source[0 .. sourceLen-1]}
3804+into the destination buffer, @code{dest[0 .. *destLen-1]}.
3805+If the destination buffer is big enough, @code{*destLen} is
3806+set to the size of the uncompressed data, and @code{BZ_OK} is
3807+returned. If the uncompressed data won't fit, @code{*destLen}
3808+is unchanged, and @code{BZ_OUTBUFF_FULL} is returned.
3809+
3810+@code{source} is assumed to hold a complete @code{bzip2} format
3811+data stream. @* @code{BZ2_bzBuffToBuffDecompress} tries to decompress
3812+the entirety of the stream into the output buffer.
3813+
3814+For the meaning of parameters @code{small} and @code{verbosity},
3815+see @code{BZ2_bzDecompressInit}.
3816+
3817+Because the compression ratio of the compressed data cannot be known in
3818+advance, there is no easy way to guarantee that the output buffer will
3819+be big enough. You may of course make arrangements in your code to
3820+record the size of the uncompressed data, but such a mechanism is beyond
3821+the scope of this library.
3822+
3823+@code{BZ2_bzBuffToBuffDecompress} will not write data at or
3824+beyond @code{dest[*destLen]}, even in case of buffer overflow.
3825+
3826+Possible return values:
3827+@display
3828+ @code{BZ_CONFIG_ERROR}
3829+ if the library has been mis-compiled
3830+ @code{BZ_PARAM_ERROR}
3831+ if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL}
3832+ or @code{small != 0 && small != 1}
3833+ or @code{verbosity < 0} or @code{verbosity > 4}
3834+ @code{BZ_MEM_ERROR}
3835+ if insufficient memory is available
3836+ @code{BZ_OUTBUFF_FULL}
3837+ if the size of the uncompressed data exceeds @code{*destLen}
3838+ @code{BZ_DATA_ERROR}
3839+ if a data integrity error was detected in the compressed data
3840+ @code{BZ_DATA_ERROR_MAGIC}
3841+ if the compressed data doesn't begin with the right magic bytes
3842+ @code{BZ_UNEXPECTED_EOF}
3843+ if the compressed data ends unexpectedly
3844+ @code{BZ_OK}
3845+ otherwise
3846+@end display
3847+
3848+
3849+
3850+@section @code{zlib} compatibility functions
3851+Yoshioka Tsuneo has contributed some functions to
3852+give better @code{zlib} compatibility. These functions are
3853+@code{BZ2_bzopen}, @code{BZ2_bzdopen}, @code{BZ2_bzread}, @code{BZ2_bzwrite}, @code{BZ2_bzflush},
3854+@code{BZ2_bzclose},
3855+@code{BZ2_bzerror} and @code{BZ2_bzlibVersion}.
3856+These functions are not (yet) officially part of
3857+the library. If they break, you get to keep all the pieces.
3858+Nevertheless, I think they work ok.
3859+@example
3860+typedef void BZFILE;
3861+
3862+const char * BZ2_bzlibVersion ( void );
3863+@end example
3864+Returns a string indicating the library version.
3865+@example
3866+BZFILE * BZ2_bzopen ( const char *path, const char *mode );
3867+BZFILE * BZ2_bzdopen ( int fd, const char *mode );
3868+@end example
3869+Opens a @code{.bz2} file for reading or writing, using either its name
3870+or a pre-existing file descriptor.
3871+Analogous to @code{fopen} and @code{fdopen}.
3872+@example
3873+int BZ2_bzread ( BZFILE* b, void* buf, int len );
3874+int BZ2_bzwrite ( BZFILE* b, void* buf, int len );
3875+@end example
3876+Reads/writes data from/to a previously opened @code{BZFILE}.
3877+Analogous to @code{fread} and @code{fwrite}.
3878+@example
3879+int BZ2_bzflush ( BZFILE* b );
3880+void BZ2_bzclose ( BZFILE* b );
3881+@end example
3882+Flushes/closes a @code{BZFILE}. @code{BZ2_bzflush} doesn't actually do
3883+anything. Analogous to @code{fflush} and @code{fclose}.
3884+
3885+@example
3886+const char * BZ2_bzerror ( BZFILE *b, int *errnum );
3887+@end example
3888+Returns a string describing the most recent error status of
3889+@code{b}, and also sets @code{*errnum} to its numerical value.
3890+
3891+
3892+@section Using the library in a @code{stdio}-free environment
3893+
3894+@subsection Getting rid of @code{stdio}
3895+
3896+In a deeply embedded application, you might want to use just
3897+the memory-to-memory functions. You can do this conveniently
3898+by compiling the library with the preprocessor symbol @code{BZ_NO_STDIO}
3899+defined. Doing this gives you a library containing only the following
3900+eight functions:
3901+
3902+@code{BZ2_bzCompressInit}, @code{BZ2_bzCompress}, @code{BZ2_bzCompressEnd} @*
3903+@code{BZ2_bzDecompressInit}, @code{BZ2_bzDecompress}, @code{BZ2_bzDecompressEnd} @*
3904+@code{BZ2_bzBuffToBuffCompress}, @code{BZ2_bzBuffToBuffDecompress}
3905+
3906+When compiled like this, all functions will ignore @code{verbosity}
3907+settings.
3908+
3909+@subsection Critical error handling
3910+@code{libbzip2} contains a number of internal assertion checks which
3911+should, needless to say, never be activated. Nevertheless, if an
3912+assertion should fail, behaviour depends on whether or not the library
3913+was compiled with @code{BZ_NO_STDIO} set.
3914+
3915+For a normal compile, an assertion failure yields the message
3916+@example
3917+ bzip2/libbzip2: internal error number N.
3918+ This is a bug in bzip2/libbzip2, 1.0 of 21-Mar-2000.
3919+ Please report it to me at: jseward@@acm.org. If this happened
3920+ when you were using some program which uses libbzip2 as a
3921+ component, you should also report this bug to the author(s)
3922+ of that program. Please make an effort to report this bug;
3923+ timely and accurate bug reports eventually lead to higher
3924+ quality software. Thanks. Julian Seward, 21 March 2000.
3925+@end example
3926+where @code{N} is some error code number. @code{exit(3)}
3927+is then called.
3928+
3929+For a @code{stdio}-free library, assertion failures result
3930+in a call to a function declared as:
3931+@example
3932+ extern void bz_internal_error ( int errcode );
3933+@end example
3934+The relevant code is passed as a parameter. You should supply
3935+such a function.
3936+
3937+In either case, once an assertion failure has occurred, any
3938+@code{bz_stream} records involved can be regarded as invalid.
3939+You should not attempt to resume normal operation with them.
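As a rough sketch of what such a function might look like, the version below records the error code and jumps back to a recovery point chosen by the application, so that control never returns into the library. Only the `bz_internal_error` prototype comes from the text above. The `setjmp`/`longjmp` strategy and the names `bz_last_internal_error`, `run_guarded` and `simulate_failure` are assumptions made for this illustration.

```c
#include <setjmp.h>

/* Hypothetical application state: where to resume after a failure,
   and the code that was reported. */
static jmp_buf bz_recovery_point;
static volatile int bz_last_internal_error = 0;

/* Called by a BZ_NO_STDIO build of the library on assertion failure. */
void bz_internal_error(int errcode)
{
    bz_last_internal_error = errcode;
    longjmp(bz_recovery_point, 1);   /* never return into the library */
}

/* Hypothetical wrapper: run work(arg); return 0 on normal completion,
   or -1 if bz_internal_error was called while it was running. */
int run_guarded(void (*work)(void *), void *arg)
{
    if (setjmp(bz_recovery_point) != 0) {
        return -1;   /* any bz_stream used by work() is now invalid */
    }
    work(arg);
    return 0;
}

/* Stand-in for library work, used only to exercise the handler:
   reports the given code if it is nonzero. */
static void simulate_failure(void *arg)
{
    int code = *(int *)arg;
    if (code != 0) {
        bz_internal_error(code);
    }
}
```

After `run_guarded` returns -1, the application should discard every `bz_stream` that the interrupted work was using, rather than calling further library functions on it.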
3940+
3941+You may, of course, change critical error handling to suit
3942+your needs. As I said above, critical errors indicate bugs
3943+in the library and should not occur. All "normal" error
3944+situations are indicated via error return codes from functions,
3945+and can be recovered from.
3946+
3947+
3948+@section Making a Windows DLL
3949+Everything related to Windows has been contributed by Yoshioka Tsuneo
3950+@* (@code{QWF00133@@niftyserve.or.jp} /
3951+@code{tsuneo-y@@is.aist-nara.ac.jp}), so you should send your queries to
3952+him (but perhaps Cc: me, @code{jseward@@acm.org}).
3953+
3954+My vague understanding of what to do is: using Visual C++ 5.0,
3955+open the project file @code{libbz2.dsp}, and build. That's all.
3956+
3957+If you can't
3958+open the project file for some reason, make a new one, naming these files:
3959+@code{blocksort.c}, @code{bzlib.c}, @code{compress.c},
3960+@code{crctable.c}, @code{decompress.c}, @code{huffman.c}, @*
3961+@code{randtable.c} and @code{libbz2.def}. You will also need
3962+to name the header files @code{bzlib.h} and @code{bzlib_private.h}.
3963+
3964+If you don't use VC++, you may need to define the preprocessor symbol
3965+@code{_WIN32}.
3966+
3967+Finally, @code{dlltest.c} is a sample program using the DLL. It has a
3968+project file, @code{dlltest.dsp}.
3969+
3970+If you just want a makefile for Visual C, have a look at
3971+@code{makefile.msc}.
3972+
3973+Be aware that if you compile @code{bzip2} itself on Win32, you must set
3974+@code{BZ_UNIX} to 0 and @code{BZ_LCCWIN32} to 1, in the file
3975+@code{bzip2.c}, before compiling. Otherwise the resulting binary won't
3976+work correctly.
3977+
3978+I haven't tried any of this stuff myself, but it all looks plausible.
3979+
3980+
3981+
3982+@chapter Miscellanea
3983+
3984+These are just some random thoughts of mine. Your mileage may
3985+vary.
3986+
3987+@section Limitations of the compressed file format
3988+@code{bzip2-1.0}, @code{0.9.5} and @code{0.9.0}
3989+use exactly the same file format as the previous
3990+version, @code{bzip2-0.1}. This decision was made in the interests of
3991+stability. Creating yet another incompatible compressed file format
3992+would create further confusion and disruption for users.
3993+
3994+Nevertheless, this is not a painless decision. Development
3995+work since the release of @code{bzip2-0.1} in August 1997
3996+has shown complexities in the file format which slow down
3997+decompression and, in retrospect, are unnecessary. These are:
3998+@itemize @bullet
3999+@item The run-length encoder, which is the first of the
4000+ compression transformations, is entirely irrelevant.
4001+ The original purpose was to protect the sorting algorithm
4002+ from the very worst case input: a string of repeated
4003+ symbols. But algorithm steps Q6a and Q6b in the original
4004+ Burrows-Wheeler technical report (SRC-124) show how
4005+ repeats can be handled without difficulty in block
4006+ sorting.
4007+@item The randomisation mechanism doesn't really need to be
4008+ there. Udi Manber and Gene Myers published a suffix
4009+ array construction algorithm a few years back, which
4010+ can be employed to sort any block, no matter how
4011+ repetitive, in O(N log N) time. Subsequent work by
4012+ Kunihiko Sadakane has produced a derivative O(N (log N)^2)
4013+ algorithm which usually outperforms the Manber-Myers
4014+ algorithm.
4015+
4016+ I could have changed to Sadakane's algorithm, but I find
4017+ it to be slower than @code{bzip2}'s existing algorithm for
4018+ most inputs, and the randomisation mechanism protects
4019+ adequately against bad cases. I didn't think it was
4020+ a good tradeoff to make. Partly this is due to the fact
4021+ that I was not flooded with email complaints about
4022+ @code{bzip2-0.1}'s performance on repetitive data, so
4023+ perhaps it isn't a problem for real inputs.
4024+
4025+ Probably the best long-term solution,
4026+ and the one I have incorporated into 0.9.5 and above,
4027+ is to use the existing sorting
4028+ algorithm initially, and fall back to an O(N (log N)^2)
4029+ algorithm if the standard algorithm gets into difficulties.
4030+@item The compressed file format was never designed to be
4031+ handled by a library, and I have had to jump through
4032+ some hoops to produce an efficient implementation of
4033+ decompression. It's a bit hairy. Try passing
4034+ @code{decompress.c} through the C preprocessor
4035+ and you'll see what I mean. Much of this complexity
4036+ could have been avoided if the compressed size of
4037+ each block of data was recorded in the data stream.
4038+@item An Adler-32 checksum, rather than a CRC32 checksum,
4039+ would be faster to compute.
4040+@end itemize
4041+It would be fair to say that the @code{bzip2} format was frozen
4042+before I properly and fully understood the performance
4043+consequences of doing so.
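To give a feel for why Adler-32 is cheap, here is a minimal, unoptimised sketch of the checksum as it is usually defined: two running sums taken modulo 65521, with no lookup table. The function name `adler32_simple` is made up for this example. It is not code from `bzip2`, which uses CRC32.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward Adler-32: a is 1 plus the sum of all bytes, b is the
   sum of the successive values of a, both modulo 65521 (the largest
   prime below 2^16). */
uint32_t adler32_simple(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}
```

Production implementations usually defer the modulo operation over many bytes. This sketch applies it at every step for clarity.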
4044+
4045+Improvements which I was able to incorporate into
4046+0.9.0, despite using the same file format, are:
4047+@itemize @bullet
4048+@item Single array implementation of the inverse BWT. This
4049+ significantly speeds up decompression, presumably
4050+ because it reduces the number of cache misses.
4051+@item Faster inverse MTF transform for large MTF values. The
4052+ new implementation is based on the notion of sliding blocks
4053+ of values.
4054+@item @code{bzip2-0.9.0} now reads and writes files with @code{fread}
4055+ and @code{fwrite}; version 0.1 used @code{putc} and @code{getc}.
4056+ Duh! Well, you live and learn.
4057+
4058+@end itemize
4059+Further ahead, it would be nice
4060+to be able to do random access into files. This will
4061+require some careful design of compressed file formats.
4062+
4063+
4064+
4065+@section Portability issues
4066+After some consideration, I have decided not to use
4067+GNU @code{autoconf} to configure 0.9.5 or 1.0.
4068+
4069+@code{autoconf}, admirable and wonderful though it is,
4070+mainly assists with portability problems between Unix-like
4071+platforms. But @code{bzip2} doesn't have much in the way
4072+of portability problems on Unix; most of the difficulties appear
4073+when porting to the Mac, or to Microsoft's operating systems.
4074+@code{autoconf} doesn't help in those cases, and brings in a
4075+whole load of new complexity.
4076+
4077+Most people should be able to compile the library and program
4078+under Unix straight out-of-the-box, so to speak, especially
4079+if you have a version of GNU C available.
4080+
4081+There are a couple of @code{__inline__} directives in the code. GNU C
4082+(@code{gcc}) should be able to handle them. If you're not using
4083+GNU C, your C compiler shouldn't see them at all.
4084+If your compiler does, for some reason, see them and doesn't
4085+like them, just @code{#define} @code{__inline__} to be @code{/* */}. One
4086+easy way to do this is to compile with the flag @code{-D__inline__=},
4087+which should be understood by most Unix compilers.
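The same effect can be had from inside the source, as the following sketch suggests. The function `bz_min` is a made-up example and not part of the library. The guard on `__GNUC__` is an assumption about how one might avoid redefining the keyword when GNU C is in use.

```c
/* For compilers other than GNU C, make __inline__ expand to nothing,
   which is what -D__inline__= does on the command line. */
#ifndef __GNUC__
#define __inline__
#endif

/* Hypothetical helper showing the directive in use. */
static __inline__ int bz_min(int a, int b)
{
    return a < b ? a : b;
}
```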
4088+
4089+If you still have difficulties, try compiling with the macro
4090+@code{BZ_STRICT_ANSI} defined. This should enable you to build the
4091+library in a strictly ANSI compliant environment. Building the program
4092+itself like this is dangerous and not supported, since you remove
4093+@code{bzip2}'s checks against compressing directories, symbolic links,
4094+devices, and other not-really-a-file entities. This could cause
4095+filesystem corruption!
4096+
4097+One other thing: if you create a @code{bzip2} binary for public
4098+distribution, please try and link it statically (@code{gcc -static}). This
4099+avoids all sorts of library-version issues that others may encounter
4100+later on.
4101+
4102+If you build @code{bzip2} on Win32, you must set @code{BZ_UNIX} to 0 and
4103+@code{BZ_LCCWIN32} to 1, in the file @code{bzip2.c}, before compiling.
4104+Otherwise the resulting binary won't work correctly.
4105+
4106+
4107+
4108+@section Reporting bugs
4109+I tried pretty hard to make sure @code{bzip2} is
4110+bug free, both by design and by testing. Hopefully
4111+you'll never need to read this section for real.
4112+
4113+Nevertheless, if @code{bzip2} dies with a segmentation
4114+fault, a bus error or an internal assertion failure, it
4115+will ask you to email me a bug report. Experience with
4116+version 0.1 shows that almost all these problems can
4117+be traced to either compiler bugs or hardware problems.
4118+@itemize @bullet
4119+@item
4120+Recompile the program with no optimisation, and see if it
4121+works. And/or try a different compiler.
4122+I heard all sorts of stories about various flavours
4123+of GNU C (and other compilers) generating bad code for
4124+@code{bzip2}, and I've run across two such examples myself.
4125+
4126+2.7.X versions of GNU C are known to generate bad code from
4127+time to time, at high optimisation levels.
4128+If you get problems, try using the flags
4129+@code{-O2} @code{-fomit-frame-pointer} @code{-fno-strength-reduce}.
4130+You should specifically @emph{not} use @code{-funroll-loops}.
4131+
4132+You may notice that the Makefile runs six tests as part of
4133+the build process. If the program passes all of these, it's
4134+a pretty good (but not 100%) indication that the compiler has
4135+done its job correctly.
4136+@item
4137+If @code{bzip2} crashes randomly, and the crashes are not
4138+repeatable, you may have a flaky memory subsystem. @code{bzip2}
4139+really hammers your memory hierarchy, and if it's a bit marginal,
4140+you may get these problems. Ditto if your disk or I/O subsystem
4141+is slowly failing. Yup, this really does happen.
4142+
4143+Try using a different machine of the same type, and see if
4144+you can repeat the problem.
4145+@item This isn't really a bug, but ... If @code{bzip2} tells
4146+you your file is corrupted on decompression, and you
4147+obtained the file via FTP, there is a possibility that you
4148+forgot to tell FTP to do a binary mode transfer. That absolutely
4149+will cause the file to be non-decompressible. You'll have to transfer
4150+it again.
4151+@end itemize
4152+
4153+If you've incorporated @code{libbzip2} into your own program
4154+and are getting problems, please, please, please, check that the
4155+parameters you are passing in calls to the library are
4156+correct, and in accordance with what the documentation says
4157+is allowable. I have tried to make the library robust against
4158+such problems, but I'm sure I haven't succeeded.
4159+
4160+Finally, if the above comments don't help, you'll have to send
4161+me a bug report. Now, it's just amazing how many people will
4162+send me a bug report saying something like
4163+@display
4164+ bzip2 crashed with segmentation fault on my machine
4165+@end display
4166+and absolutely nothing else. Needless to say, such a report
4167+is @emph{totally, utterly, completely and comprehensively 100% useless;
4168+a waste of your time, my time, and net bandwidth}.
4169+With no details at all, there's no way I can possibly begin
4170+to figure out what the problem is.
4171+
4172+The rules of the game are: facts, facts, facts. Don't omit
4173+them because "oh, they won't be relevant". At the bare
4174+minimum:
4175+@display
4176+ Machine type. Operating system version.
4177+ Exact version of @code{bzip2} (do @code{bzip2 -V}).
4178+ Exact version of the compiler used.
4179+ Flags passed to the compiler.
4180+@end display
4181+However, the most important single thing that will help me is
4182+the file that you were trying to compress or decompress at the
4183+time the problem happened. Without that, my ability to do anything
4184+more than speculate about the cause is limited.
4185+
4186+Please remember that I connect to the Internet with a modem, so
4187+you should contact me before mailing me huge files.
4188+
4189+
4190+@section Did you get the right package?
4191+
4192+@code{bzip2} is a resource hog. It soaks up large amounts of CPU cycles
4193+and memory. Also, it gives very large latencies. In the worst case, you
4194+can feed many megabytes of uncompressed data into the library before
4195+getting any compressed output, so this probably rules out applications
4196+requiring interactive behaviour.
4197+
4198+These aren't faults of my implementation, I hope, but more
4199+an intrinsic property of the Burrows-Wheeler transform (unfortunately).
4200+Maybe this isn't what you want.
4201+
4202+If you want a compressor and/or library which is faster, uses less
4203+memory but gets pretty good compression, and has minimal latency,
4204+consider Jean-loup
4205+Gailly's and Mark Adler's work, @code{zlib-1.1.2} and
4206+@code{gzip-1.2.4}. Look for them at
4207+
4208+@code{http://www.cdrom.com/pub/infozip/zlib} and
4209+@code{http://www.gzip.org} respectively.
4210+
4211+For something faster and lighter still, you might try Markus F X J
4212+Oberhumer's @code{LZO} real-time compression/decompression library, at
4213+@* @code{http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html}.
4214+
4215+If you want to use the @code{bzip2} algorithms to compress small blocks
4216+of data, 64k bytes or smaller, for example on an on-the-fly disk
4217+compressor, you'd be well advised not to use this library. Instead,
4218+I've made a special library tuned for that kind of use. It's part of
4219+@code{e2compr-0.40}, an on-the-fly disk compressor for the Linux
4220+@code{ext2} filesystem. Look at
4221+@code{http://www.netspace.net.au/~reiter/e2compr}.
4222+
4223+
4224+
4225+@section Testing
4226+
4227+A record of the tests I've done.
4228+
4229+First, some data sets:
4230+@itemize @bullet
4231+@item B: a directory containing 6001 files, one for every length in the
4232+ range 0 to 6000 bytes. The files contain random lowercase
4233+ letters. 18.7 megabytes.
4234+@item H: my home directory tree. Documents, source code, mail files,
4235+ compressed data. H contains B, and also a directory of
4236+ files designed as boundary cases for the sorting; mostly very
4237+ repetitive, nasty files. 565 megabytes.
4238+@item A: directory tree holding various applications built from source:
4239+ @code{egcs}, @code{gcc-2.8.1}, KDE, GTK, Octave, etc.
4240+ 2200 megabytes.
4241+@end itemize
4242+The tests conducted are as follows. Each test means compressing
4243+(a copy of) each file in the data set, decompressing it and
4244+comparing it against the original.
4245+
4246+First, a bunch of tests with block sizes and internal buffer
4247+sizes set very small,
4248+to detect any problems with the
4249+blocking and buffering mechanisms.
4250+This required modifying the source code so as to try to
4251+break it.
4252+@enumerate
4253+@item Data set H, with
4254+ buffer size of 1 byte, and block size of 23 bytes.
4255+@item Data set B, buffer sizes 1 byte, block size 1 byte.
4256+@item As (2) but small-mode decompression.
4257+@item As (2) with block size 2 bytes.
4258+@item As (2) with block size 3 bytes.
4259+@item As (2) with block size 4 bytes.
4260+@item As (2) with block size 5 bytes.
4261+@item As (2) with block size 6 bytes and small-mode decompression.
4262+@item H with buffer size of 1 byte, but normal block
4263+ size (up to 900000 bytes).
4264+@end enumerate
4265+Then some tests with unmodified source code.
4266+@enumerate
4267+@item H, all settings normal.
4268+@item As (1), with small-mode decompress.
4269+@item H, compress with flag @code{-1}.
4270+@item H, compress with flag @code{-s}, decompress with flag @code{-s}.
4271+@item Forwards compatibility: H, @code{bzip2-0.1pl2} compressing,
4272+ @code{bzip2-0.9.5} decompressing, all settings normal.
4273+@item Backwards compatibility: H, @code{bzip2-0.9.5} compressing,
4274+ @code{bzip2-0.1pl2} decompressing, all settings normal.
4275+@item Bigger tests: A, all settings normal.
4276+@item As (7), using the fallback (Sadakane-like) sorting algorithm.
4277+@item As (8), compress with flag @code{-1}, decompress with flag
4278+ @code{-s}.
4279+@item H, using the fallback sorting algorithm.
4280+@item Forwards compatibility: A, @code{bzip2-0.1pl2} compressing,
4281+ @code{bzip2-0.9.5} decompressing, all settings normal.
4282+@item Backwards compatibility: A, @code{bzip2-0.9.5} compressing,
4283+ @code{bzip2-0.1pl2} decompressing, all settings normal.
4284+@item Misc test: about 400 megabytes of @code{.tar} files with
4285+ @code{bzip2} compiled with Checker (a memory access error
4286+ detector, like Purify).
4287+@item Misc tests to make sure it builds and runs ok on non-Linux/x86
4288+ platforms.
4289+@end enumerate
4290+These tests were conducted on a 225 MHz IDT WinChip machine, running
4291+Linux 2.0.36. They represent nearly a week of continuous computation.
4292+All tests completed successfully.
4293+
4294+
4295+@section Further reading
4296+@code{bzip2} is not research work, in the sense that it doesn't present
4297+any new ideas. Rather, it's an engineering exercise based on existing
4298+ideas.
4299+
4300+Four documents describe essentially all the ideas behind @code{bzip2}:
4301+@example
4302+Michael Burrows and D. J. Wheeler:
4303+ "A block-sorting lossless data compression algorithm"
4304+ 10th May 1994.
4305+ Digital SRC Research Report 124.
4306+ ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
4307+ If you have trouble finding it, try searching at the
4308+ New Zealand Digital Library, http://www.nzdl.org.
4309+
4310+Daniel S. Hirschberg and Debra A. Lelewer
4311+ "Efficient Decoding of Prefix Codes"
4312+ Communications of the ACM, April 1990, Vol 33, Number 4.
4313+ You might be able to get an electronic copy of this
4314+ from the ACM Digital Library.
4315+
4316+David J. Wheeler
4317+ Program bred3.c and accompanying document bred3.ps.
4318+ This contains the idea behind the multi-table Huffman
4319+ coding scheme.
4320+ ftp://ftp.cl.cam.ac.uk/users/djw3/
4321+
4322+Jon L. Bentley and Robert Sedgewick
4323+ "Fast Algorithms for Sorting and Searching Strings"
4324+ Available from Sedgewick's web page,
4325+ www.cs.princeton.edu/~rs
4326+@end example
4327+The following paper gives valuable additional insights into the
4328+algorithm, but is not immediately the basis of any code
4329+used in bzip2.
4330+@example
4331+Peter Fenwick:
4332+ Block Sorting Text Compression
4333+ Proceedings of the 19th Australasian Computer Science Conference,
4334+ Melbourne, Australia. Jan 31 - Feb 2, 1996.
4335+ ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
4336+@end example
4337+Kunihiko Sadakane's sorting algorithm, mentioned above,
4338+is available from:
4339+@example
4340+http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
4341+@end example
4342+The Manber-Myers suffix array construction
4343+algorithm is described in a paper
4344+available from:
4345+@example
4346+http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
4347+@end example
4348+Finally, the following paper documents some recent investigations
4349+I made into the performance of sorting algorithms:
4350+@example
4351+Julian Seward:
4352+ On the Performance of BWT Sorting Algorithms
4353+ Proceedings of the IEEE Data Compression Conference 2000
4354+ Snowbird, Utah. 28-30 March 2000.
4355+@end example
4356+
4357+
4358+@contents
4359+
4360+@bye
4361+
4362diff -Nru bzip2-1.0.1/doc/bzip2recover.1 bzip2-1.0.1.new/doc/bzip2recover.1
4363--- bzip2-1.0.1/doc/bzip2recover.1 Thu Jan 1 01:00:00 1970
4364+++ bzip2-1.0.1.new/doc/bzip2recover.1 Sat Jun 24 20:13:06 2000
4365@@ -0,0 +1 @@
4366+.so bzip2.1
4367\ No newline at end of file
4368diff -Nru bzip2-1.0.1/doc/pl/Makefile.am bzip2-1.0.1.new/doc/pl/Makefile.am
4369--- bzip2-1.0.1/doc/pl/Makefile.am Thu Jan 1 01:00:00 1970
4370+++ bzip2-1.0.1.new/doc/pl/Makefile.am Sat Jun 24 20:13:06 2000
4371@@ -0,0 +1,4 @@
4372+
4373+mandir = @mandir@/pl
4374+man_MANS = bzip2.1 bunzip2.1 bzcat.1 bzip2recover.1
4375+
4376diff -Nru bzip2-1.0.1/doc/pl/bunzip2.1 bzip2-1.0.1.new/doc/pl/bunzip2.1
4377--- bzip2-1.0.1/doc/pl/bunzip2.1 Thu Jan 1 01:00:00 1970
4378+++ bzip2-1.0.1.new/doc/pl/bunzip2.1 Sat Jun 24 20:13:06 2000
4379@@ -0,0 +1 @@
4380+.so bzip2.1
4381\ No newline at end of file
4382diff -Nru bzip2-1.0.1/doc/pl/bzcat.1 bzip2-1.0.1.new/doc/pl/bzcat.1
4383--- bzip2-1.0.1/doc/pl/bzcat.1 Thu Jan 1 01:00:00 1970
4384+++ bzip2-1.0.1.new/doc/pl/bzcat.1 Sat Jun 24 20:13:06 2000
4385@@ -0,0 +1 @@
4386+.so bzip2.1
4387\ No newline at end of file
4388diff -Nru bzip2-1.0.1/doc/pl/bzip2.1 bzip2-1.0.1.new/doc/pl/bzip2.1
4389--- bzip2-1.0.1/doc/pl/bzip2.1 Thu Jan 1 01:00:00 1970
4390+++ bzip2-1.0.1.new/doc/pl/bzip2.1 Sat Jun 24 20:13:06 2000
4391@@ -0,0 +1,384 @@
4392