git.pld-linux.org Git - packages/bzip2.git/blame - bzip2-libtoolizeautoconf.patch
- added man-pages tar URL
Commit Line Data
d967e3ec 1diff -Nru bzip2-1.0.1/AUTHORS bzip2-1.0.1.new/AUTHORS
2--- bzip2-1.0.1/AUTHORS Thu Jan 1 01:00:00 1970
3+++ bzip2-1.0.1.new/AUTHORS Sat Jun 24 20:13:05 2000
4@@ -0,0 +1 @@
5+Julian Seward <jseward@acm.org>
6diff -Nru bzip2-1.0.1/CHANGES bzip2-1.0.1.new/CHANGES
7--- bzip2-1.0.1/CHANGES Sat Jun 24 20:13:27 2000
8+++ bzip2-1.0.1.new/CHANGES Thu Jan 1 01:00:00 1970
9@@ -1,167 +0,0 @@
10-
11-
12-0.9.0
13-~~~~~
14-First version.
15-
16-
17-0.9.0a
18-~~~~~~
19-Removed 'ranlib' from Makefile, since most modern Unix-es
20-don't need it, or even know about it.
21-
22-
23-0.9.0b
24-~~~~~~
25-Fixed a problem with error reporting in bzip2.c. This does not effect
26-the library in any way. Problem is: versions 0.9.0 and 0.9.0a (of the
27-program proper) compress and decompress correctly, but give misleading
28-error messages (internal panics) when an I/O error occurs, instead of
29-reporting the problem correctly. This shouldn't give any data loss
30-(as far as I can see), but is confusing.
31-
32-Made the inline declarations disappear for non-GCC compilers.
33-
34-
35-0.9.0c
36-~~~~~~
37-Fixed some problems in the library pertaining to some boundary cases.
38-This makes the library behave more correctly in those situations. The
39-fixes apply only to features (calls and parameters) not used by
40-bzip2.c, so the non-fixedness of them in previous versions has no
41-effect on reliability of bzip2.c.
42-
43-In bzlib.c:
44- * made zero-length BZ_FLUSH work correctly in bzCompress().
45- * fixed bzWrite/bzRead to ignore zero-length requests.
46- * fixed bzread to correctly handle read requests after EOF.
47- * wrong parameter order in call to bzDecompressInit in
48- bzBuffToBuffDecompress. Fixed.
49-
50-In compress.c:
51- * changed setting of nGroups in sendMTFValues() so as to
52- do a bit better on small files. This _does_ effect
53- bzip2.c.
54-
55-
56-0.9.5a
57-~~~~~~
58-Major change: add a fallback sorting algorithm (blocksort.c)
59-to give reasonable behaviour even for very repetitive inputs.
60-Nuked --repetitive-best and --repetitive-fast since they are
61-no longer useful.
62-
63-Minor changes: mostly a whole bunch of small changes/
64-bugfixes in the driver (bzip2.c). Changes pertaining to the
65-user interface are:
66-
67- allow decompression of symlink'd files to stdout
68- decompress/test files even without .bz2 extension
69- give more accurate error messages for I/O errors
70- when compressing/decompressing to stdout, don't catch control-C
71- read flags from BZIP2 and BZIP environment variables
72- decline to break hard links to a file unless forced with -f
73- allow -c flag even with no filenames
74- preserve file ownerships as far as possible
75- make -s -1 give the expected block size (100k)
76- add a flag -q --quiet to suppress nonessential warnings
77- stop decoding flags after --, so files beginning in - can be handled
78- resolved inconsistent naming: bzcat or bz2cat ?
79- bzip2 --help now returns 0
80-
81-Programming-level changes are:
82-
83- fixed syntax error in GET_LL4 for Borland C++ 5.02
84- let bzBuffToBuffDecompress return BZ_DATA_ERROR{_MAGIC}
85- fix overshoot of mode-string end in bzopen_or_bzdopen
86- wrapped bzlib.h in #ifdef __cplusplus ... extern "C" { ... }
87- close file handles under all error conditions
88- added minor mods so it compiles with DJGPP out of the box
89- fixed Makefile so it doesn't give problems with BSD make
90- fix uninitialised memory reads in dlltest.c
91-
92-0.9.5b
93-~~~~~~
94-Open stdin/stdout in binary mode for DJGPP.
95-
96-0.9.5c
97-~~~~~~
98-Changed BZ_N_OVERSHOOT to be ... + 2 instead of ... + 1. The + 1
99-version could cause the sorted order to be wrong in some extremely
100-obscure cases. Also changed setting of quadrant in blocksort.c.
101-
102-0.9.5d
103-~~~~~~
104-The only functional change is to make bzlibVersion() in the library
105-return the correct string. This has no effect whatsoever on the
106-functioning of the bzip2 program or library. Added a couple of casts
107-so the library compiles without warnings at level 3 in MS Visual
108-Studio 6.0. Included a Y2K statement in the file Y2K_INFO. All other
109-changes are minor documentation changes.
110-
111-1.0
112-~~~
113-Several minor bugfixes and enhancements:
114-
115-* Large file support. The library uses 64-bit counters to
116- count the volume of data passing through it. bzip2.c
117- is now compiled with -D_FILE_OFFSET_BITS=64 to get large
118- file support from the C library. -v correctly prints out
119- file sizes greater than 4 gigabytes. All these changes have
120- been made without assuming a 64-bit platform or a C compiler
121- which supports 64-bit ints, so, except for the C library
122- aspect, they are fully portable.
123-
124-* Decompression robustness. The library/program should be
125- robust to any corruption of compressed data, detecting and
126- handling _all_ corruption, instead of merely relying on
127- the CRCs. What this means is that the program should
128- never crash, given corrupted data, and the library should
129- always return BZ_DATA_ERROR.
130-
131-* Fixed an obscure race-condition bug only ever observed on
132- Solaris, in which, if you were very unlucky and issued
133- control-C at exactly the wrong time, both input and output
134- files would be deleted.
135-
136-* Don't run out of file handles on test/decompression when
137- large numbers of files have invalid magic numbers.
138-
139-* Avoid library namespace pollution. Prefix all exported
140- symbols with BZ2_.
141-
142-* Minor sorting enhancements from my DCC2000 paper.
143-
144-* Advance the version number to 1.0, so as to counteract the
145- (false-in-this-case) impression some people have that programs
146- with version numbers less than 1.0 are in someway, experimental,
147- pre-release versions.
148-
149-* Create an initial Makefile-libbz2_so to build a shared library.
150- Yes, I know I should really use libtool et al ...
151-
152-* Make the program exit with 2 instead of 0 when decompression
153- fails due to a bad magic number (ie, an invalid bzip2 header).
154- Also exit with 1 (as the manual claims :-) whenever a diagnostic
155- message would have been printed AND the corresponding operation
156- is aborted, for example
157- bzip2: Output file xx already exists.
158- When a diagnostic message is printed but the operation is not
159- aborted, for example
160- bzip2: Can't guess original name for wurble -- using wurble.out
161- then the exit value 0 is returned, unless some other problem is
162- also detected.
163-
164- I think it corresponds more closely to what the manual claims now.
165-
166-
167-1.0.1
168-~~~~~
169-* Modified dlltest.c so it uses the new BZ2_ naming scheme.
170-* Modified makefile-msc to fix minor build probs on Win2k.
171-* Updated README.COMPILATION.PROBLEMS.
172-
173-There are no functionality changes or bug fixes relative to version
174-1.0.0. This is just a documentation update + a fix for minor Win32
175-build problems. For almost everyone, upgrading from 1.0.0 to 1.0.1 is
176-utterly pointless. Don't bother.
177diff -Nru bzip2-1.0.1/COPYING bzip2-1.0.1.new/COPYING
178--- bzip2-1.0.1/COPYING Thu Jan 1 01:00:00 1970
179+++ bzip2-1.0.1.new/COPYING Sat Jun 24 20:13:05 2000
180@@ -0,0 +1,39 @@
181+
182+This program, "bzip2" and associated library "libbzip2", are
183+copyright (C) 1996-2000 Julian R Seward. All rights reserved.
184+
185+Redistribution and use in source and binary forms, with or without
186+modification, are permitted provided that the following conditions
187+are met:
188+
189+1. Redistributions of source code must retain the above copyright
190+ notice, this list of conditions and the following disclaimer.
191+
192+2. The origin of this software must not be misrepresented; you must
193+ not claim that you wrote the original software. If you use this
194+ software in a product, an acknowledgment in the product
195+ documentation would be appreciated but is not required.
196+
197+3. Altered source versions must be plainly marked as such, and must
198+ not be misrepresented as being the original software.
199+
200+4. The name of the author may not be used to endorse or promote
201+ products derived from this software without specific prior written
202+ permission.
203+
204+THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
205+OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
206+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
207+ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
208+DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
209+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
210+GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
211+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
212+WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
213+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
214+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
215+
216+Julian Seward, Cambridge, UK.
217+jseward@acm.org
218+bzip2/libbzip2 version 1.0 of 21 March 2000
219+
220diff -Nru bzip2-1.0.1/ChangeLog bzip2-1.0.1.new/ChangeLog
221--- bzip2-1.0.1/ChangeLog Thu Jan 1 01:00:00 1970
222+++ bzip2-1.0.1.new/ChangeLog Sat Jun 24 20:13:05 2000
223@@ -0,0 +1 @@
224+
225diff -Nru bzip2-1.0.1/INSTALL bzip2-1.0.1.new/INSTALL
226--- bzip2-1.0.1/INSTALL Thu Jan 1 01:00:00 1970
227+++ bzip2-1.0.1.new/INSTALL Sat Jun 24 20:13:06 2000
228@@ -0,0 +1,182 @@
229+Basic Installation
230+==================
231+
232+ These are generic installation instructions.
233+
234+ The `configure' shell script attempts to guess correct values for
235+various system-dependent variables used during compilation. It uses
236+those values to create a `Makefile' in each directory of the package.
237+It may also create one or more `.h' files containing system-dependent
238+definitions. Finally, it creates a shell script `config.status' that
239+you can run in the future to recreate the current configuration, a file
240+`config.cache' that saves the results of its tests to speed up
241+reconfiguring, and a file `config.log' containing compiler output
242+(useful mainly for debugging `configure').
243+
244+ If you need to do unusual things to compile the package, please try
245+to figure out how `configure' could check whether to do them, and mail
246+diffs or instructions to the address given in the `README' so they can
247+be considered for the next release. If at some point `config.cache'
248+contains results you don't want to keep, you may remove or edit it.
249+
250+ The file `configure.in' is used to create `configure' by a program
251+called `autoconf'. You only need `configure.in' if you want to change
252+it or regenerate `configure' using a newer version of `autoconf'.
253+
254+The simplest way to compile this package is:
255+
256+ 1. `cd' to the directory containing the package's source code and type
257+ `./configure' to configure the package for your system. If you're
258+ using `csh' on an old version of System V, you might need to type
259+ `sh ./configure' instead to prevent `csh' from trying to execute
260+ `configure' itself.
261+
262+ Running `configure' takes awhile. While running, it prints some
263+ messages telling which features it is checking for.
264+
265+ 2. Type `make' to compile the package.
266+
267+ 3. Optionally, type `make check' to run any self-tests that come with
268+ the package.
269+
270+ 4. Type `make install' to install the programs and any data files and
271+ documentation.
272+
273+ 5. You can remove the program binaries and object files from the
274+ source code directory by typing `make clean'. To also remove the
275+ files that `configure' created (so you can compile the package for
276+ a different kind of computer), type `make distclean'. There is
277+ also a `make maintainer-clean' target, but that is intended mainly
278+ for the package's developers. If you use it, you may have to get
279+ all sorts of other programs in order to regenerate files that came
280+ with the distribution.
281+
282+Compilers and Options
283+=====================
284+
285+ Some systems require unusual options for compilation or linking that
286+the `configure' script does not know about. You can give `configure'
287+initial values for variables by setting them in the environment. Using
288+a Bourne-compatible shell, you can do that on the command line like
289+this:
290+ CC=c89 CFLAGS=-O2 LIBS=-lposix ./configure
291+
292+Or on systems that have the `env' program, you can do it like this:
293+ env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure
294+
295+Compiling For Multiple Architectures
296+====================================
297+
298+ You can compile the package for more than one kind of computer at the
299+same time, by placing the object files for each architecture in their
300+own directory. To do this, you must use a version of `make' that
301+supports the `VPATH' variable, such as GNU `make'. `cd' to the
302+directory where you want the object files and executables to go and run
303+the `configure' script. `configure' automatically checks for the
304+source code in the directory that `configure' is in and in `..'.
305+
306+ If you have to use a `make' that does not supports the `VPATH'
307+variable, you have to compile the package for one architecture at a time
308+in the source code directory. After you have installed the package for
309+one architecture, use `make distclean' before reconfiguring for another
310+architecture.
311+
312+Installation Names
313+==================
314+
315+ By default, `make install' will install the package's files in
316+`/usr/local/bin', `/usr/local/man', etc. You can specify an
317+installation prefix other than `/usr/local' by giving `configure' the
318+option `--prefix=PATH'.
319+
320+ You can specify separate installation prefixes for
321+architecture-specific files and architecture-independent files. If you
322+give `configure' the option `--exec-prefix=PATH', the package will use
323+PATH as the prefix for installing programs and libraries.
324+Documentation and other data files will still use the regular prefix.
325+
326+ In addition, if you use an unusual directory layout you can give
327+options like `--bindir=PATH' to specify different values for particular
328+kinds of files. Run `configure --help' for a list of the directories
329+you can set and what kinds of files go in them.
330+
331+ If the package supports it, you can cause programs to be installed
332+with an extra prefix or suffix on their names by giving `configure' the
333+option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
334+
335+Optional Features
336+=================
337+
338+ Some packages pay attention to `--enable-FEATURE' options to
339+`configure', where FEATURE indicates an optional part of the package.
340+They may also pay attention to `--with-PACKAGE' options, where PACKAGE
341+is something like `gnu-as' or `x' (for the X Window System). The
342+`README' should mention any `--enable-' and `--with-' options that the
343+package recognizes.
344+
345+ For packages that use the X Window System, `configure' can usually
346+find the X include and library files automatically, but if it doesn't,
347+you can use the `configure' options `--x-includes=DIR' and
348+`--x-libraries=DIR' to specify their locations.
349+
350+Specifying the System Type
351+==========================
352+
353+ There may be some features `configure' can not figure out
354+automatically, but needs to determine by the type of host the package
355+will run on. Usually `configure' can figure that out, but if it prints
356+a message saying it can not guess the host type, give it the
357+`--host=TYPE' option. TYPE can either be a short name for the system
358+type, such as `sun4', or a canonical name with three fields:
359+ CPU-COMPANY-SYSTEM
360+
361+See the file `config.sub' for the possible values of each field. If
362+`config.sub' isn't included in this package, then this package doesn't
363+need to know the host type.
364+
365+ If you are building compiler tools for cross-compiling, you can also
366+use the `--target=TYPE' option to select the type of system they will
367+produce code for and the `--build=TYPE' option to select the type of
368+system on which you are compiling the package.
369+
370+Sharing Defaults
371+================
372+
373+ If you want to set default values for `configure' scripts to share,
374+you can create a site shell script called `config.site' that gives
375+default values for variables like `CC', `cache_file', and `prefix'.
376+`configure' looks for `PREFIX/share/config.site' if it exists, then
377+`PREFIX/etc/config.site' if it exists. Or, you can set the
378+`CONFIG_SITE' environment variable to the location of the site script.
379+A warning: not all `configure' scripts look for a site script.
380+
381+Operation Controls
382+==================
383+
384+ `configure' recognizes the following options to control how it
385+operates.
386+
387+`--cache-file=FILE'
388+ Use and save the results of the tests in FILE instead of
389+ `./config.cache'. Set FILE to `/dev/null' to disable caching, for
390+ debugging `configure'.
391+
392+`--help'
393+ Print a summary of the options to `configure', and exit.
394+
395+`--quiet'
396+`--silent'
397+`-q'
398+ Do not print messages saying which checks are being made. To
399+ suppress all normal output, redirect it to `/dev/null' (any error
400+ messages will still be shown).
401+
402+`--srcdir=DIR'
403+ Look for the package's source code in directory DIR. Usually
404+ `configure' can determine that directory automatically.
405+
406+`--version'
407+ Print the version of Autoconf used to generate the `configure'
408+ script, and exit.
409+
410+`configure' also accepts some other, not widely useful, options.
411diff -Nru bzip2-1.0.1/LICENSE bzip2-1.0.1.new/LICENSE
412--- bzip2-1.0.1/LICENSE Sat Jun 24 20:13:27 2000
413+++ bzip2-1.0.1.new/LICENSE Thu Jan 1 01:00:00 1970
414@@ -1,39 +0,0 @@
415-
416-This program, "bzip2" and associated library "libbzip2", are
417-copyright (C) 1996-2000 Julian R Seward. All rights reserved.
418-
419-Redistribution and use in source and binary forms, with or without
420-modification, are permitted provided that the following conditions
421-are met:
422-
423-1. Redistributions of source code must retain the above copyright
424- notice, this list of conditions and the following disclaimer.
425-
426-2. The origin of this software must not be misrepresented; you must
427- not claim that you wrote the original software. If you use this
428- software in a product, an acknowledgment in the product
429- documentation would be appreciated but is not required.
430-
431-3. Altered source versions must be plainly marked as such, and must
432- not be misrepresented as being the original software.
433-
434-4. The name of the author may not be used to endorse or promote
435- products derived from this software without specific prior written
436- permission.
437-
438-THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
439-OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
440-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
441-ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
442-DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
443-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
444-GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
445-INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
446-WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
447-NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
448-SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
449-
450-Julian Seward, Cambridge, UK.
451-jseward@acm.org
452-bzip2/libbzip2 version 1.0 of 21 March 2000
453-
454diff -Nru bzip2-1.0.1/Makefile-libbz2_so bzip2-1.0.1.new/Makefile-libbz2_so
455--- bzip2-1.0.1/Makefile-libbz2_so Sat Jun 24 20:13:27 2000
456+++ bzip2-1.0.1.new/Makefile-libbz2_so Thu Jan 1 01:00:00 1970
457@@ -1,43 +0,0 @@
458-
459-# This Makefile builds a shared version of the library,
460-# libbz2.so.1.0.1, with soname libbz2.so.1.0,
461-# at least on x86-Linux (RedHat 5.2),
462-# with gcc-2.7.2.3. Please see the README file for some
463-# important info about building the library like this.
464-
465-SHELL=/bin/sh
466-CC=gcc
467-BIGFILES=-D_FILE_OFFSET_BITS=64
468-CFLAGS=-fpic -fPIC -Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce $(BIGFILES)
469-
470-OBJS= blocksort.o \
471- huffman.o \
472- crctable.o \
473- randtable.o \
474- compress.o \
475- decompress.o \
476- bzlib.o
477-
478-all: $(OBJS)
479- $(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.1 $(OBJS)
480- $(CC) $(CFLAGS) -o bzip2-shared bzip2.c libbz2.so.1.0.1
481- rm -f libbz2.so.1.0
482- ln -s libbz2.so.1.0.1 libbz2.so.1.0
483-
484-clean:
485- rm -f $(OBJS) bzip2.o libbz2.so.1.0.1 libbz2.so.1.0 bzip2-shared
486-
487-blocksort.o: blocksort.c
488- $(CC) $(CFLAGS) -c blocksort.c
489-huffman.o: huffman.c
490- $(CC) $(CFLAGS) -c huffman.c
491-crctable.o: crctable.c
492- $(CC) $(CFLAGS) -c crctable.c
493-randtable.o: randtable.c
494- $(CC) $(CFLAGS) -c randtable.c
495-compress.o: compress.c
496- $(CC) $(CFLAGS) -c compress.c
497-decompress.o: decompress.c
498- $(CC) $(CFLAGS) -c decompress.c
499-bzlib.o: bzlib.c
500- $(CC) $(CFLAGS) -c bzlib.c
501diff -Nru bzip2-1.0.1/Makefile.am bzip2-1.0.1.new/Makefile.am
502--- bzip2-1.0.1/Makefile.am Thu Jan 1 01:00:00 1970
503+++ bzip2-1.0.1.new/Makefile.am Sat Jun 24 20:17:47 2000
504@@ -0,0 +1,31 @@
505+SUBDIRS = doc
506+
507+bin_PROGRAMS = bzip2 bzip2recover
508+bzip2_SOURCES = bzip2.c
509+
510+bzip2_LDADD = libbz2.la
511+bzip2recover_SOURCES = bzip2recover.c
512+lib_LTLIBRARIES = libbz2.la
513+libbz2_la_SOURCES = \
514+ blocksort.c \
515+ huffman.c \
516+ crctable.c \
517+ randtable.c \
518+ compress.c \
519+ decompress.c \
520+ bzlib.c \
521+ bzlib.h \
522+ bzlib_private.h
523+
524+libbz2_la_LDFLAGS = -version-info 1:0:0
525+include_HEADERS = bzlib.h bzlib_private.h
526+
ff248cb7 527+bin_SCRIPTS = bzless bzgrep
d967e3ec 528+
529+EXTRA_DIST = README README.COMPILATION.PROBLEMS \
530+ Y2K_INFO libbz2.def libbz2.dsp \
531+ sample1.bz2 sample1.ref sample2.bz2 sample2.ref sample3.bz2 sample3.ref
532+
533+install-exec-hook:
534+ $(LN_S) -f bzip2 $(DESTDIR)$(bindir)/bunzip2
535+ $(LN_S) -f bzip2 $(DESTDIR)$(bindir)/bzcat
536diff -Nru bzip2-1.0.1/NEWS bzip2-1.0.1.new/NEWS
537--- bzip2-1.0.1/NEWS Thu Jan 1 01:00:00 1970
538+++ bzip2-1.0.1.new/NEWS Sat Jun 24 20:13:06 2000
539@@ -0,0 +1,12 @@
540+
541+
542+1.0.1
543+~~~~~
544+* Modified dlltest.c so it uses the new BZ2_ naming scheme.
545+* Modified makefile-msc to fix minor build probs on Win2k.
546+* Updated README.COMPILATION.PROBLEMS.
547+
548+There are no functionality changes or bug fixes relative to version
549+1.0.0. This is just a documentation update + a fix for minor Win32
550+build problems. For almost everyone, upgrading from 1.0.0 to 1.0.1 is
551+utterly pointless. Don't bother.
d967e3ec 552diff -Nru bzip2-1.0.1/bzip2.1 bzip2-1.0.1.new/bzip2.1
553--- bzip2-1.0.1/bzip2.1 Sat Jun 24 20:13:27 2000
554+++ bzip2-1.0.1.new/bzip2.1 Thu Jan 1 01:00:00 1970
555@@ -1,439 +0,0 @@
556-.PU
557-.TH bzip2 1
558-.SH NAME
559-bzip2, bunzip2 \- a block-sorting file compressor, v1.0
560-.br
561-bzcat \- decompresses files to stdout
562-.br
563-bzip2recover \- recovers data from damaged bzip2 files
564-
565-.SH SYNOPSIS
566-.ll +8
567-.B bzip2
568-.RB [ " \-cdfkqstvzVL123456789 " ]
569-[
570-.I "filenames \&..."
571-]
572-.ll -8
573-.br
574-.B bunzip2
575-.RB [ " \-fkvsVL " ]
576-[
577-.I "filenames \&..."
578-]
579-.br
580-.B bzcat
581-.RB [ " \-s " ]
582-[
583-.I "filenames \&..."
584-]
585-.br
586-.B bzip2recover
587-.I "filename"
588-
589-.SH DESCRIPTION
590-.I bzip2
591-compresses files using the Burrows-Wheeler block sorting
592-text compression algorithm, and Huffman coding. Compression is
593-generally considerably better than that achieved by more conventional
594-LZ77/LZ78-based compressors, and approaches the performance of the PPM
595-family of statistical compressors.
596-
597-The command-line options are deliberately very similar to
598-those of
599-.I GNU gzip,
600-but they are not identical.
601-
602-.I bzip2
603-expects a list of file names to accompany the
604-command-line flags. Each file is replaced by a compressed version of
605-itself, with the name "original_name.bz2".
606-Each compressed file
607-has the same modification date, permissions, and, when possible,
608-ownership as the corresponding original, so that these properties can
609-be correctly restored at decompression time. File name handling is
610-naive in the sense that there is no mechanism for preserving original
611-file names, permissions, ownerships or dates in filesystems which lack
612-these concepts, or have serious file name length restrictions, such as
613-MS-DOS.
614-
615-.I bzip2
616-and
617-.I bunzip2
618-will by default not overwrite existing
619-files. If you want this to happen, specify the \-f flag.
620-
621-If no file names are specified,
622-.I bzip2
623-compresses from standard
624-input to standard output. In this case,
625-.I bzip2
626-will decline to
627-write compressed output to a terminal, as this would be entirely
628-incomprehensible and therefore pointless.
629-
630-.I bunzip2
631-(or
632-.I bzip2 \-d)
633-decompresses all
634-specified files. Files which were not created by
635-.I bzip2
636-will be detected and ignored, and a warning issued.
637-.I bzip2
638-attempts to guess the filename for the decompressed file
639-from that of the compressed file as follows:
640-
641- filename.bz2 becomes filename
642- filename.bz becomes filename
643- filename.tbz2 becomes filename.tar
644- filename.tbz becomes filename.tar
645- anyothername becomes anyothername.out
646-
647-If the file does not end in one of the recognised endings,
648-.I .bz2,
649-.I .bz,
650-.I .tbz2
651-or
652-.I .tbz,
653-.I bzip2
654-complains that it cannot
655-guess the name of the original file, and uses the original name
656-with
657-.I .out
658-appended.
659-
660-As with compression, supplying no
661-filenames causes decompression from
662-standard input to standard output.
663-
664-.I bunzip2
665-will correctly decompress a file which is the
666-concatenation of two or more compressed files. The result is the
667-concatenation of the corresponding uncompressed files. Integrity
668-testing (\-t)
669-of concatenated
670-compressed files is also supported.
671-
672-You can also compress or decompress files to the standard output by
673-giving the \-c flag. Multiple files may be compressed and
674-decompressed like this. The resulting outputs are fed sequentially to
675-stdout. Compression of multiple files
676-in this manner generates a stream
677-containing multiple compressed file representations. Such a stream
678-can be decompressed correctly only by
679-.I bzip2
680-version 0.9.0 or
681-later. Earlier versions of
682-.I bzip2
683-will stop after decompressing
684-the first file in the stream.
685-
686-.I bzcat
687-(or
688-.I bzip2 -dc)
689-decompresses all specified files to
690-the standard output.
691-
692-.I bzip2
693-will read arguments from the environment variables
694-.I BZIP2
695-and
696-.I BZIP,
697-in that order, and will process them
698-before any arguments read from the command line. This gives a
699-convenient way to supply default arguments.
700-
701-Compression is always performed, even if the compressed
702-file is slightly
703-larger than the original. Files of less than about one hundred bytes
704-tend to get larger, since the compression mechanism has a constant
705-overhead in the region of 50 bytes. Random data (including the output
706-of most file compressors) is coded at about 8.05 bits per byte, giving
707-an expansion of around 0.5%.
708-
709-As a self-check for your protection,
710-.I
711-bzip2
712-uses 32-bit CRCs to
713-make sure that the decompressed version of a file is identical to the
714-original. This guards against corruption of the compressed data, and
715-against undetected bugs in
716-.I bzip2
717-(hopefully very unlikely). The
718-chances of data corruption going undetected is microscopic, about one
719-chance in four billion for each file processed. Be aware, though, that
720-the check occurs upon decompression, so it can only tell you that
721-something is wrong. It can't help you
722-recover the original uncompressed
723-data. You can use
724-.I bzip2recover
725-to try to recover data from
726-damaged files.
727-
728-Return values: 0 for a normal exit, 1 for environmental problems (file
729-not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt
730-compressed file, 3 for an internal consistency error (eg, bug) which
731-caused
732-.I bzip2
733-to panic.
734-
735-.SH OPTIONS
736-.TP
737-.B \-c --stdout
738-Compress or decompress to standard output.
739-.TP
740-.B \-d --decompress
741-Force decompression.
742-.I bzip2,
743-.I bunzip2
744-and
745-.I bzcat
746-are
747-really the same program, and the decision about what actions to take is
748-done on the basis of which name is used. This flag overrides that
749-mechanism, and forces
750-.I bzip2
751-to decompress.
752-.TP
753-.B \-z --compress
754-The complement to \-d: forces compression, regardless of the
755-invokation name.
756-.TP
757-.B \-t --test
758-Check integrity of the specified file(s), but don't decompress them.
759-This really performs a trial decompression and throws away the result.
760-.TP
761-.B \-f --force
762-Force overwrite of output files. Normally,
763-.I bzip2
764-will not overwrite
765-existing output files. Also forces
766-.I bzip2
767-to break hard links
768-to files, which it otherwise wouldn't do.
769-.TP
770-.B \-k --keep
771-Keep (don't delete) input files during compression
772-or decompression.
773-.TP
774-.B \-s --small
775-Reduce memory usage, for compression, decompression and testing. Files
776-are decompressed and tested using a modified algorithm which only
777-requires 2.5 bytes per block byte. This means any file can be
778-decompressed in 2300k of memory, albeit at about half the normal speed.
779-
780-During compression, \-s selects a block size of 200k, which limits
781-memory use to around the same figure, at the expense of your compression
782-ratio. In short, if your machine is low on memory (8 megabytes or
783-less), use \-s for everything. See MEMORY MANAGEMENT below.
784-.TP
785-.B \-q --quiet
786-Suppress non-essential warning messages. Messages pertaining to
787-I/O errors and other critical events will not be suppressed.
788-.TP
789-.B \-v --verbose
790-Verbose mode -- show the compression ratio for each file processed.
791-Further \-v's increase the verbosity level, spewing out lots of
792-information which is primarily of interest for diagnostic purposes.
793-.TP
794-.B \-L --license -V --version
795-Display the software version, license terms and conditions.
796-.TP
797-.B \-1 to \-9
798-Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
799-effect when decompressing. See MEMORY MANAGEMENT below.
800-.TP
801-.B \--
802-Treats all subsequent arguments as file names, even if they start
803-with a dash. This is so you can handle files with names beginning
804-with a dash, for example: bzip2 \-- \-myfilename.
805-.TP
806-.B \--repetitive-fast --repetitive-best
807-These flags are redundant in versions 0.9.5 and above. They provided
808-some coarse control over the behaviour of the sorting algorithm in
809-earlier versions, which was sometimes useful. 0.9.5 and above have an
810-improved algorithm which renders these flags irrelevant.
811-
812-.SH MEMORY MANAGEMENT
813-.I bzip2
814-compresses large files in blocks. The block size affects
815-both the compression ratio achieved, and the amount of memory needed for
816-compression and decompression. The flags \-1 through \-9
817-specify the block size to be 100,000 bytes through 900,000 bytes (the
818-default) respectively. At decompression time, the block size used for
819-compression is read from the header of the compressed file, and
820-.I bunzip2
821-then allocates itself just enough memory to decompress
822-the file. Since block sizes are stored in compressed files, it follows
823-that the flags \-1 to \-9 are irrelevant to and so ignored
824-during decompression.
825-
826-Compression and decompression requirements,
827-in bytes, can be estimated as:
828-
829- Compression: 400k + ( 8 x block size )
830-
831- Decompression: 100k + ( 4 x block size ), or
832- 100k + ( 2.5 x block size )
833-
834-Larger block sizes give rapidly diminishing marginal returns. Most of
835-the compression comes from the first two or three hundred k of block
836-size, a fact worth bearing in mind when using
837-.I bzip2
838-on small machines.
839-It is also important to appreciate that the decompression memory
840-requirement is set at compression time by the choice of block size.
841-
842-For files compressed with the default 900k block size,
843-.I bunzip2
844-will require about 3700 kbytes to decompress. To support decompression
845-of any file on a 4 megabyte machine,
846-.I bunzip2
847-has an option to
848-decompress using approximately half this amount of memory, about 2300
849-kbytes. Decompression speed is also halved, so you should use this
850-option only where necessary. The relevant flag is -s.
851-
852-In general, try and use the largest block size memory constraints allow,
853-since that maximises the compression achieved. Compression and
854-decompression speed are virtually unaffected by block size.
855-
856-Another significant point applies to files which fit in a single block
857--- that means most files you'd encounter using a large block size. The
858-amount of real memory touched is proportional to the size of the file,
859-since the file is smaller than a block. For example, compressing a file
860-20,000 bytes long with the flag -9 will cause the compressor to
861-allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
862-kbytes of it. Similarly, the decompressor will allocate 3700k but only
863-touch 100k + 20000 * 4 = 180 kbytes.
864-
865-Here is a table which summarises the maximum memory usage for different
866-block sizes. Also recorded is the total compressed size for 14 files of
867-the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
868-column gives some feel for how compression varies with block size.
869-These figures tend to understate the advantage of larger block sizes for
870-larger files, since the Corpus is dominated by smaller files.
871-
872- Compress Decompress Decompress Corpus
873- Flag usage usage -s usage Size
874-
875- -1 1200k 500k 350k 914704
876- -2 2000k 900k 600k 877703
877- -3 2800k 1300k 850k 860338
878- -4 3600k 1700k 1100k 846899
879- -5 4400k 2100k 1350k 845160
880- -6 5200k 2500k 1600k 838626
881- -7 6100k 2900k 1850k 834096
882- -8 6800k 3300k 2100k 828642
883- -9 7600k 3700k 2350k 828642
884-
885-.SH RECOVERING DATA FROM DAMAGED FILES
886-.I bzip2
887-compresses files in blocks, usually 900kbytes long. Each
888-block is handled independently. If a media or transmission error causes
889-a multi-block .bz2
890-file to become damaged, it may be possible to
891-recover data from the undamaged blocks in the file.
892-
893-The compressed representation of each block is delimited by a 48-bit
894-pattern, which makes it possible to find the block boundaries with
895-reasonable certainty. Each block also carries its own 32-bit CRC, so
896-damaged blocks can be distinguished from undamaged ones.
897-
898-.I bzip2recover
899-is a simple program whose purpose is to search for
900-blocks in .bz2 files, and write each block out into its own .bz2
901-file. You can then use
902-.I bzip2
903-\-t
904-to test the
905-integrity of the resulting files, and decompress those which are
906-undamaged.
907-
908-.I bzip2recover
909-takes a single argument, the name of the damaged file,
910-and writes a number of files "rec0001file.bz2",
911-"rec0002file.bz2", etc, containing the extracted blocks.
912-The output filenames are designed so that the use of
913-wildcards in subsequent processing -- for example,
914-"bzip2 -dc rec*file.bz2 > recovered_data" -- lists the files in
915-the correct order.
916-
917-.I bzip2recover
918-should be of most use dealing with large .bz2
919-files, as these will contain many blocks. It is clearly
920-futile to use it on damaged single-block files, since a
921-damaged block cannot be recovered. If you wish to minimise
922-any potential data loss through media or transmission errors,
923-you might consider compressing with a smaller
924-block size.
925-
926-.SH PERFORMANCE NOTES
927-The sorting phase of compression gathers together similar strings in the
928-file. Because of this, files containing very long runs of repeated
929-symbols, like "aabaabaabaab ..." (repeated several hundred times) may
930-compress more slowly than normal. Versions 0.9.5 and above fare much
931-better than previous versions in this respect. The ratio between
932-worst-case and average-case compression time is in the region of 10:1.
933-For previous versions, this figure was more like 100:1. You can use the
934-\-vvvv option to monitor progress in great detail, if you want.
935-
936-Decompression speed is unaffected by these phenomena.
937-
938-.I bzip2
939-usually allocates several megabytes of memory to operate
940-in, and then charges all over it in a fairly random fashion. This means
941-that performance, both for compressing and decompressing, is largely
942-determined by the speed at which your machine can service cache misses.
943-Because of this, small changes to the code to reduce the miss rate have
944-been observed to give disproportionately large performance improvements.
945-I imagine
946-.I bzip2
947-will perform best on machines with very large caches.
948-
949-.SH CAVEATS
950-I/O error messages are not as helpful as they could be.
951-.I bzip2
952-tries hard to detect I/O errors and exit cleanly, but the details of
953-what the problem is sometimes seem rather misleading.
954-
955-This manual page pertains to version 1.0 of
956-.I bzip2.
957-Compressed
958-data created by this version is entirely forwards and backwards
959-compatible with the previous public releases, versions 0.1pl2, 0.9.0
960-and 0.9.5,
961-but with the following exception: 0.9.0 and above can correctly
962-decompress multiple concatenated compressed files. 0.1pl2 cannot do
963-this; it will stop after decompressing just the first file in the
964-stream.
965-
966-.I bzip2recover
967-uses 32-bit integers to represent bit positions in
968-compressed files, so it cannot handle compressed files more than 512
969-megabytes long. This could easily be fixed.
970-
971-.SH AUTHOR
972-Julian Seward, jseward@acm.org.
973-
974-http://sourceware.cygnus.com/bzip2
975-http://www.muraroa.demon.co.uk
976-
977-The ideas embodied in
978-.I bzip2
979-are due to (at least) the following
980-people: Michael Burrows and David Wheeler (for the block sorting
981-transformation), David Wheeler (again, for the Huffman coder), Peter
982-Fenwick (for the structured coding model in the original
983-.I bzip,
984-and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
985-(for the arithmetic coder in the original
986-.I bzip).
987-I am much
988-indebted for their help, support and advice. See the manual in the
989-source distribution for pointers to sources of documentation. Christian
990-von Roques encouraged me to look for faster sorting algorithms, so as to
991-speed up compression. Bela Lubkin encouraged me to improve the
992-worst-case compression performance. Many people sent patches, helped
993-with portability problems, lent machines, gave advice and were generally
994-helpful.
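The MEMORY MANAGEMENT section of the bzip2.1 page removed above estimates memory use as 400k + (8 x block size) for compression, 100k + (4 x block size) for decompression, and 100k + (2.5 x block size) for decompression with -s. The sketch below recomputes those estimates for each block-size flag so they can be compared with the table in that section. It is only an illustration: `bzip2_mem_estimate` is a hypothetical helper name, not part of bzip2, and the figures are the man page's approximations, not measurements.

```sh
#!/bin/sh
# Sketch only: recomputes the man page's memory estimates, in kbytes.
# bzip2_mem_estimate is a hypothetical helper, not something bzip2 ships.
bzip2_mem_estimate() {
    flag=$1                            # block-size flag, 1 to 9
    block=$((flag * 100))              # block size in k: -1 = 100k ... -9 = 900k
    compress=$((400 + 8 * block))      # 400k + ( 8 x block size )
    decompress=$((100 + 4 * block))    # 100k + ( 4 x block size )
    small=$((100 + 5 * block / 2))     # 100k + ( 2.5 x block size ), in integers
    printf '%s %sk %sk %sk\n' "-$flag" "$compress" "$decompress" "$small"
}

# One line per flag: flag, compress, decompress, decompress with -s.
for n in 1 2 3 4 5 6 7 8 9; do
    bzip2_mem_estimate "$n"
done
```

The computed values agree with the table in every row but one: for -7 the compression formula gives 6000k, whereas the table lists 6100k. The man page presents these formulas as estimates, so the difference is left as it stands.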
995diff -Nru bzip2-1.0.1/bzip2.1.preformatted bzip2-1.0.1.new/bzip2.1.preformatted
996--- bzip2-1.0.1/bzip2.1.preformatted Sat Jun 24 20:13:27 2000
997+++ bzip2-1.0.1.new/bzip2.1.preformatted Thu Jan 1 01:00:00 1970
998@@ -1,462 +0,0 @@
999-
1000-
1001-
1002-bzip2(1) bzip2(1)
1003-
1004-
1005-N\bNA\bAM\bME\bE
1006- bzip2, bunzip2 - a block-sorting file compressor, v1.0
1007- bzcat - decompresses files to stdout
1008- bzip2recover - recovers data from damaged bzip2 files
1009-
1010-
1011-S\bSY\bYN\bNO\bOP\bPS\bSI\bIS\bS
1012- b\bbz\bzi\bip\bp2\b2 [ -\b-c\bcd\bdf\bfk\bkq\bqs\bst\btv\bvz\bzV\bVL\bL1\b12\b23\b34\b45\b56\b67\b78\b89\b9 ] [ _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be_\bs _\b._\b._\b. ]
1013- b\bbu\bun\bnz\bzi\bip\bp2\b2 [ -\b-f\bfk\bkv\bvs\bsV\bVL\bL ] [ _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be_\bs _\b._\b._\b. ]
1014- b\bbz\bzc\bca\bat\bt [ -\b-s\bs ] [ _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be_\bs _\b._\b._\b. ]
1015- b\bbz\bzi\bip\bp2\b2r\bre\bec\bco\bov\bve\ber\br _\bf_\bi_\bl_\be_\bn_\ba_\bm_\be
1016-
1017-
1018-D\bDE\bES\bSC\bCR\bRI\bIP\bPT\bTI\bIO\bON\bN
1019- _\bb_\bz_\bi_\bp_\b2 compresses files using the Burrows-Wheeler block
1020- sorting text compression algorithm, and Huffman coding.
1021- Compression is generally considerably better than that
1022- achieved by more conventional LZ77/LZ78-based compressors,
1023- and approaches the performance of the PPM family of sta-
1024- tistical compressors.
1025-
1026- The command-line options are deliberately very similar to
1027- those of _\bG_\bN_\bU _\bg_\bz_\bi_\bp_\b, but they are not identical.
1028-
1029- _\bb_\bz_\bi_\bp_\b2 expects a list of file names to accompany the com-
1030- mand-line flags. Each file is replaced by a compressed
1031- version of itself, with the name "original_name.bz2".
1032- Each compressed file has the same modification date, per-
1033- missions, and, when possible, ownership as the correspond-
1034- ing original, so that these properties can be correctly
1035- restored at decompression time. File name handling is
1036- naive in the sense that there is no mechanism for preserv-
1037- ing original file names, permissions, ownerships or dates
1038- in filesystems which lack these concepts, or have serious
1039- file name length restrictions, such as MS-DOS.
1040-
1041- _\bb_\bz_\bi_\bp_\b2 and _\bb_\bu_\bn_\bz_\bi_\bp_\b2 will by default not overwrite existing
1042- files. If you want this to happen, specify the -f flag.
1043-
1044- If no file names are specified, _\bb_\bz_\bi_\bp_\b2 compresses from
1045- standard input to standard output. In this case, _\bb_\bz_\bi_\bp_\b2
1046- will decline to write compressed output to a terminal, as
1047- this would be entirely incomprehensible and therefore
1048- pointless.
1049-
1050- _\bb_\bu_\bn_\bz_\bi_\bp_\b2 (or _\bb_\bz_\bi_\bp_\b2 _\b-_\bd_\b) decompresses all specified files.
1051- Files which were not created by _\bb_\bz_\bi_\bp_\b2 will be detected and
1052- ignored, and a warning issued. _\bb_\bz_\bi_\bp_\b2 attempts to guess
1053- the filename for the decompressed file from that of the
1054- compressed file as follows:
1055-
1056- filename.bz2 becomes filename
1057- filename.bz becomes filename
1058- filename.tbz2 becomes filename.tar
1059-
1060-
1061-
1062- 1
1063-
1064-
1065-
1066-
1067-
1068-bzip2(1) bzip2(1)
1069-
1070-
1071- filename.tbz becomes filename.tar
1072- anyothername becomes anyothername.out
1073-
1074- If the file does not end in one of the recognised endings,
1075- _\b._\bb_\bz_\b2_\b, _\b._\bb_\bz_\b, _\b._\bt_\bb_\bz_\b2 or _\b._\bt_\bb_\bz_\b, _\bb_\bz_\bi_\bp_\b2 complains that it cannot
1076- guess the name of the original file, and uses the original
1077- name with _\b._\bo_\bu_\bt appended.
1078-
1079- As with compression, supplying no filenames causes decom-
1080- pression from standard input to standard output.
1081-
1082- _\bb_\bu_\bn_\bz_\bi_\bp_\b2 will correctly decompress a file which is the con-
1083- catenation of two or more compressed files. The result is
1084- the concatenation of the corresponding uncompressed files.
1085- Integrity testing (-t) of concatenated compressed files is
1086- also supported.
1087-
1088- You can also compress or decompress files to the standard
1089- output by giving the -c flag. Multiple files may be com-
1090- pressed and decompressed like this. The resulting outputs
1091- are fed sequentially to stdout. Compression of multiple
1092- files in this manner generates a stream containing multi-
1093- ple compressed file representations. Such a stream can be
1094- decompressed correctly only by _\bb_\bz_\bi_\bp_\b2 version 0.9.0 or
1095- later. Earlier versions of _\bb_\bz_\bi_\bp_\b2 will stop after decom-
1096- pressing the first file in the stream.
1097-
1098- _\bb_\bz_\bc_\ba_\bt (or _\bb_\bz_\bi_\bp_\b2 _\b-_\bd_\bc_\b) decompresses all specified files to
1099- the standard output.
1100-
1101- _\bb_\bz_\bi_\bp_\b2 will read arguments from the environment variables
1102- _\bB_\bZ_\bI_\bP_\b2 and _\bB_\bZ_\bI_\bP_\b, in that order, and will process them
1103- before any arguments read from the command line. This
1104- gives a convenient way to supply default arguments.
1105-
1106- Compression is always performed, even if the compressed
1107- file is slightly larger than the original. Files of less
1108- than about one hundred bytes tend to get larger, since the
1109- compression mechanism has a constant overhead in the
1110- region of 50 bytes. Random data (including the output of
1111- most file compressors) is coded at about 8.05 bits per
1112- byte, giving an expansion of around 0.5%.
1113-
1114- As a self-check for your protection, _\bb_\bz_\bi_\bp_\b2 uses 32-bit
1115- CRCs to make sure that the decompressed version of a file
1116- is identical to the original. This guards against corrup-
1117- tion of the compressed data, and against undetected bugs
1118- in _\bb_\bz_\bi_\bp_\b2 (hopefully very unlikely). The chances of data
1119- corruption going undetected is microscopic, about one
1120- chance in four billion for each file processed. Be aware,
1121- though, that the check occurs upon decompression, so it
1122- can only tell you that something is wrong. It can't help
1123- you recover the original uncompressed data. You can use
1124- _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br to try to recover data from damaged files.
1125-
1126-
1127-
1128- 2
1129-
1130-
1131-
1132-
1133-
1134-bzip2(1) bzip2(1)
1135-
1136-
1137- Return values: 0 for a normal exit, 1 for environmental
1138- problems (file not found, invalid flags, I/O errors, &c),
1139- 2 to indicate a corrupt compressed file, 3 for an internal
1140- consistency error (eg, bug) which caused _\bb_\bz_\bi_\bp_\b2 to panic.
1141-
1142-
1143-O\bOP\bPT\bTI\bIO\bON\bNS\bS
1144- -\b-c\bc -\b--\b-s\bst\btd\bdo\bou\but\bt
1145- Compress or decompress to standard output.
1146-
1147- -\b-d\bd -\b--\b-d\bde\bec\bco\bom\bmp\bpr\bre\bes\bss\bs
1148- Force decompression. _\bb_\bz_\bi_\bp_\b2_\b, _\bb_\bu_\bn_\bz_\bi_\bp_\b2 and _\bb_\bz_\bc_\ba_\bt are
1149- really the same program, and the decision about
1150- what actions to take is done on the basis of which
1151- name is used. This flag overrides that mechanism,
1152- and forces _\bb_\bz_\bi_\bp_\b2 to decompress.
1153-
1154- -\b-z\bz -\b--\b-c\bco\bom\bmp\bpr\bre\bes\bss\bs
1155- The complement to -d: forces compression, regard-
1156- less of the invokation name.
1157-
1158- -\b-t\bt -\b--\b-t\bte\bes\bst\bt
1159- Check integrity of the specified file(s), but don't
1160- decompress them. This really performs a trial
1161- decompression and throws away the result.
1162-
1163- -\b-f\bf -\b--\b-f\bfo\bor\brc\bce\be
1164- Force overwrite of output files. Normally, _\bb_\bz_\bi_\bp_\b2
1165- will not overwrite existing output files. Also
1166- forces _\bb_\bz_\bi_\bp_\b2 to break hard links to files, which it
1167- otherwise wouldn't do.
1168-
1169- -\b-k\bk -\b--\b-k\bke\bee\bep\bp
1170- Keep (don't delete) input files during compression
1171- or decompression.
1172-
1173- -\b-s\bs -\b--\b-s\bsm\bma\bal\bll\bl
1174- Reduce memory usage, for compression, decompression
1175- and testing. Files are decompressed and tested
1176- using a modified algorithm which only requires 2.5
1177- bytes per block byte. This means any file can be
1178- decompressed in 2300k of memory, albeit at about
1179- half the normal speed.
1180-
1181- During compression, -s selects a block size of
1182- 200k, which limits memory use to around the same
1183- figure, at the expense of your compression ratio.
1184- In short, if your machine is low on memory (8
1185- megabytes or less), use -s for everything. See
1186- MEMORY MANAGEMENT below.
1187-
1188- -\b-q\bq -\b--\b-q\bqu\bui\bie\bet\bt
1189- Suppress non-essential warning messages. Messages
1190- pertaining to I/O errors and other critical events
1191-
1192-
1193-
1194- 3
1195-
1196-
1197-
1198-
1199-
1200-bzip2(1) bzip2(1)
1201-
1202-
1203- will not be suppressed.
1204-
1205- -\b-v\bv -\b--\b-v\bve\ber\brb\bbo\bos\bse\be
1206- Verbose mode -- show the compression ratio for each
1207- file processed. Further -v's increase the ver-
1208- bosity level, spewing out lots of information which
1209- is primarily of interest for diagnostic purposes.
1210-
1211- -\b-L\bL -\b--\b-l\bli\bic\bce\ben\bns\bse\be -\b-V\bV -\b--\b-v\bve\ber\brs\bsi\bio\bon\bn
1212- Display the software version, license terms and
1213- conditions.
1214-
1215- -\b-1\b1 t\bto\bo -\b-9\b9
1216- Set the block size to 100 k, 200 k .. 900 k when
1217- compressing. Has no effect when decompressing.
1218- See MEMORY MANAGEMENT below.
1219-
1220- -\b--\b- Treats all subsequent arguments as file names, even
1221- if they start with a dash. This is so you can han-
1222- dle files with names beginning with a dash, for
1223- example: bzip2 -- -myfilename.
1224-
1225- -\b--\b-r\bre\bep\bpe\bet\bti\bit\bti\biv\bve\be-\b-f\bfa\bas\bst\bt -\b--\b-r\bre\bep\bpe\bet\bti\bit\bti\biv\bve\be-\b-b\bbe\bes\bst\bt
1226- These flags are redundant in versions 0.9.5 and
1227- above. They provided some coarse control over the
1228- behaviour of the sorting algorithm in earlier ver-
1229- sions, which was sometimes useful. 0.9.5 and above
1230- have an improved algorithm which renders these
1231- flags irrelevant.
1232-
1233-
1234-M\bME\bEM\bMO\bOR\bRY\bY M\bMA\bAN\bNA\bAG\bGE\bEM\bME\bEN\bNT\bT
1235- _\bb_\bz_\bi_\bp_\b2 compresses large files in blocks. The block size
1236- affects both the compression ratio achieved, and the
1237- amount of memory needed for compression and decompression.
1238- The flags -1 through -9 specify the block size to be
1239- 100,000 bytes through 900,000 bytes (the default) respec-
1240- tively. At decompression time, the block size used for
1241- compression is read from the header of the compressed
1242- file, and _\bb_\bu_\bn_\bz_\bi_\bp_\b2 then allocates itself just enough memory
1243- to decompress the file. Since block sizes are stored in
1244- compressed files, it follows that the flags -1 to -9 are
1245- irrelevant to and so ignored during decompression.
1246-
1247- Compression and decompression requirements, in bytes, can
1248- be estimated as:
1249-
1250- Compression: 400k + ( 8 x block size )
1251-
1252- Decompression: 100k + ( 4 x block size ), or
1253- 100k + ( 2.5 x block size )
1254-
1255- Larger block sizes give rapidly diminishing marginal
1256- returns. Most of the compression comes from the first two
1257-
1258-
1259-
1260- 4
1261-
1262-
1263-
1264-
1265-
1266-bzip2(1) bzip2(1)
1267-
1268-
1269- or three hundred k of block size, a fact worth bearing in
1270- mind when using _\bb_\bz_\bi_\bp_\b2 on small machines. It is also
1271- important to appreciate that the decompression memory
1272- requirement is set at compression time by the choice of
1273- block size.
1274-
1275- For files compressed with the default 900k block size,
1276- _\bb_\bu_\bn_\bz_\bi_\bp_\b2 will require about 3700 kbytes to decompress. To
1277- support decompression of any file on a 4 megabyte machine,
1278- _\bb_\bu_\bn_\bz_\bi_\bp_\b2 has an option to decompress using approximately
1279- half this amount of memory, about 2300 kbytes. Decompres-
1280- sion speed is also halved, so you should use this option
1281- only where necessary. The relevant flag is -s.
1282-
1283- In general, try and use the largest block size memory con-
1284- straints allow, since that maximises the compression
1285- achieved. Compression and decompression speed are virtu-
1286- ally unaffected by block size.
1287-
1288- Another significant point applies to files which fit in a
1289- single block -- that means most files you'd encounter
1290- using a large block size. The amount of real memory
1291- touched is proportional to the size of the file, since the
1292- file is smaller than a block. For example, compressing a
1293- file 20,000 bytes long with the flag -9 will cause the
1294- compressor to allocate around 7600k of memory, but only
1295- touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the
1296- decompressor will allocate 3700k but only touch 100k +
1297- 20000 * 4 = 180 kbytes.
1298-
1299- Here is a table which summarises the maximum memory usage
1300- for different block sizes. Also recorded is the total
1301- compressed size for 14 files of the Calgary Text Compres-
1302- sion Corpus totalling 3,141,622 bytes. This column gives
1303- some feel for how compression varies with block size.
1304- These figures tend to understate the advantage of larger
1305- block sizes for larger files, since the Corpus is domi-
1306- nated by smaller files.
1307-
1308- Compress Decompress Decompress Corpus
1309- Flag usage usage -s usage Size
1310-
1311- -1 1200k 500k 350k 914704
1312- -2 2000k 900k 600k 877703
1313- -3 2800k 1300k 850k 860338
1314- -4 3600k 1700k 1100k 846899
1315- -5 4400k 2100k 1350k 845160
1316- -6 5200k 2500k 1600k 838626
1317- -7 6100k 2900k 1850k 834096
1318- -8 6800k 3300k 2100k 828642
1319- -9 7600k 3700k 2350k 828642
1320-
1321-
1322-
1323-
1324-
1325-
1326- 5
1327-
1328-
1329-
1330-
1331-
1332-bzip2(1) bzip2(1)
1333-
1334-
1335-R\bRE\bEC\bCO\bOV\bVE\bER\bRI\bIN\bNG\bG D\bDA\bAT\bTA\bA F\bFR\bRO\bOM\bM D\bDA\bAM\bMA\bAG\bGE\bED\bD F\bFI\bIL\bLE\bES\bS
1336- _\bb_\bz_\bi_\bp_\b2 compresses files in blocks, usually 900kbytes long.
1337- Each block is handled independently. If a media or trans-
1338- mission error causes a multi-block .bz2 file to become
1339- damaged, it may be possible to recover data from the
1340- undamaged blocks in the file.
1341-
1342- The compressed representation of each block is delimited
1343- by a 48-bit pattern, which makes it possible to find the
1344- block boundaries with reasonable certainty. Each block
1345- also carries its own 32-bit CRC, so damaged blocks can be
1346- distinguished from undamaged ones.
1347-
1348- _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br is a simple program whose purpose is to
1349- search for blocks in .bz2 files, and write each block out
1350- into its own .bz2 file. You can then use _\bb_\bz_\bi_\bp_\b2 -t to test
1351- the integrity of the resulting files, and decompress those
1352- which are undamaged.
1353-
1354- _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br takes a single argument, the name of the dam-
1355- aged file, and writes a number of files "rec0001file.bz2",
1356- "rec0002file.bz2", etc, containing the extracted blocks.
1357- The output filenames are designed so that the use of
1358- wildcards in subsequent processing -- for example, "bzip2
1359- -dc rec*file.bz2 > recovered_data" -- lists the files in
1360- the correct order.
1361-
1362- _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br should be of most use dealing with large .bz2
1363- files, as these will contain many blocks. It is clearly
1364- futile to use it on damaged single-block files, since a
1365- damaged block cannot be recovered. If you wish to min-
1366- imise any potential data loss through media or transmis-
1367- sion errors, you might consider compressing with a smaller
1368- block size.
1369-
1370-
1371-P\bPE\bER\bRF\bFO\bOR\bRM\bMA\bAN\bNC\bCE\bE N\bNO\bOT\bTE\bES\bS
1372- The sorting phase of compression gathers together similar
1373- strings in the file. Because of this, files containing
1374- very long runs of repeated symbols, like "aabaabaabaab
1375- ..." (repeated several hundred times) may compress more
1376- slowly than normal. Versions 0.9.5 and above fare much
1377- better than previous versions in this respect. The ratio
1378- between worst-case and average-case compression time is in
1379- the region of 10:1. For previous versions, this figure
1380- was more like 100:1. You can use the -vvvv option to mon-
1381- itor progress in great detail, if you want.
1382-
1383- Decompression speed is unaffected by these phenomena.
1384-
1385- _\bb_\bz_\bi_\bp_\b2 usually allocates several megabytes of memory to
1386- operate in, and then charges all over it in a fairly ran-
1387- dom fashion. This means that performance, both for com-
1388- pressing and decompressing, is largely determined by the
1389-
1390-
1391-
1392- 6
1393-
1394-
1395-
1396-
1397-
1398-bzip2(1) bzip2(1)
1399-
1400-
1401- speed at which your machine can service cache misses.
1402- Because of this, small changes to the code to reduce the
1403- miss rate have been observed to give disproportionately
1404- large performance improvements. I imagine _\bb_\bz_\bi_\bp_\b2 will per-
1405- form best on machines with very large caches.
1406-
1407-
1408-C\bCA\bAV\bVE\bEA\bAT\bTS\bS
1409- I/O error messages are not as helpful as they could be.
1410- _\bb_\bz_\bi_\bp_\b2 tries hard to detect I/O errors and exit cleanly,
1411- but the details of what the problem is sometimes seem
1412- rather misleading.
1413-
1414- This manual page pertains to version 1.0 of _\bb_\bz_\bi_\bp_\b2_\b. Com-
1415- pressed data created by this version is entirely forwards
1416- and backwards compatible with the previous public
1417- releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the
1418- following exception: 0.9.0 and above can correctly decom-
1419- press multiple concatenated compressed files. 0.1pl2 can-
1420- not do this; it will stop after decompressing just the
1421- first file in the stream.
1422-
1423- _\bb_\bz_\bi_\bp_\b2_\br_\be_\bc_\bo_\bv_\be_\br uses 32-bit integers to represent bit posi-
1424- tions in compressed files, so it cannot handle compressed
1425- files more than 512 megabytes long. This could easily be
1426- fixed.
1427-
1428-
1429-A\bAU\bUT\bTH\bHO\bOR\bR
1430- Julian Seward, jseward@acm.org.
1431-
1432- http://sourceware.cygnus.com/bzip2
1433- http://www.muraroa.demon.co.uk
1434-
1435- The ideas embodied in _\bb_\bz_\bi_\bp_\b2 are due to (at least) the fol-
1436- lowing people: Michael Burrows and David Wheeler (for the
1437- block sorting transformation), David Wheeler (again, for
1438- the Huffman coder), Peter Fenwick (for the structured cod-
1439- ing model in the original _\bb_\bz_\bi_\bp_\b, and many refinements), and
1440- Alistair Moffat, Radford Neal and Ian Witten (for the
1441- arithmetic coder in the original _\bb_\bz_\bi_\bp_\b)_\b. I am much
1442- indebted for their help, support and advice. See the man-
1443- ual in the source distribution for pointers to sources of
1444- documentation. Christian von Roques encouraged me to look
1445- for faster sorting algorithms, so as to speed up compres-
1446- sion. Bela Lubkin encouraged me to improve the worst-case
1447- compression performance. Many people sent patches, helped
1448- with portability problems, lent machines, gave advice and
1449- were generally helpful.
1450-
1451-
1452-
1453-
1454-
1455-
1456-
1457-
1458- 7
1459-
1460-
1461diff -Nru bzip2-1.0.1/bzless bzip2-1.0.1.new/bzless
1462--- bzip2-1.0.1/bzless Thu Jan 1 01:00:00 1970
1463+++ bzip2-1.0.1.new/bzless Sat Jun 24 20:16:09 2000
1464@@ -0,0 +1,2 @@
1465+#!/bin/sh
906ef59a 1466+%{_bindir}/bunzip2 -c "$@" | %{_bindir}/less
d967e3ec 1467diff -Nru bzip2-1.0.1/config.h.in bzip2-1.0.1.new/config.h.in
1468--- bzip2-1.0.1/config.h.in Thu Jan 1 01:00:00 1970
1469+++ bzip2-1.0.1.new/config.h.in Sat Jun 24 20:13:06 2000
1470@@ -0,0 +1,17 @@
1471+/* config.h.in. Generated automatically from configure.in by autoheader. */
1472+
1473+/* Name of package */
1474+#undef PACKAGE
1475+
1476+/* Version number of package */
1477+#undef VERSION
1478+
1479+/* Number of bits in a file offset, on hosts where this is settable. */
1480+#undef _FILE_OFFSET_BITS
1481+
1482+/* Define to make fseeko etc. visible, on some hosts. */
1483+#undef _LARGEFILE_SOURCE
1484+
1485+/* Define for large files, on AIX-style hosts. */
1486+#undef _LARGE_FILES
1487+
1488diff -Nru bzip2-1.0.1/configure.in bzip2-1.0.1.new/configure.in
1489--- bzip2-1.0.1/configure.in Thu Jan 1 01:00:00 1970
1490+++ bzip2-1.0.1.new/configure.in Sat Jun 24 20:13:06 2000
1491@@ -0,0 +1,10 @@
1492+AC_INIT(bzip2.c)
1493+AM_INIT_AUTOMAKE(bzip2,1.0.1)
1494+AM_CONFIG_HEADER(config.h)
1495+AC_PROG_CC
1496+AM_PROG_LIBTOOL
1497+AC_PROG_LN_S
1498+AC_SYS_LARGEFILE
1499+AC_OUTPUT(Makefile
1500+ doc/Makefile
1501+ doc/pl/Makefile)
1502diff -Nru bzip2-1.0.1/crctable.c bzip2-1.0.1.new/crctable.c
1503--- bzip2-1.0.1/crctable.c Sat Jun 24 20:13:27 2000
1504+++ bzip2-1.0.1.new/crctable.c Sat Jun 24 20:13:06 2000
1505@@ -58,6 +58,10 @@
1506 For more information on these sources, see the manual.
1507 --*/
1508
1509+#ifdef HAVE_CONFIG_H
1510+#include <config.h>
1511+#endif
1512+
1513
1514 #include "bzlib_private.h"
1515
1516diff -Nru bzip2-1.0.1/decompress.c bzip2-1.0.1.new/decompress.c
1517--- bzip2-1.0.1/decompress.c Sat Jun 24 20:13:27 2000
1518+++ bzip2-1.0.1.new/decompress.c Sat Jun 24 20:13:06 2000
1519@@ -58,6 +58,10 @@
1520 For more information on these sources, see the manual.
1521 --*/
1522
1523+#ifdef HAVE_CONFIG_H
1524+#include <config.h>
1525+#endif
1526+
1527
1528 #include "bzlib_private.h"
1529
1530diff -Nru bzip2-1.0.1/dlltest.c bzip2-1.0.1.new/dlltest.c
1531--- bzip2-1.0.1/dlltest.c Sat Jun 24 20:13:27 2000
1532+++ bzip2-1.0.1.new/dlltest.c Sat Jun 24 20:13:06 2000
1533@@ -8,6 +8,10 @@
1534 usage: minibz2 [-d] [-{1,2,..9}] [[srcfilename] destfilename]\r
1535 */\r
1536 \r
1537+#ifdef HAVE_CONFIG_H
1538+#include <config.h>
1539+#endif
1540+
1541 #define BZ_IMPORT\r
1542 #include <stdio.h>\r
1543 #include <stdlib.h>\r
1544diff -Nru bzip2-1.0.1/doc/Makefile.am bzip2-1.0.1.new/doc/Makefile.am
1545--- bzip2-1.0.1/doc/Makefile.am Thu Jan 1 01:00:00 1970
1546+++ bzip2-1.0.1.new/doc/Makefile.am Sat Jun 24 20:14:43 2000
1547@@ -0,0 +1,5 @@
1548+
1549+SUBDIRS = pl
1550+
1551+man_MANS = bzip2.1 bunzip2.1 bzcat.1 bzip2recover.1
1552+#info_TEXINFOS = bzip2.texi
1553diff -Nru bzip2-1.0.1/doc/bunzip2.1 bzip2-1.0.1.new/doc/bunzip2.1
1554--- bzip2-1.0.1/doc/bunzip2.1 Thu Jan 1 01:00:00 1970
1555+++ bzip2-1.0.1.new/doc/bunzip2.1 Sat Jun 24 20:13:06 2000
1556@@ -0,0 +1 @@
1557+.so bzip2.1
1558\ No newline at end of file
1559diff -Nru bzip2-1.0.1/doc/bzcat.1 bzip2-1.0.1.new/doc/bzcat.1
1560--- bzip2-1.0.1/doc/bzcat.1 Thu Jan 1 01:00:00 1970
1561+++ bzip2-1.0.1.new/doc/bzcat.1 Sat Jun 24 20:13:06 2000
1562@@ -0,0 +1 @@
1563+.so bzip2.1
1564\ No newline at end of file
1565diff -Nru bzip2-1.0.1/doc/bzip2.1 bzip2-1.0.1.new/doc/bzip2.1
1566--- bzip2-1.0.1/doc/bzip2.1 Thu Jan 1 01:00:00 1970
1567+++ bzip2-1.0.1.new/doc/bzip2.1 Sat Jun 24 20:13:06 2000
1568@@ -0,0 +1,439 @@
1569+.PU
1570+.TH bzip2 1
1571+.SH NAME
1572+bzip2, bunzip2 \- a block-sorting file compressor, v1.0
1573+.br
1574+bzcat \- decompresses files to stdout
1575+.br
1576+bzip2recover \- recovers data from damaged bzip2 files
1577+
1578+.SH SYNOPSIS
1579+.ll +8
1580+.B bzip2
1581+.RB [ " \-cdfkqstvzVL123456789 " ]
1582+[
1583+.I "filenames \&..."
1584+]
1585+.ll -8
1586+.br
1587+.B bunzip2
1588+.RB [ " \-fkvsVL " ]
1589+[
1590+.I "filenames \&..."
1591+]
1592+.br
1593+.B bzcat
1594+.RB [ " \-s " ]
1595+[
1596+.I "filenames \&..."
1597+]
1598+.br
1599+.B bzip2recover
1600+.I "filename"
1601+
1602+.SH DESCRIPTION
1603+.I bzip2
1604+compresses files using the Burrows-Wheeler block sorting
1605+text compression algorithm, and Huffman coding. Compression is
1606+generally considerably better than that achieved by more conventional
1607+LZ77/LZ78-based compressors, and approaches the performance of the PPM
1608+family of statistical compressors.
1609+
1610+The command-line options are deliberately very similar to
1611+those of
1612+.I GNU gzip,
1613+but they are not identical.
1614+
1615+.I bzip2
1616+expects a list of file names to accompany the
1617+command-line flags. Each file is replaced by a compressed version of
1618+itself, with the name "original_name.bz2".
1619+Each compressed file
1620+has the same modification date, permissions, and, when possible,
1621+ownership as the corresponding original, so that these properties can
1622+be correctly restored at decompression time. File name handling is
1623+naive in the sense that there is no mechanism for preserving original
1624+file names, permissions, ownerships or dates in filesystems which lack
1625+these concepts, or have serious file name length restrictions, such as
1626+MS-DOS.
1627+
1628+.I bzip2
1629+and
1630+.I bunzip2
1631+will by default not overwrite existing
1632+files. If you want this to happen, specify the \-f flag.
1633+
1634+If no file names are specified,
1635+.I bzip2
1636+compresses from standard
1637+input to standard output. In this case,
1638+.I bzip2
1639+will decline to
1640+write compressed output to a terminal, as this would be entirely
1641+incomprehensible and therefore pointless.
1642+
1643+.I bunzip2
1644+(or
1645+.I bzip2 \-d)
1646+decompresses all
1647+specified files. Files which were not created by
1648+.I bzip2
1649+will be detected and ignored, and a warning issued.
1650+.I bzip2
1651+attempts to guess the filename for the decompressed file
1652+from that of the compressed file as follows:
1653+
1654+       filename.bz2    becomes   filename
1655+       filename.bz     becomes   filename
1656+       filename.tbz2   becomes   filename.tar
1657+       filename.tbz    becomes   filename.tar
1658+       anyothername    becomes   anyothername.out
1659+
1660+If the file does not end in one of the recognised endings,
1661+.I .bz2,
1662+.I .bz,
1663+.I .tbz2
1664+or
1665+.I .tbz,
1666+.I bzip2
1667+complains that it cannot
1668+guess the name of the original file, and uses the original name
1669+with
1670+.I .out
1671+appended.
1672+
1673+As with compression, supplying no
1674+filenames causes decompression from
1675+standard input to standard output.
1676+
1677+.I bunzip2
1678+will correctly decompress a file which is the
1679+concatenation of two or more compressed files. The result is the
1680+concatenation of the corresponding uncompressed files. Integrity
1681+testing (\-t)
1682+of concatenated
1683+compressed files is also supported.
1684+
1685+You can also compress or decompress files to the standard output by
1686+giving the \-c flag. Multiple files may be compressed and
1687+decompressed like this. The resulting outputs are fed sequentially to
1688+stdout. Compression of multiple files
1689+in this manner generates a stream
1690+containing multiple compressed file representations. Such a stream
1691+can be decompressed correctly only by
1692+.I bzip2
1693+version 0.9.0 or
1694+later. Earlier versions of
1695+.I bzip2
1696+will stop after decompressing
1697+the first file in the stream.
1698+
1699+.I bzcat
1700+(or
1701+.I bzip2 -dc)
1702+decompresses all specified files to
1703+the standard output.
1704+
1705+.I bzip2
1706+will read arguments from the environment variables
1707+.I BZIP2
1708+and
1709+.I BZIP,
1710+in that order, and will process them
1711+before any arguments read from the command line. This gives a
1712+convenient way to supply default arguments.
1713+
1714+Compression is always performed, even if the compressed
1715+file is slightly
1716+larger than the original. Files of less than about one hundred bytes
1717+tend to get larger, since the compression mechanism has a constant
1718+overhead in the region of 50 bytes. Random data (including the output
1719+of most file compressors) is coded at about 8.05 bits per byte, giving
1720+an expansion of around 0.5%.
1721+
1722+As a self-check for your protection,
1723+.I
1724+bzip2
1725+uses 32-bit CRCs to
1726+make sure that the decompressed version of a file is identical to the
1727+original. This guards against corruption of the compressed data, and
1728+against undetected bugs in
1729+.I bzip2
1730+(hopefully very unlikely). The
1731+chance of data corruption going undetected is microscopic, about one
1732+chance in four billion for each file processed. Be aware, though, that
1733+the check occurs upon decompression, so it can only tell you that
1734+something is wrong. It can't help you
1735+recover the original uncompressed
1736+data. You can use
1737+.I bzip2recover
1738+to try to recover data from
1739+damaged files.
1740+
1741+Return values: 0 for a normal exit, 1 for environmental problems (file
1742+not found, invalid flags, I/O errors, etc.), 2 to indicate a corrupt
1743+compressed file, 3 for an internal consistency error (e.g., a bug) which
1744+caused
1745+.I bzip2
1746+to panic.
1747+
1748+.SH OPTIONS
1749+.TP
1750+.B \-c --stdout
1751+Compress or decompress to standard output.
1752+.TP
1753+.B \-d --decompress
1754+Force decompression.
1755+.I bzip2,
1756+.I bunzip2
1757+and
1758+.I bzcat
1759+are
1760+really the same program, and the decision about what actions to take is
1761+done on the basis of which name is used. This flag overrides that
1762+mechanism, and forces
1763+.I bzip2
1764+to decompress.
1765+.TP
1766+.B \-z --compress
1767+The complement to \-d: forces compression, regardless of the
1768+invocation name.
1769+.TP
1770+.B \-t --test
1771+Check integrity of the specified file(s), but don't decompress them.
1772+This really performs a trial decompression and throws away the result.
1773+.TP
1774+.B \-f --force
1775+Force overwrite of output files. Normally,
1776+.I bzip2
1777+will not overwrite
1778+existing output files. Also forces
1779+.I bzip2
1780+to break hard links
1781+to files, which it otherwise wouldn't do.
1782+.TP
1783+.B \-k --keep
1784+Keep (don't delete) input files during compression
1785+or decompression.
1786+.TP
1787+.B \-s --small
1788+Reduce memory usage, for compression, decompression and testing. Files
1789+are decompressed and tested using a modified algorithm which only
1790+requires 2.5 bytes per block byte. This means any file can be
1791+decompressed in 2300k of memory, albeit at about half the normal speed.
1792+
1793+During compression, \-s selects a block size of 200k, which limits
1794+memory use to around the same figure, at the expense of your compression
1795+ratio. In short, if your machine is low on memory (8 megabytes or
1796+less), use \-s for everything. See MEMORY MANAGEMENT below.
1797+.TP
1798+.B \-q --quiet
1799+Suppress non-essential warning messages. Messages pertaining to
1800+I/O errors and other critical events will not be suppressed.
1801+.TP
1802+.B \-v --verbose
1803+Verbose mode -- show the compression ratio for each file processed.
1804+Further \-v's increase the verbosity level, spewing out lots of
1805+information which is primarily of interest for diagnostic purposes.
1806+.TP
1807+.B \-L --license -V --version
1808+Display the software version, license terms and conditions.
1809+.TP
1810+.B \-1 to \-9
1811+Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
1812+effect when decompressing. See MEMORY MANAGEMENT below.
1813+.TP
1814+.B \--
1815+Treats all subsequent arguments as file names, even if they start
1816+with a dash. This is so you can handle files with names beginning
1817+with a dash, for example: bzip2 \-- \-myfilename.
1818+.TP
1819+.B \--repetitive-fast --repetitive-best
1820+These flags are redundant in versions 0.9.5 and above. They provided
1821+some coarse control over the behaviour of the sorting algorithm in
1822+earlier versions, which was sometimes useful. 0.9.5 and above have an
1823+improved algorithm which renders these flags irrelevant.
1824+
1825+.SH MEMORY MANAGEMENT
1826+.I bzip2
1827+compresses large files in blocks. The block size affects
1828+both the compression ratio achieved, and the amount of memory needed for
1829+compression and decompression. The flags \-1 through \-9
1830+specify the block size to be 100,000 bytes through 900,000 bytes (the
1831+default) respectively. At decompression time, the block size used for
1832+compression is read from the header of the compressed file, and
1833+.I bunzip2
1834+then allocates itself just enough memory to decompress
1835+the file. Since block sizes are stored in compressed files, it follows
1836+that the flags \-1 to \-9 are irrelevant to and so ignored
1837+during decompression.
1838+
1839+Compression and decompression requirements,
1840+in bytes, can be estimated as:
1841+
1842+       Compression:   400k + ( 8 x block size )
1843+
1844+       Decompression: 100k + ( 4 x block size ), or
1845+                      100k + ( 2.5 x block size )
1846+
1847+Larger block sizes give rapidly diminishing marginal returns. Most of
1848+the compression comes from the first two or three hundred k of block
1849+size, a fact worth bearing in mind when using
1850+.I bzip2
1851+on small machines.
1852+It is also important to appreciate that the decompression memory
1853+requirement is set at compression time by the choice of block size.
1854+
1855+For files compressed with the default 900k block size,
1856+.I bunzip2
1857+will require about 3700 kbytes to decompress. To support decompression
1858+of any file on a 4 megabyte machine,
1859+.I bunzip2
1860+has an option to
1861+decompress using approximately half this amount of memory, about 2300
1862+kbytes. Decompression speed is also halved, so you should use this
1863+option only where necessary. The relevant flag is -s.
1864+
1865+In general, try to use the largest block size that memory constraints allow,
1866+since that maximises the compression achieved. Compression and
1867+decompression speed are virtually unaffected by block size.
1868+
1869+Another significant point applies to files which fit in a single block
1870+-- that means most files you'd encounter using a large block size. The
1871+amount of real memory touched is proportional to the size of the file,
1872+since the file is smaller than a block. For example, compressing a file
1873+20,000 bytes long with the flag -9 will cause the compressor to
1874+allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
1875+kbytes of it. Similarly, the decompressor will allocate 3700k but only
1876+touch 100k + 20000 * 4 = 180 kbytes.
1877+
1878+Here is a table which summarises the maximum memory usage for different
1879+block sizes. Also recorded is the total compressed size for 14 files of
1880+the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
1881+column gives some feel for how compression varies with block size.
1882+These figures tend to understate the advantage of larger block sizes for
1883+larger files, since the Corpus is dominated by smaller files.
1884+
1885+           Compress   Decompress   Decompress   Corpus
1886+    Flag     usage      usage       -s usage     Size
1887+
1888+     -1      1200k       500k         350k      914704
1889+     -2      2000k       900k         600k      877703
1890+     -3      2800k      1300k         850k      860338
1891+     -4      3600k      1700k        1100k      846899
1892+     -5      4400k      2100k        1350k      845160
1893+     -6      5200k      2500k        1600k      838626
1894+     -7      6100k      2900k        1850k      834096
1895+     -8      6800k      3300k        2100k      828642
1896+     -9      7600k      3700k        2350k      828642
1897+
1898+.SH RECOVERING DATA FROM DAMAGED FILES
1899+.I bzip2
1900+compresses files in blocks, usually 900kbytes long. Each
1901+block is handled independently. If a media or transmission error causes
1902+a multi-block .bz2
1903+file to become damaged, it may be possible to
1904+recover data from the undamaged blocks in the file.
1905+
1906+The compressed representation of each block is delimited by a 48-bit
1907+pattern, which makes it possible to find the block boundaries with
1908+reasonable certainty. Each block also carries its own 32-bit CRC, so
1909+damaged blocks can be distinguished from undamaged ones.
1910+
1911+.I bzip2recover
1912+is a simple program whose purpose is to search for
1913+blocks in .bz2 files, and write each block out into its own .bz2
1914+file. You can then use
1915+.I bzip2
1916+\-t
1917+to test the
1918+integrity of the resulting files, and decompress those which are
1919+undamaged.
1920+
1921+.I bzip2recover
1922+takes a single argument, the name of the damaged file,
1923+and writes a number of files "rec0001file.bz2",
1924+"rec0002file.bz2", etc, containing the extracted blocks.
1925+The output filenames are designed so that the use of
1926+wildcards in subsequent processing -- for example,
1927+"bzip2 -dc rec*file.bz2 > recovered_data" -- lists the files in
1928+the correct order.
1929+
1930+.I bzip2recover
1931+should be of most use dealing with large .bz2
1932+files, as these will contain many blocks. It is clearly
1933+futile to use it on damaged single-block files, since a
1934+damaged block cannot be recovered. If you wish to minimise
1935+any potential data loss through media or transmission errors,
1936+you might consider compressing with a smaller
1937+block size.
1938+
1939+.SH PERFORMANCE NOTES
1940+The sorting phase of compression gathers together similar strings in the
1941+file. Because of this, files containing very long runs of repeated
1942+symbols, like "aabaabaabaab ..." (repeated several hundred times) may
1943+compress more slowly than normal. Versions 0.9.5 and above fare much
1944+better than previous versions in this respect. The ratio between
1945+worst-case and average-case compression time is in the region of 10:1.
1946+For previous versions, this figure was more like 100:1. You can use the
1947+\-vvvv option to monitor progress in great detail, if you want.
1948+
1949+Decompression speed is unaffected by these phenomena.
1950+
1951+.I bzip2
1952+usually allocates several megabytes of memory to operate
1953+in, and then charges all over it in a fairly random fashion. This means
1954+that performance, both for compressing and decompressing, is largely
1955+determined by the speed at which your machine can service cache misses.
1956+Because of this, small changes to the code to reduce the miss rate have
1957+been observed to give disproportionately large performance improvements.
1958+I imagine
1959+.I bzip2
1960+will perform best on machines with very large caches.
1961+
1962+.SH CAVEATS
1963+I/O error messages are not as helpful as they could be.
1964+.I bzip2
1965+tries hard to detect I/O errors and exit cleanly, but the details of
1966+what the problem is sometimes seem rather misleading.
1967+
1968+This manual page pertains to version 1.0 of
1969+.I bzip2.
1970+Compressed
1971+data created by this version is entirely forwards and backwards
1972+compatible with the previous public releases, versions 0.1pl2, 0.9.0
1973+and 0.9.5,
1974+but with the following exception: 0.9.0 and above can correctly
1975+decompress multiple concatenated compressed files. 0.1pl2 cannot do
1976+this; it will stop after decompressing just the first file in the
1977+stream.
1978+
1979+.I bzip2recover
1980+uses 32-bit integers to represent bit positions in
1981+compressed files, so it cannot handle compressed files more than 512
1982+megabytes long. This could easily be fixed.
1983+
1984+.SH AUTHOR
1985+Julian Seward, jseward@acm.org.
1986+
1987+http://sourceware.cygnus.com/bzip2
1988+http://www.muraroa.demon.co.uk
1989+
1990+The ideas embodied in
1991+.I bzip2
1992+are due to (at least) the following
1993+people: Michael Burrows and David Wheeler (for the block sorting
1994+transformation), David Wheeler (again, for the Huffman coder), Peter
1995+Fenwick (for the structured coding model in the original
1996+.I bzip,
1997+and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
1998+(for the arithmetic coder in the original
1999+.I bzip).
2000+I am much
2001+indebted for their help, support and advice. See the manual in the
2002+source distribution for pointers to sources of documentation. Christian
2003+von Roques encouraged me to look for faster sorting algorithms, so as to
2004+speed up compression. Bela Lubkin encouraged me to improve the
2005+worst-case compression performance. Many people sent patches, helped
2006+with portability problems, lent machines, gave advice and were generally
2007+helpful.
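
The memory estimates quoted in the MEMORY MANAGEMENT section of the man page above can be checked with a few lines of arithmetic. The following is only a sketch: the function names are hypothetical and are not part of bzip2 or libbzip2, and it assumes "k" means 1000 bytes, which is what the man page's own worked example (400k + 20000 * 8 = 560 kbytes) implies.

```python
# Sketch of the man page's memory estimates. All names here are
# illustrative assumptions, not bzip2 or libbzip2 APIs.

K = 1000  # assumption: "k" is 1000 bytes, as in the man page's 560 kbytes example


def block_size(level):
    """Block size in bytes for flags -1 .. -9 (100,000 .. 900,000 bytes)."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    return level * 100 * K


def compress_bytes(level):
    """Estimated peak memory for compression: 400k + 8 x block size."""
    return 400 * K + 8 * block_size(level)


def decompress_bytes(level, small=False):
    """Estimated peak memory for decompression.

    Normal mode: 100k + 4 x block size.
    With -s:     100k + 2.5 x block size (computed in integers as 5/2).
    """
    if small:
        return 100 * K + 5 * block_size(level) // 2
    return 100 * K + 4 * block_size(level)


def touched_when_compressing(file_size, level):
    """Memory actually touched when the file is smaller than one block."""
    return 400 * K + 8 * min(file_size, block_size(level))


if __name__ == "__main__":
    for level in range(1, 10):
        print("-%d  %5dk  %5dk  %5dk" % (
            level,
            compress_bytes(level) // K,
            decompress_bytes(level) // K,
            decompress_bytes(level, small=True) // K,
        ))
```

Run as a script, this prints one row per flag in the same column order as the man page's table. The formula yields 6000k for compression with -7, whereas the table lists 6100k; the sketch follows the formula and does not try to explain the difference.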
2008diff -Nru bzip2-1.0.1/doc/bzip2.texi bzip2-1.0.1.new/doc/bzip2.texi
2009--- bzip2-1.0.1/doc/bzip2.texi Thu Jan 1 01:00:00 1970
2010+++ bzip2-1.0.1.new/doc/bzip2.texi Sat Jun 24 20:13:06 2000
2011@@ -0,0 +1,2217 @@
2012+\input texinfo @c -*- Texinfo -*-
2013+@setfilename bzip2.info
2014+
2015+@ignore
2016+This file documents bzip2 version 1.0, and associated library
2017+libbzip2, written by Julian Seward (jseward@acm.org).
2018+
2019+Copyright (C) 1996-2000 Julian R Seward
2020+
2021+Permission is granted to make and distribute verbatim copies of
2022+this manual provided the copyright notice and this permission notice
2023+are preserved on all copies.
2024+
2025+Permission is granted to copy and distribute translations of this manual
2026+into another language, under the above conditions for verbatim copies.
2027+@end ignore
2028+
2029+@ifinfo
2030+@format
2031+@dircategory File utilities:
2032+* Bzip2: (bzip2). A program and library for data
2033+ compression
2034+@end direntry
2035+@end format
2036+@end ifinfo
2037+
2038+@iftex
2039+@c @finalout
2040+@settitle bzip2 and libbzip2
2041+@titlepage
2042+@title bzip2 and libbzip2
2043+@subtitle a program and library for data compression
2044+@subtitle copyright (C) 1996-2000 Julian Seward
2045+@subtitle version 1.0 of 21 March 2000
2046+@author Julian Seward
2047+
2048+@end titlepage
2049+
2050+@parindent 0mm
2051+@parskip 2mm
2052+
2053+@end iftex
2054+@node Top, Overview, (dir), (dir)
2055+
2056+@top bzip2
2057+
2058+This program, @code{bzip2},
2059+and associated library @code{libbzip2}, are
2060+Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
2061+
2062+Redistribution and use in source and binary forms, with or without
2063+modification, are permitted provided that the following conditions
2064+are met:
2065+@itemize @bullet
2066+@item
2067+ Redistributions of source code must retain the above copyright
2068+ notice, this list of conditions and the following disclaimer.
2069+@item
2070+ The origin of this software must not be misrepresented; you must
2071+ not claim that you wrote the original software. If you use this
2072+ software in a product, an acknowledgment in the product
2073+ documentation would be appreciated but is not required.
2074+@item
2075+ Altered source versions must be plainly marked as such, and must
2076+ not be misrepresented as being the original software.
2077+@item
2078+ The name of the author may not be used to endorse or promote
2079+ products derived from this software without specific prior written
2080+ permission.
2081+@end itemize
2082+THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
2083+OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
2084+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
2085+ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
2086+DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
2087+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
2088+GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
2089+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
2090+WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
2091+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
2092+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
2093+
2094+Julian Seward, Cambridge, UK.
2095+
2096+@code{jseward@@acm.org}
2097+
2098+@code{http://sourceware.cygnus.com/bzip2}
2099+
2100+@code{http://www.cacheprof.org}
2101+
2102+@code{http://www.muraroa.demon.co.uk}
2103+
2104+@code{bzip2}/@code{libbzip2} version 1.0 of 21 March 2000.
2105+
2106+PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented
2107+algorithms. However, I do not have the resources available to carry out
2108+a full patent search. Therefore I cannot give any guarantee of the
2109+above statement.
2110+
2111+
2112+
2113+
2114+
2115+
2116+
2117+@node Overview, Implementation, Top, Top
2118+@chapter Introduction
2119+
2120+@code{bzip2} compresses files using the Burrows-Wheeler
2121+block-sorting text compression algorithm, and Huffman coding.
2122+Compression is generally considerably better than that
2123+achieved by more conventional LZ77/LZ78-based compressors,
2124+and approaches the performance of the PPM family of statistical compressors.
2125+
2126+@code{bzip2} is built on top of @code{libbzip2}, a flexible library
2127+for handling compressed data in the @code{bzip2} format. This manual
2128+describes both how to use the program and
2129+how to work with the library interface. Most of the
2130+manual is devoted to this library, not the program,
2131+which is good news if your interest is only in the program.
2132+
2133+Chapter 2 describes how to use @code{bzip2}; this is the only part
2134+you need to read if you just want to know how to operate the program.
2135+Chapter 3 describes the programming interfaces in detail, and
2136+Chapter 4 records some miscellaneous notes which I thought
2137+ought to be recorded somewhere.
2138+
2139+
2140+@chapter How to use @code{bzip2}
2141+
2142+This chapter contains a copy of the @code{bzip2} man page,
2143+and nothing else.
2144+
2145+@quotation
2146+
2147+@unnumberedsubsubsec NAME
2148+@itemize
2149+@item @code{bzip2}, @code{bunzip2}
2150+- a block-sorting file compressor, v1.0
2151+@item @code{bzcat}
2152+- decompresses files to stdout
2153+@item @code{bzip2recover}
2154+- recovers data from damaged bzip2 files
2155+@end itemize
2156+
2157+@unnumberedsubsubsec SYNOPSIS
2158+@itemize
2159+@item @code{bzip2} [ -cdfkqstvzVL123456789 ] [ filenames ... ]
2160+@item @code{bunzip2} [ -fkvsVL ] [ filenames ... ]
2161+@item @code{bzcat} [ -s ] [ filenames ... ]
2162+@item @code{bzip2recover} filename
2163+@end itemize
2164+
2165+@unnumberedsubsubsec DESCRIPTION
2166+
2167+@code{bzip2} compresses files using the Burrows-Wheeler block sorting
2168+text compression algorithm, and Huffman coding. Compression is
2169+generally considerably better than that achieved by more conventional
2170+LZ77/LZ78-based compressors, and approaches the performance of the PPM
2171+family of statistical compressors.
2172+
2173+The command-line options are deliberately very similar to those of GNU
2174+@code{gzip}, but they are not identical.
2175+
2176+@code{bzip2} expects a list of file names to accompany the command-line
2177+flags. Each file is replaced by a compressed version of itself, with
2178+the name @code{original_name.bz2}. Each compressed file has the same
2179+modification date, permissions, and, when possible, ownership as the
2180+corresponding original, so that these properties can be correctly
2181+restored at decompression time. File name handling is naive in the
2182+sense that there is no mechanism for preserving original file names,
2183+permissions, ownerships or dates in filesystems which lack these
2184+concepts, or have serious file name length restrictions, such as MS-DOS.
2185+
2186+@code{bzip2} and @code{bunzip2} will by default not overwrite existing
2187+files. If you want this to happen, specify the @code{-f} flag.
2188+
2189+If no file names are specified, @code{bzip2} compresses from standard
2190+input to standard output. In this case, @code{bzip2} will decline to
2191+write compressed output to a terminal, as this would be entirely
2192+incomprehensible and therefore pointless.
2193+
2194+@code{bunzip2} (or @code{bzip2 -d}) decompresses all
2195+specified files. Files which were not created by @code{bzip2}
2196+will be detected and ignored, and a warning issued.
2197+@code{bzip2} attempts to guess the filename for the decompressed file
2198+from that of the compressed file as follows:
2199+@itemize
2200+@item @code{filename.bz2 } becomes @code{filename}
2201+@item @code{filename.bz } becomes @code{filename}
2202+@item @code{filename.tbz2} becomes @code{filename.tar}
2203+@item @code{filename.tbz } becomes @code{filename.tar}
2204+@item @code{anyothername } becomes @code{anyothername.out}
2205+@end itemize
2206+If the file does not end in one of the recognised endings,
2207+@code{.bz2}, @code{.bz},
2208+@code{.tbz2} or @code{.tbz}, @code{bzip2} complains that it cannot
2209+guess the name of the original file, and uses the original name
2210+with @code{.out} appended.
2211+
2212+As with compression, supplying no
2213+filenames causes decompression from standard input to standard output.
2214+
2215+@code{bunzip2} will correctly decompress a file which is the
2216+concatenation of two or more compressed files. The result is the
2217+concatenation of the corresponding uncompressed files. Integrity
2218+testing (@code{-t}) of concatenated compressed files is also supported.
2219+
2220+You can also compress or decompress files to the standard output by
2221+giving the @code{-c} flag. Multiple files may be compressed and
2222+decompressed like this. The resulting outputs are fed sequentially to
2223+stdout. Compression of multiple files in this manner generates a stream
2224+containing multiple compressed file representations. Such a stream
2225+can be decompressed correctly only by @code{bzip2} version 0.9.0 or
2226+later. Earlier versions of @code{bzip2} will stop after decompressing
2227+the first file in the stream.
2228+
2229+@code{bzcat} (or @code{bzip2 -dc}) decompresses all specified files to
2230+the standard output.
2231+
2232+@code{bzip2} will read arguments from the environment variables
2233+@code{BZIP2} and @code{BZIP}, in that order, and will process them
2234+before any arguments read from the command line. This gives a
2235+convenient way to supply default arguments.
2236+
2237+Compression is always performed, even if the compressed file is slightly
2238+larger than the original. Files of less than about one hundred bytes
2239+tend to get larger, since the compression mechanism has a constant
2240+overhead in the region of 50 bytes. Random data (including the output
2241+of most file compressors) is coded at about 8.05 bits per byte, giving
2242+an expansion of around 0.5%.
2243+
2244+As a self-check for your protection, @code{bzip2} uses 32-bit CRCs to
2245+make sure that the decompressed version of a file is identical to the
2246+original. This guards against corruption of the compressed data, and
2247+against undetected bugs in @code{bzip2} (hopefully very unlikely). The
2248+chance of data corruption going undetected is microscopic, about one
2249+chance in four billion for each file processed. Be aware, though, that
2250+the check occurs upon decompression, so it can only tell you that
2251+something is wrong. It can't help you recover the original uncompressed
2252+data. You can use @code{bzip2recover} to try to recover data from
2253+damaged files.
2254+
2255+Return values: 0 for a normal exit, 1 for environmental problems (file
2256+not found, invalid flags, I/O errors, etc.), 2 to indicate a corrupt
2257+compressed file, 3 for an internal consistency error (e.g., a bug) which
2258+caused @code{bzip2} to panic.
2259+
2260+
2261+@unnumberedsubsubsec OPTIONS
2262+@table @code
2263+@item -c --stdout
2264+Compress or decompress to standard output.
2265+@item -d --decompress
2266+Force decompression. @code{bzip2}, @code{bunzip2} and @code{bzcat} are
2267+really the same program, and the decision about what actions to take is
2268+done on the basis of which name is used. This flag overrides that
2269+mechanism, and forces @code{bzip2} to decompress.
2270+@item -z --compress
2271+The complement to @code{-d}: forces compression, regardless of the
2272+invocation name.
2273+@item -t --test
2274+Check integrity of the specified file(s), but don't decompress them.
2275+This really performs a trial decompression and throws away the result.
2276+@item -f --force
2277+Force overwrite of output files. Normally, @code{bzip2} will not overwrite
2278+existing output files. Also forces @code{bzip2} to break hard links
2279+to files, which it otherwise wouldn't do.
2280+@item -k --keep
2281+Keep (don't delete) input files during compression
2282+or decompression.
2283+@item -s --small
2284+Reduce memory usage, for compression, decompression and testing. Files
2285+are decompressed and tested using a modified algorithm which only
2286+requires 2.5 bytes per block byte. This means any file can be
2287+decompressed in 2300k of memory, albeit at about half the normal speed.
2288+
2289+During compression, @code{-s} selects a block size of 200k, which limits
2290+memory use to around the same figure, at the expense of your compression
2291+ratio. In short, if your machine is low on memory (8 megabytes or
2292+less), use @code{-s} for everything. See MEMORY MANAGEMENT below.
2293+@item -q --quiet
2294+Suppress non-essential warning messages. Messages pertaining to
2295+I/O errors and other critical events will not be suppressed.
2296+@item -v --verbose
2297+Verbose mode -- show the compression ratio for each file processed.
2298+Further @code{-v}'s increase the verbosity level, spewing out lots of
2299+information which is primarily of interest for diagnostic purposes.
2300+@item -L --license -V --version
2301+Display the software version, license terms and conditions.
2302+@item -1 to -9
2303+Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
2304+effect when decompressing. See MEMORY MANAGEMENT below.
2305+@item --
2306+Treats all subsequent arguments as file names, even if they start
2307+with a dash. This is so you can handle files with names beginning
2308+with a dash, for example: @code{bzip2 -- -myfilename}.
2309+@item --repetitive-fast
2310+@item --repetitive-best
2311+These flags are redundant in versions 0.9.5 and above. They provided
2312+some coarse control over the behaviour of the sorting algorithm in
2313+earlier versions, which was sometimes useful. 0.9.5 and above have an
2314+improved algorithm which renders these flags irrelevant.
2315+@end table
2316+
2317+
2318+@unnumberedsubsubsec MEMORY MANAGEMENT
2319+
2320+@code{bzip2} compresses large files in blocks. The block size affects
2321+both the compression ratio achieved, and the amount of memory needed for
2322+compression and decompression. The flags @code{-1} through @code{-9}
2323+specify the block size to be 100,000 bytes through 900,000 bytes (the
2324+default) respectively. At decompression time, the block size used for
2325+compression is read from the header of the compressed file, and
2326+@code{bunzip2} then allocates itself just enough memory to decompress
2327+the file. Since block sizes are stored in compressed files, it follows
2328+that the flags @code{-1} to @code{-9} are irrelevant to and so ignored
2329+during decompression.
2330+
2331+Compression and decompression requirements, in bytes, can be estimated
2332+as:
2333+@example
2334+       Compression:   400k + ( 8 x block size )
2335+
2336+       Decompression: 100k + ( 4 x block size ), or
2337+                      100k + ( 2.5 x block size )
2338+@end example
2339+Larger block sizes give rapidly diminishing marginal returns. Most of
2340+the compression comes from the first two or three hundred k of block
2341+size, a fact worth bearing in mind when using @code{bzip2} on small machines.
2342+It is also important to appreciate that the decompression memory
2343+requirement is set at compression time by the choice of block size.
2344+
2345+For files compressed with the default 900k block size, @code{bunzip2}
2346+will require about 3700 kbytes to decompress. To support decompression
2347+of any file on a 4 megabyte machine, @code{bunzip2} has an option to
2348+decompress using approximately half this amount of memory, about 2300
2349+kbytes. Decompression speed is also halved, so you should use this
2350+option only where necessary. The relevant flag is @code{-s}.
2351+
2352+In general, try and use the largest block size memory constraints allow,
2353+since that maximises the compression achieved. Compression and
2354+decompression speed are virtually unaffected by block size.
2355+
2356+Another significant point applies to files which fit in a single block
2357+-- that means most files you'd encounter using a large block size. The
2358+amount of real memory touched is proportional to the size of the file,
2359+since the file is smaller than a block. For example, compressing a file
2360+20,000 bytes long with the flag @code{-9} will cause the compressor to
2361+allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
2362+kbytes of it. Similarly, the decompressor will allocate 3700k but only
2363+touch 100k + 20000 * 4 = 180 kbytes.
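
The arithmetic above can be sketched in C. This is only an illustration of the two estimate formulas, reading "k" as 1000 bytes; the function names are hypothetical and are not part of @code{bzip2} or @code{libbzip2}.

```c
#include <assert.h>

/* Hypothetical helpers: estimated memory use in bytes, reading "k" as 1000. */

/* Compression: 400k + (8 x block size). */
unsigned long est_compress_bytes(unsigned long block_bytes)
{
    return 400000UL + 8UL * block_bytes;
}

/* Decompression: 100k + (4 x block size), or 100k + (2.5 x block size)
   when the slower low-memory mode (the -s flag) is used. */
unsigned long est_decompress_bytes(unsigned long block_bytes, int small)
{
    if (small)
        return 100000UL + (5UL * block_bytes) / 2UL;
    return 100000UL + 4UL * block_bytes;
}

/* Block size in bytes selected by the flags -1 .. -9. */
unsigned long block_bytes_for_flag(int flag)
{
    assert(flag >= 1 && flag <= 9);
    return 100000UL * (unsigned long)flag;
}
```

Passing the size of a file smaller than one block (for example 20000) instead of the block size gives the amount of memory actually touched, as in the example above.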
2364+
2365+Here is a table which summarises the maximum memory usage for different
2366+block sizes. Also recorded is the total compressed size for 14 files of
2367+the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
2368+column gives some feel for how compression varies with block size.
2369+These figures tend to understate the advantage of larger block sizes for
2370+larger files, since the Corpus is dominated by smaller files.
2371+@example
2372+ Compress Decompress Decompress Corpus
2373+ Flag usage usage -s usage Size
2374+
2375+ -1 1200k 500k 350k 914704
2376+ -2 2000k 900k 600k 877703
2377+ -3 2800k 1300k 850k 860338
2378+ -4 3600k 1700k 1100k 846899
2379+ -5 4400k 2100k 1350k 845160
2380+ -6 5200k 2500k 1600k 838626
2381+ -7 6100k 2900k 1850k 834096
2382+ -8 6800k 3300k 2100k 828642
2383+ -9 7600k 3700k 2350k 828642
2384+@end example
2385+
2386+@unnumberedsubsubsec RECOVERING DATA FROM DAMAGED FILES
2387+
2388+@code{bzip2} compresses files in blocks, usually 900kbytes long. Each
2389+block is handled independently. If a media or transmission error causes
2390+a multi-block @code{.bz2} file to become damaged, it may be possible to
2391+recover data from the undamaged blocks in the file.
2392+
2393+The compressed representation of each block is delimited by a 48-bit
2394+pattern, which makes it possible to find the block boundaries with
2395+reasonable certainty. Each block also carries its own 32-bit CRC, so
2396+damaged blocks can be distinguished from undamaged ones.
2397+
2398+@code{bzip2recover} is a simple program whose purpose is to search for
2399+blocks in @code{.bz2} files, and write each block out into its own
2400+@code{.bz2} file. You can then use @code{bzip2 -t} to test the
2401+integrity of the resulting files, and decompress those which are
2402+undamaged.
2403+
2404+@code{bzip2recover}
2405+takes a single argument, the name of the damaged file,
2406+and writes a number of files @code{rec0001file.bz2},
2407+ @code{rec0002file.bz2}, etc, containing the extracted blocks.
2408+ The output filenames are designed so that the use of
2409+ wildcards in subsequent processing -- for example,
2410+@code{bzip2 -dc rec*file.bz2 > recovered_data} -- lists the files in
2411+ the correct order.
2412+
2413+@code{bzip2recover} should be of most use dealing with large @code{.bz2}
2414+ files, as these will contain many blocks. It is clearly
2415+ futile to use it on damaged single-block files, since a
2416+ damaged block cannot be recovered. If you wish to minimise
2417+any potential data loss through media or transmission errors,
2418+you might consider compressing with a smaller
2419+ block size.
2420+
2421+
2422+@unnumberedsubsubsec PERFORMANCE NOTES
2423+
2424+The sorting phase of compression gathers together similar strings in the
2425+file. Because of this, files containing very long runs of repeated
2426+symbols, like "aabaabaabaab ..." (repeated several hundred times) may
2427+compress more slowly than normal. Versions 0.9.5 and above fare much
2428+better than previous versions in this respect. The ratio between
2429+worst-case and average-case compression time is in the region of 10:1.
2430+For previous versions, this figure was more like 100:1. You can use the
2431+@code{-vvvv} option to monitor progress in great detail, if you want.
2432+
2433+Decompression speed is unaffected by these phenomena.
2434+
2435+@code{bzip2} usually allocates several megabytes of memory to operate
2436+in, and then charges all over it in a fairly random fashion. This means
2437+that performance, both for compressing and decompressing, is largely
2438+determined by the speed at which your machine can service cache misses.
2439+Because of this, small changes to the code to reduce the miss rate have
2440+been observed to give disproportionately large performance improvements.
2441+I imagine @code{bzip2} will perform best on machines with very large
2442+caches.
2443+
2444+
2445+@unnumberedsubsubsec CAVEATS
2446+
2447+I/O error messages are not as helpful as they could be. @code{bzip2}
2448+tries hard to detect I/O errors and exit cleanly, but the details of
2449+what the problem is sometimes seem rather misleading.
2450+
2451+This manual page pertains to version 1.0 of @code{bzip2}. Compressed
2452+data created by this version is entirely forwards and backwards
2453+compatible with the previous public releases, versions 0.1pl2, 0.9.0 and
2454+0.9.5, but with the following exception: 0.9.0 and above can correctly
2455+decompress multiple concatenated compressed files. 0.1pl2 cannot do
2456+this; it will stop after decompressing just the first file in the
2457+stream.
2458+
2459+@code{bzip2recover} uses 32-bit integers to represent bit positions in
2460+compressed files, so it cannot handle compressed files more than 512
2461+megabytes long. This could easily be fixed.
2462+
2463+
2464+@unnumberedsubsubsec AUTHOR
2465+Julian Seward, @code{jseward@@acm.org}.
2466+
2467+The ideas embodied in @code{bzip2} are due to (at least) the following
2468+people: Michael Burrows and David Wheeler (for the block sorting
2469+transformation), David Wheeler (again, for the Huffman coder), Peter
2470+Fenwick (for the structured coding model in the original @code{bzip},
2471+and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
2472+(for the arithmetic coder in the original @code{bzip}). I am much
2473+indebted for their help, support and advice. See the manual in the
2474+source distribution for pointers to sources of documentation. Christian
2475+von Roques encouraged me to look for faster sorting algorithms, so as to
2476+speed up compression. Bela Lubkin encouraged me to improve the
2477+worst-case compression performance. Many people sent patches, helped
2478+with portability problems, lent machines, gave advice and were generally
2479+helpful.
2480+
2481+@end quotation
2482+
2483+
2484+
2485+
2486+@chapter Programming with @code{libbzip2}
2487+
2488+This chapter describes the programming interface to @code{libbzip2}.
2489+
2490+For general background information, particularly about memory
2491+use and performance aspects, you'd be well advised to read Chapter 2
2492+as well.
2493+
2494+@section Top-level structure
2495+
2496+@code{libbzip2} is a flexible library for compressing and decompressing
2497+data in the @code{bzip2} data format. Although packaged as a single
2498+entity, it helps to regard the library as three separate parts: the
2499+low-level interface, the high-level interface, and some utility
2500+functions.
2501+
2502+The structure of @code{libbzip2}'s interfaces is similar to
2503+that of Jean-loup Gailly's and Mark Adler's excellent @code{zlib}
2504+library.
2505+
2506+All externally visible symbols have names beginning @code{BZ2_}.
2507+This is new in version 1.0. The intention is to minimise pollution
2508+of the namespaces of library clients.
2509+
2510+@subsection Low-level summary
2511+
2512+This interface provides services for compressing and decompressing
2513+data in memory. There's no provision for dealing with files, streams
2514+or any other I/O mechanisms, just straight memory-to-memory work.
2515+In fact, this part of the library can be compiled without inclusion
2516+of @code{stdio.h}, which may be helpful for embedded applications.
2517+
2518+The low-level part of the library has no global variables and
2519+is therefore thread-safe.
2520+
2521+Six routines make up the low level interface:
2522+@code{BZ2_bzCompressInit}, @code{BZ2_bzCompress}, and @* @code{BZ2_bzCompressEnd}
2523+for compression,
2524+and a corresponding trio @code{BZ2_bzDecompressInit}, @* @code{BZ2_bzDecompress}
2525+and @code{BZ2_bzDecompressEnd} for decompression.
2526+The @code{*Init} functions allocate
2527+memory for compression/decompression and do other
2528+initialisations, whilst the @code{*End} functions close down operations
2529+and release memory.
2530+
2531+The real work is done by @code{BZ2_bzCompress} and @code{BZ2_bzDecompress}.
2532+These compress and decompress data from a user-supplied input buffer
2533+to a user-supplied output buffer. These buffers can be any size;
2534+arbitrary quantities of data are handled by making repeated calls
2535+to these functions. This is a flexible mechanism allowing a
2536+consumer-pull style of activity, or producer-push, or a mixture of
2537+both.
2538+
2539+
2540+
2541+@subsection High-level summary
2542+
2543+This interface provides some handy wrappers around the low-level
2544+interface to facilitate reading and writing @code{bzip2} format
2545+files (@code{.bz2} files). The routines provide hooks to facilitate
2546+reading files in which the @code{bzip2} data stream is embedded
2547+within some larger-scale file structure, or where there are
2548+multiple @code{bzip2} data streams concatenated end-to-end.
2549+
2550+For reading files, @code{BZ2_bzReadOpen}, @code{BZ2_bzRead},
2551+@code{BZ2_bzReadClose} and @* @code{BZ2_bzReadGetUnused} are supplied. For
2552+writing files, @code{BZ2_bzWriteOpen}, @code{BZ2_bzWrite} and
2553+@code{BZ2_bzWriteClose} are available.
2554+
2555+As with the low-level library, no global variables are used
2556+so the library is per se thread-safe. However, if I/O errors
2557+occur whilst reading or writing the underlying compressed files,
2558+you may have to consult @code{errno} to determine the cause of
2559+the error. In that case, you'd need a C library which correctly
2560+supports @code{errno} in a multithreaded environment.
2561+
2562+To make the library a little simpler and more portable,
2563+@code{BZ2_bzReadOpen} and @code{BZ2_bzWriteOpen} require you to pass them file
2564+handles (@code{FILE*}s) which have previously been opened for reading or
2565+writing respectively. That avoids portability problems associated with
2566+file operations and file attributes, whilst not being much of an
2567+imposition on the programmer.
2568+
2569+
2570+
2571+@subsection Utility functions summary
2572+For very simple needs, @code{BZ2_bzBuffToBuffCompress} and
2573+@code{BZ2_bzBuffToBuffDecompress} are provided. These compress
2574+data in memory from one buffer to another buffer in a single
2575+function call. You should assess whether these functions
2576+fulfill your memory-to-memory compression/decompression
2577+requirements before investing effort in understanding the more
2578+general but more complex low-level interface.
2579+
2580+Yoshioka Tsuneo (@code{QWF00133@@niftyserve.or.jp} /
2581+@code{tsuneo-y@@is.aist-nara.ac.jp}) has contributed some functions to
2582+give better @code{zlib} compatibility. These functions are
2583+@code{BZ2_bzopen}, @code{BZ2_bzread}, @code{BZ2_bzwrite}, @code{BZ2_bzflush},
2584+@code{BZ2_bzclose},
2585+@code{BZ2_bzerror} and @code{BZ2_bzlibVersion}. You may find these functions
2586+more convenient for simple file reading and writing, than those in the
2587+high-level interface. These functions are not (yet) officially part of
2588+the library, and are minimally documented here. If they break, you
2589+get to keep all the pieces. I hope to document them properly when time
2590+permits.
2591+
2592+Yoshioka also contributed modifications to allow the library to be
2593+built as a Windows DLL.
2594+
2595+
2596+@section Error handling
2597+
2598+The library is designed to recover cleanly in all situations, including
2599+the worst-case situation of decompressing random data. I'm not
2600+100% sure that it can always do this, so you might want to add
2601+a signal handler to catch segmentation violations during decompression
2602+if you are feeling especially paranoid. I would be interested in
2603+hearing more about the robustness of the library to corrupted
2604+compressed data.
2605+
2606+Version 1.0 is much more robust in this respect than
2607+0.9.0 or 0.9.5. Investigations with Checker (a tool for
2608+detecting problems with memory management, similar to Purify)
2609+indicate that, at least for the few files I tested, all single-bit
2610+errors in the decompressed data are caught properly, with no
2611+segmentation faults, no reads of uninitialised data and no
2612+out of range reads or writes. So it's certainly much improved,
2613+although I wouldn't claim it to be totally bombproof.
2614+
2615+The file @code{bzlib.h} contains all definitions needed to use
2616+the library. In particular, you should definitely not include
2617+@code{bzlib_private.h}.
2618+
2619+In @code{bzlib.h}, the various return values are defined. The following
2620+list is not intended as an exhaustive description of the circumstances
2621+in which a given value may be returned -- those descriptions are given
2622+later. Rather, it is intended to convey the rough meaning of each
2623+return value. The first five values are normal and not intended to
2624+denote an error situation.
2625+@table @code
2626+@item BZ_OK
2627+The requested action was completed successfully.
2628+@item BZ_RUN_OK
2629+@itemx BZ_FLUSH_OK
2630+@itemx BZ_FINISH_OK
2631+In @code{BZ2_bzCompress}, the requested flush/finish/nothing-special action
2632+was completed successfully.
2633+@item BZ_STREAM_END
2634+Compression of data was completed, or the logical stream end was
2635+detected during decompression.
2636+@end table
2637+
2638+The following return values indicate an error of some kind.
2639+@table @code
2640+@item BZ_CONFIG_ERROR
2641+Indicates that the library has been improperly compiled on your
2642+platform -- a major configuration error. Specifically, it means
2643+that @code{sizeof(char)}, @code{sizeof(short)} and @code{sizeof(int)}
2644+are not 1, 2 and 4 respectively, as they should be. Note that the
2645+library should still work properly on 64-bit platforms which follow
2646+the LP64 programming model -- that is, where @code{sizeof(long)}
2647+and @code{sizeof(void*)} are 8. Under LP64, @code{sizeof(int)} is
2648+still 4, so @code{libbzip2}, which doesn't use the @code{long} type,
2649+is OK.
2650+@item BZ_SEQUENCE_ERROR
2651+When using the library, it is important to call the functions in the
2652+correct sequence and with data structures (buffers etc) in the correct
2653+states. @code{libbzip2} checks as much as it can to ensure this is
2654+happening, and returns @code{BZ_SEQUENCE_ERROR} if not. Code which
2655+complies precisely with the function semantics, as detailed below,
2656+should never receive this value; such an event denotes buggy code
2657+which you should investigate.
2658+@item BZ_PARAM_ERROR
2659+Returned when a parameter to a function call is out of range
2660+or otherwise manifestly incorrect. As with @code{BZ_SEQUENCE_ERROR},
2661+this denotes a bug in the client code. The distinction between
2662+@code{BZ_PARAM_ERROR} and @code{BZ_SEQUENCE_ERROR} is a bit hazy, but still worth
2663+making.
2664+@item BZ_MEM_ERROR
2665+Returned when a request to allocate memory failed. Note that the
2666+quantity of memory needed to decompress a stream cannot be determined
2667+until the stream's header has been read. So @code{BZ2_bzDecompress} and
2668+@code{BZ2_bzRead} may return @code{BZ_MEM_ERROR} even though some of
2669+the compressed data has been read. The same is not true for
2670+compression; once @code{BZ2_bzCompressInit} or @code{BZ2_bzWriteOpen} have
2671+successfully completed, @code{BZ_MEM_ERROR} cannot occur.
2672+@item BZ_DATA_ERROR
2673+Returned when a data integrity error is detected during decompression.
2674+Most importantly, this means when stored and computed CRCs for the
2675+data do not match. This value is also returned upon detection of any
2676+other anomaly in the compressed data.
2677+@item BZ_DATA_ERROR_MAGIC
2678+As a special case of @code{BZ_DATA_ERROR}, it is sometimes useful to
2679+know when the compressed stream does not start with the correct
2680+magic bytes (@code{'B' 'Z' 'h'}).
2681+@item BZ_IO_ERROR
2682+Returned by @code{BZ2_bzRead} and @code{BZ2_bzWrite} when there is an error
2683+reading or writing in the compressed file, and by @code{BZ2_bzReadOpen}
2684+and @code{BZ2_bzWriteOpen} for attempts to use a file for which the
2685+error indicator (viz, @code{ferror(f)}) is set.
2686+On receipt of @code{BZ_IO_ERROR}, the caller should consult
2687+@code{errno} and/or @code{perror} to acquire operating-system
2688+specific information about the problem.
2689+@item BZ_UNEXPECTED_EOF
2690+Returned by @code{BZ2_bzRead} when the compressed file finishes
2691+before the logical end of stream is detected.
2692+@item BZ_OUTBUFF_FULL
2693+Returned by @code{BZ2_bzBuffToBuffCompress} and
2694+@code{BZ2_bzBuffToBuffDecompress} to indicate that the output data
2695+will not fit into the output buffer provided.
2696+@end table
2697+
2698+
2699+
2700+@section Low-level interface
2701+
2702+@subsection @code{BZ2_bzCompressInit}
2703+@example
2704+typedef
2705+ struct @{
2706+ char *next_in;
2707+ unsigned int avail_in;
2708+ unsigned int total_in_lo32;
2709+ unsigned int total_in_hi32;
2710+
2711+ char *next_out;
2712+ unsigned int avail_out;
2713+ unsigned int total_out_lo32;
2714+ unsigned int total_out_hi32;
2715+
2716+ void *state;
2717+
2718+ void *(*bzalloc)(void *,int,int);
2719+ void (*bzfree)(void *,void *);
2720+ void *opaque;
2721+ @}
2722+ bz_stream;
2723+
2724+int BZ2_bzCompressInit ( bz_stream *strm,
2725+ int blockSize100k,
2726+ int verbosity,
2727+ int workFactor );
2728+
2729+@end example
2730+
2731+Prepares for compression. The @code{bz_stream} structure
2732+holds all data pertaining to the compression activity.
2733+A @code{bz_stream} structure should be allocated and initialised
2734+prior to the call.
2735+The fields of @code{bz_stream}
2736+comprise the entirety of the user-visible data. @code{state}
2737+is a pointer to the private data structures required for compression.
2738+
2739+Custom memory allocators are supported, via fields @code{bzalloc},
2740+@code{bzfree},
2741+and @code{opaque}. The value
2742+@code{opaque} is passed as the first argument to
2743+all calls to @code{bzalloc} and @code{bzfree}, but is
2744+otherwise ignored by the library.
2745+The call @code{bzalloc ( opaque, n, m )} is expected to return a
2746+pointer @code{p} to
2747+@code{n * m} bytes of memory, and @code{bzfree ( opaque, p )}
2748+should free
2749+that memory.
2750+
2751+If you don't want to use a custom memory allocator, set @code{bzalloc},
2752+@code{bzfree} and
2753+@code{opaque} to @code{NULL},
2754+and the library will then use the standard @code{malloc}/@code{free}
2755+routines.
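
As a rough illustration of the allocator hooks, here is a minimal sketch of a pair of callbacks with the signatures shown in @code{bz_stream}. The @code{alloc_stats} type and both function names are hypothetical; the sketch simply wraps @code{malloc}/@code{free} and uses @code{opaque} to carry a small bookkeeping record.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping record, reached through the opaque pointer. */
typedef struct {
    size_t live_blocks;   /* blocks allocated and not yet freed */
    size_t total_bytes;   /* bytes requested over the whole run */
} alloc_stats;

/* Matches void *(*bzalloc)(void *, int, int): return n * m bytes. */
void *counting_bzalloc(void *opaque, int n, int m)
{
    alloc_stats *st = (alloc_stats *)opaque;
    size_t bytes;
    void *p;

    if (n < 0 || m < 0)
        return NULL;
    bytes = (size_t)n * (size_t)m;
    p = malloc(bytes);
    if (p != NULL) {
        st->live_blocks++;
        st->total_bytes += bytes;
    }
    return p;
}

/* Matches void (*bzfree)(void *, void *): release a block from above. */
void counting_bzfree(void *opaque, void *p)
{
    alloc_stats *st = (alloc_stats *)opaque;

    if (p != NULL) {
        st->live_blocks--;
        free(p);
    }
}
```

To use such a pair, one would set @code{bzalloc} and @code{bzfree} to these functions and @code{opaque} to the address of an @code{alloc_stats} record before calling @code{BZ2_bzCompressInit}.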
2756+
2757+Before calling @code{BZ2_bzCompressInit}, fields @code{bzalloc},
2758+@code{bzfree} and @code{opaque} should
2759+be filled appropriately, as just described. Upon return, the internal
2760+state will have been allocated and initialised, and @code{total_in_lo32},
2761+@code{total_in_hi32}, @code{total_out_lo32} and
2762+@code{total_out_hi32} will have been set to zero.
2763+These four fields are used by the library
2764+to inform the caller of the total amount of data passed into and out of
2765+the library, respectively. You should not try to change them.
2766+As of version 1.0, 64-bit counts are maintained, even on 32-bit
2767+platforms, using the @code{_hi32} fields to store the upper 32 bits
2768+of the count. So, for example, the total amount of data in
2769+is @code{(total_in_hi32 << 32) + total_in_lo32}.
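
The expression above is best read as arithmetic on a 64-bit quantity. In C, shifting a 32-bit @code{unsigned int} left by 32 is undefined, so the high half has to be widened first. A minimal sketch, with a hypothetical helper name and assuming @code{uint64_t} from @code{stdint.h} is available:

```c
#include <stdint.h>

/* Hypothetical helper: combine the _hi32 and _lo32 halves of a count.
   The cast to a 64-bit type must happen before the shift. */
uint64_t bz_total64(unsigned int hi32, unsigned int lo32)
{
    return ((uint64_t)hi32 << 32) | (uint64_t)lo32;
}
```

For example, @code{bz_total64(strm->total_in_hi32, strm->total_in_lo32)} would give the total number of input bytes consumed so far.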
2770+
2771+Parameter @code{blockSize100k} specifies the block size to be used for
2772+compression. It should be a value between 1 and 9 inclusive, and the
2773+actual block size used is 100000 x this figure. 9 gives the best
2774+compression but takes most memory.
2775+
2776+Parameter @code{verbosity} should be set to a number between 0 and 4
2777+inclusive. 0 is silent, and greater numbers give increasingly verbose
2778+monitoring/debugging output. If the library has been compiled with
2779+@code{-DBZ_NO_STDIO}, no such output will appear for any verbosity
2780+setting.
2781+
2782+Parameter @code{workFactor} controls how the compression phase behaves
2783+when presented with worst case, highly repetitive, input data. If
2784+compression runs into difficulties caused by repetitive data, the
2785+library switches from the standard sorting algorithm to a fallback
2786+algorithm. The fallback is slower than the standard algorithm by
2787+perhaps a factor of three, but always behaves reasonably, no matter how
2788+bad the input.
2789+
2790+Lower values of @code{workFactor} reduce the amount of effort the
2791+standard algorithm will expend before resorting to the fallback. You
2792+should set this parameter carefully; too low, and many inputs will be
2793+handled by the fallback algorithm and so compress rather slowly; too
2794+high, and your average-to-worst case compression times can become very
2795+large. The default value of 30 gives reasonable behaviour over a wide
2796+range of circumstances.
2797+
2798+Allowable values range from 0 to 250 inclusive. 0 is a special case,
2799+equivalent to using the default value of 30.
2800+
2801+Note that the compressed output generated is the same regardless of
2802+whether or not the fallback algorithm is used.
2803+
2804+Be aware also that this parameter may disappear entirely in future
2805+versions of the library. In principle it should be possible to devise a
2806+good way to automatically choose which algorithm to use. Such a
2807+mechanism would render the parameter obsolete.
2808+
2809+Possible return values:
2810+@display
2811+ @code{BZ_CONFIG_ERROR}
2812+ if the library has been mis-compiled
2813+ @code{BZ_PARAM_ERROR}
2814+ if @code{strm} is @code{NULL}
2815+ or @code{blockSize100k} < 1 or @code{blockSize100k} > 9
2816+ or @code{verbosity} < 0 or @code{verbosity} > 4
2817+ or @code{workFactor} < 0 or @code{workFactor} > 250
2818+ @code{BZ_MEM_ERROR}
2819+ if not enough memory is available
2820+ @code{BZ_OK}
2821+ otherwise
2822+@end display
2823+Allowable next actions:
2824+@display
2825+ @code{BZ2_bzCompress}
2826+ if @code{BZ_OK} is returned
2827+ no specific action needed in case of error
2828+@end display
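
The parameter ranges listed above can be summarised in a few lines of C. This is a sketch of the documented checks only, not the library's own code; both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical check: returns 1 if the arguments fall inside the
   documented ranges, 0 if BZ_PARAM_ERROR would be expected. */
int compress_params_ok(const void *strm, int blockSize100k,
                       int verbosity, int workFactor)
{
    if (strm == NULL)
        return 0;
    if (blockSize100k < 1 || blockSize100k > 9)
        return 0;
    if (verbosity < 0 || verbosity > 4)
        return 0;
    if (workFactor < 0 || workFactor > 250)
        return 0;
    return 1;
}

/* A workFactor of 0 is documented as equivalent to the default of 30. */
int effective_work_factor(int workFactor)
{
    return workFactor == 0 ? 30 : workFactor;
}
```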
2829+
2830+@subsection @code{BZ2_bzCompress}
2831+@example
2832+ int BZ2_bzCompress ( bz_stream *strm, int action );
2833+@end example
2834+Provides more input and/or output buffer space for the library. The
2835+caller maintains input and output buffers, and calls @code{BZ2_bzCompress} to
2836+transfer data between them.
2837+
2838+Before each call to @code{BZ2_bzCompress}, @code{next_in} should point at
2839+the data to be compressed, and @code{avail_in} should indicate how many
2840+bytes the library may read. @code{BZ2_bzCompress} updates @code{next_in},
2841+@code{avail_in} and the @code{total_in_lo32}/@code{total_in_hi32} pair to
2842+reflect the number of bytes it has read.
2843+
2844+Similarly, @code{next_out} should point to a buffer in which the
2845+compressed data is to be placed, with @code{avail_out} indicating how
2846+much output space is available. @code{BZ2_bzCompress} updates
2847+@code{next_out}, @code{avail_out} and the @code{total_out_lo32}/@code{total_out_hi32}
2848+pair to reflect the number of bytes output.
2849+
2850+You may provide and remove as little or as much data as you like on each
2851+call of @code{BZ2_bzCompress}. In the limit, it is acceptable to supply and
2852+remove data one byte at a time, although this would be terribly
2853+inefficient. You should always ensure that at least one byte of output
2854+space is available at each call.
2855+
2856+A second purpose of @code{BZ2_bzCompress} is to request a change of mode of the
2857+compressed stream.
2858+
2859+Conceptually, a compressed stream can be in one of four states: IDLE,
2860+RUNNING, FLUSHING and FINISHING. Before initialisation
2861+(@code{BZ2_bzCompressInit}) and after termination (@code{BZ2_bzCompressEnd}), a
2862+stream is regarded as IDLE.
2863+
2864+Upon initialisation (@code{BZ2_bzCompressInit}), the stream is placed in the
2865+RUNNING state. Subsequent calls to @code{BZ2_bzCompress} should pass
2866+@code{BZ_RUN} as the requested action; other actions are illegal and
2867+will result in @code{BZ_SEQUENCE_ERROR}.
2868+
2869+At some point, the calling program will have provided all the input data
2870+it wants to. It will then want to finish up -- in effect, asking the
2871+library to process any data it might have buffered internally. In this
2872+state, @code{BZ2_bzCompress} will no longer attempt to read data from
2873+@code{next_in}, but it will want to write data to @code{next_out}.
2874+Because the output buffer supplied by the user can be arbitrarily small,
2875+the finishing-up operation cannot necessarily be done with a single call
2876+of @code{BZ2_bzCompress}.
2877+
2878+Instead, the calling program passes @code{BZ_FINISH} as an action to
2879+@code{BZ2_bzCompress}. This changes the stream's state to FINISHING. Any
2880+remaining input (ie, @code{next_in[0 .. avail_in-1]}) is compressed and
2881+transferred to the output buffer. To do this, @code{BZ2_bzCompress} must be
2882+called repeatedly until all the output has been consumed. At that
2883+point, @code{BZ2_bzCompress} returns @code{BZ_STREAM_END}, and the stream's
2884+state is set back to IDLE. @code{BZ2_bzCompressEnd} should then be
2885+called.
2886+
2887+Just to make sure the calling program does not cheat, the library makes
2888+a note of @code{avail_in} at the time of the first call to
2889+@code{BZ2_bzCompress} which has @code{BZ_FINISH} as an action (ie, at the
2890+time the program has announced its intention to not supply any more
2891+input). By comparing this value with that of @code{avail_in} over
2892+subsequent calls to @code{BZ2_bzCompress}, the library can detect any
2893+attempts to slip in more data to compress. Any calls for which this is
2894+detected will return @code{BZ_SEQUENCE_ERROR}. This indicates a
2895+programming mistake which should be corrected.
2896+
2897+Instead of asking to finish, the calling program may ask
2898+@code{BZ2_bzCompress} to take all the remaining input, compress it and
2899+terminate the current (Burrows-Wheeler) compression block. This could
2900+be useful for error control purposes. The mechanism is analogous to
2901+that for finishing: call @code{BZ2_bzCompress} with an action of
2902+@code{BZ_FLUSH}, remove output data, and persist with the
2903+@code{BZ_FLUSH} action until the value @code{BZ_RUN_OK} is returned. As
2904+with finishing, @code{BZ2_bzCompress} detects any attempt to provide more
2905+input data once the flush has begun.
2906+
2907+Once the flush is complete, the stream returns to the normal RUNNING
2908+state.
2909+
2910+This all sounds pretty complex, but isn't really. Here's a table
2911+which shows which actions are allowable in each state, what action
2912+will be taken, what the next state is, and what the non-error return
2913+values are. Note that you can't explicitly ask what state the
2914+stream is in, but nor do you need to -- it can be inferred from the
2915+values returned by @code{BZ2_bzCompress}.
2916+@display
2917+IDLE/@code{any}
2918+ Illegal. IDLE state only exists after @code{BZ2_bzCompressEnd} or
2919+ before @code{BZ2_bzCompressInit}.
2920+ Return value = @code{BZ_SEQUENCE_ERROR}
2921+
2922+RUNNING/@code{BZ_RUN}
2923+ Compress from @code{next_in} to @code{next_out} as much as possible.
2924+ Next state = RUNNING
2925+ Return value = @code{BZ_RUN_OK}
2926+
2927+RUNNING/@code{BZ_FLUSH}
2928+ Remember current value of @code{avail_in}. Compress from @code{next_in}
2929+ to @code{next_out} as much as possible, but do not accept any more input.
2930+ Next state = FLUSHING
2931+ Return value = @code{BZ_FLUSH_OK}
2932+
2933+RUNNING/@code{BZ_FINISH}
2934+ Remember current value of @code{avail_in}. Compress from @code{next_in}
2935+ to @code{next_out} as much as possible, but do not accept any more input.
2936+ Next state = FINISHING
2937+ Return value = @code{BZ_FINISH_OK}
2938+
2939+FLUSHING/@code{BZ_FLUSH}
2940+ Compress from @code{next_in} to @code{next_out} as much as possible,
2941+ but do not accept any more input.
2942+ If all the existing input has been used up and all compressed
2943+ output has been removed
2944+ Next state = RUNNING; Return value = @code{BZ_RUN_OK}
2945+ else
2946+ Next state = FLUSHING; Return value = @code{BZ_FLUSH_OK}
2947+
2948+FLUSHING/other
2949+ Illegal.
2950+ Return value = @code{BZ_SEQUENCE_ERROR}
2951+
2952+FINISHING/@code{BZ_FINISH}
2953+ Compress from @code{next_in} to @code{next_out} as much as possible,
2954+ but do not accept any more input.
2955+ If all the existing input has been used up and all compressed
2956+ output has been removed
2957+ Next state = IDLE; Return value = @code{BZ_STREAM_END}
2958+ else
2959+ Next state = FINISHING; Return value = @code{BZ_FINISH_OK}
2960+
2961+FINISHING/other
2962+ Illegal.
2963+ Return value = @code{BZ_SEQUENCE_ERROR}
2964+@end display
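
The table above can also be written down as a small transition function. The following is a model of the table as printed, using hypothetical names throughout; it is not the library's implementation, and the library itself may behave differently in corner cases. The flag @code{drained} stands for "all the existing input has been used up and all compressed output has been removed".

```c
/* Hypothetical model types; these are not the constants from bzlib.h. */
typedef enum { ST_IDLE, ST_RUNNING, ST_FLUSHING, ST_FINISHING } model_state;
typedef enum { ACT_RUN, ACT_FLUSH, ACT_FINISH } model_action;
typedef enum {
    RET_RUN_OK, RET_FLUSH_OK, RET_FINISH_OK,
    RET_STREAM_END, RET_SEQUENCE_ERROR
} model_ret;

/* Apply one action to the state and report the table's return value.
   On an illegal combination the state is left unchanged. */
model_ret model_step(model_state *st, model_action act, int drained)
{
    switch (*st) {
    case ST_RUNNING:
        switch (act) {
        case ACT_RUN:
            return RET_RUN_OK;
        case ACT_FLUSH:
            *st = ST_FLUSHING;
            return RET_FLUSH_OK;
        case ACT_FINISH:
            *st = ST_FINISHING;
            return RET_FINISH_OK;
        }
        break;
    case ST_FLUSHING:
        if (act != ACT_FLUSH)
            break;
        if (drained) {
            *st = ST_RUNNING;
            return RET_RUN_OK;
        }
        return RET_FLUSH_OK;
    case ST_FINISHING:
        if (act != ACT_FINISH)
            break;
        if (drained) {
            *st = ST_IDLE;
            return RET_STREAM_END;
        }
        return RET_FINISH_OK;
    case ST_IDLE:
        break;
    }
    return RET_SEQUENCE_ERROR;
}
```

Starting from @code{ST_RUNNING}, a sequence of @code{ACT_RUN} steps followed by @code{ACT_FINISH} steps until @code{RET_STREAM_END} appears mirrors the usual calling pattern.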
2965+
2966+That still looks complicated? Well, fair enough. The usual sequence
2967+of calls for compressing a load of data is:
2968+@itemize @bullet
2969+@item Get started with @code{BZ2_bzCompressInit}.
2970+@item Shovel data in and shlurp out its compressed form using zero or more
2971+calls of @code{BZ2_bzCompress} with action = @code{BZ_RUN}.
2972+@item Finish up.
2973+Repeatedly call @code{BZ2_bzCompress} with action = @code{BZ_FINISH},
2974+copying out the compressed output, until @code{BZ_STREAM_END} is returned.
2975+@item Close up and go home. Call @code{BZ2_bzCompressEnd}.
2976+@end itemize
2977+If the data you want to compress fits into your input buffer all
2978+at once, you can skip the calls of @code{BZ2_bzCompress ( ..., BZ_RUN )} and
2979+just do the @code{BZ2_bzCompress ( ..., BZ_FINISH )} calls.
2980+
2981+All required memory is allocated by @code{BZ2_bzCompressInit}. The
2982+compression library can accept any data at all (obviously). So you
2983+shouldn't get any error return values from the @code{BZ2_bzCompress} calls.
2984+If you do, they will be @code{BZ_SEQUENCE_ERROR}, and indicate a bug in
2985+your programming.
2986+
2987+Trivial other possible return values:
2988+@display
2989+ @code{BZ_PARAM_ERROR}
2990+ if @code{strm} is @code{NULL}, or @code{strm->state} is @code{NULL}
2991+@end display
2992+
2993+@subsection @code{BZ2_bzCompressEnd}
2994+@example
2995+int BZ2_bzCompressEnd ( bz_stream *strm );
2996+@end example
2997+Releases all memory associated with a compression stream.
2998+
2999+Possible return values:
3000+@display
3001+ @code{BZ_PARAM_ERROR} if @code{strm} is @code{NULL} or @code{strm->state} is @code{NULL}
3002+ @code{BZ_OK} otherwise
3003+@end display
3004+
3005+
3006+@subsection @code{BZ2_bzDecompressInit}
3007+@example
3008+int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );
3009+@end example
3010+Prepares for decompression. As with @code{BZ2_bzCompressInit}, a
3011+@code{bz_stream} record should be allocated and initialised before the
3012+call. Fields @code{bzalloc}, @code{bzfree} and @code{opaque} should be
3013+set if a custom memory allocator is required, or made @code{NULL} for
3014+the normal @code{malloc}/@code{free} routines. Upon return, the internal
3015+state will have been initialised, and @code{total_in} and
3016+@code{total_out} will be zero.
3017+
3018+For the meaning of parameter @code{verbosity}, see @code{BZ2_bzCompressInit}.
3019+
3020+If @code{small} is nonzero, the library will use an alternative
3021+decompression algorithm which uses less memory but at the cost of
3022+decompressing more slowly (roughly speaking, half the speed, but the
3023+maximum memory requirement drops to around 2300k). See Chapter 2 for
3024+more information on memory management.
3025+
3026+Note that the amount of memory needed to decompress
3027+a stream cannot be determined until the stream's header has been read,
3028+so even if @code{BZ2_bzDecompressInit} succeeds, a subsequent
3029+@code{BZ2_bzDecompress} could fail with @code{BZ_MEM_ERROR}.
3030+
3031+Possible return values:
3032+@display
3033+ @code{BZ_CONFIG_ERROR}
3034+ if the library has been mis-compiled
3035+ @code{BZ_PARAM_ERROR}
3036+ if @code{(small != 0 && small != 1)}
3037+ or @code{(verbosity < 0 || verbosity > 4)}
3038+ @code{BZ_MEM_ERROR}
3039+ if insufficient memory is available
3040+@end display
3041+
3042+Allowable next actions:
3043+@display
3044+ @code{BZ2_bzDecompress}
3045+ if @code{BZ_OK} was returned
3046+ no specific action required in case of error
3047+@end display
3048+
3049+
3050+
3051+@subsection @code{BZ2_bzDecompress}
3052+@example
3053+int BZ2_bzDecompress ( bz_stream *strm );
3054+@end example
3055+Provides more input and/or output buffer space for the library. The
3056+caller maintains input and output buffers, and uses @code{BZ2_bzDecompress}
3057+to transfer data between them.
3058+
3059+Before each call to @code{BZ2_bzDecompress}, @code{next_in}
3060+should point at the compressed data,
3061+and @code{avail_in} should indicate how many bytes the library
3062+may read. @code{BZ2_bzDecompress} updates @code{next_in}, @code{avail_in}
3063+and @code{total_in}
3064+to reflect the number of bytes it has read.
3065+
3066+Similarly, @code{next_out} should point to a buffer in which the uncompressed
3067+output is to be placed, with @code{avail_out} indicating how much output space
3068+is available. @code{BZ2_bzDecompress} updates @code{next_out},
3069+@code{avail_out} and @code{total_out} to reflect
3070+the number of bytes output.
3071+
3072+You may provide and remove as little or as much data as you like on
3073+each call of @code{BZ2_bzDecompress}.
3074+In the limit, it is acceptable to
3075+supply and remove data one byte at a time, although this would be
3076+terribly inefficient. You should always ensure that at least one
3077+byte of output space is available at each call.
3078+
3079+Use of @code{BZ2_bzDecompress} is simpler than @code{BZ2_bzCompress}.
3080+
3081+You should provide input and remove output as described above, and
3082+repeatedly call @code{BZ2_bzDecompress} until @code{BZ_STREAM_END} is
3083+returned. Appearance of @code{BZ_STREAM_END} denotes that
3084+@code{BZ2_bzDecompress} has detected the logical end of the compressed
3085+stream. @code{BZ2_bzDecompress} will not produce @code{BZ_STREAM_END} until
3086+all output data has been placed into the output buffer, so once
3087+@code{BZ_STREAM_END} appears, you are guaranteed to have available all
3088+the decompressed output, and @code{BZ2_bzDecompressEnd} can safely be
3089+called.
3090+
3091+In case of an error return value, you should call @code{BZ2_bzDecompressEnd}
3092+to clean up and release memory.
3093+
3094+Possible return values:
3095+@display
3096+ @code{BZ_PARAM_ERROR}
3097+ if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL}
3098+ or @code{strm->avail_out < 1}
3099+ @code{BZ_DATA_ERROR}
3100+ if a data integrity error is detected in the compressed stream
3101+ @code{BZ_DATA_ERROR_MAGIC}
3102+ if the compressed stream doesn't begin with the right magic bytes
3103+ @code{BZ_MEM_ERROR}
3104+ if there wasn't enough memory available
3105+ @code{BZ_STREAM_END}
3106+ if the logical end of the data stream was detected and all
3107+ output has been consumed, eg @code{s->avail_out > 0}
3108+ @code{BZ_OK}
3109+ otherwise
3110+@end display
3111+Allowable next actions:
3112+@display
3113+ @code{BZ2_bzDecompress}
3114+ if @code{BZ_OK} was returned
3115+ @code{BZ2_bzDecompressEnd}
3116+ otherwise
3117+@end display
3118+
3119+
3120+@subsection @code{BZ2_bzDecompressEnd}
3121+@example
3122+int BZ2_bzDecompressEnd ( bz_stream *strm );
3123+@end example
3124+Releases all memory associated with a decompression stream.
3125+
3126+Possible return values:
3127+@display
3128+ @code{BZ_PARAM_ERROR}
3129+ if @code{strm} is @code{NULL} or @code{strm->s} is @code{NULL}
3130+ @code{BZ_OK}
3131+ otherwise
3132+@end display
3133+
3134+Allowable next actions:
3135+@display
3136+ None.
3137+@end display
3138+
3139+
3140+@section High-level interface
3141+
3142+This interface provides functions for reading and writing
3143+@code{bzip2} format files. First, some general points.
3144+
3145+@itemize @bullet
3146+@item All of the functions take an @code{int*} first argument,
3147+ @code{bzerror}.
3148+ After each call, @code{bzerror} should be consulted first to determine
3149+ the outcome of the call. If @code{bzerror} is @code{BZ_OK},
3150+ the call completed
3151+ successfully, and only then should the return value of the function
3152+ (if any) be consulted. If @code{bzerror} is @code{BZ_IO_ERROR},
3153+ there was an error
3154+ reading/writing the underlying compressed file, and you should
3155+ then consult @code{errno}/@code{perror} to determine the
3156+ cause of the difficulty.
3157+ @code{bzerror} may also be set to various other values; precise details are
3158+ given on a per-function basis below.
3159+@item If @code{bzerror} indicates an error
3160+ (ie, anything except @code{BZ_OK} and @code{BZ_STREAM_END}),
3161+ you should immediately call @code{BZ2_bzReadClose} (or @code{BZ2_bzWriteClose},
3162+ depending on whether you are attempting to read or to write)
3163+ to free up all resources associated
3164+ with the stream. Once an error has been indicated, behaviour of all calls
3165+ except @code{BZ2_bzReadClose} (@code{BZ2_bzWriteClose}) is undefined.
3166+ The implication is that (1) @code{bzerror} should
3167+ be checked after each call, and (2) if @code{bzerror} indicates an error,
3168+ @code{BZ2_bzReadClose} (@code{BZ2_bzWriteClose}) should then be called to clean up.
3169+@item The @code{FILE*} arguments passed to
3170+ @code{BZ2_bzReadOpen}/@code{BZ2_bzWriteOpen}
3171+ should be set to binary mode.
3172+ Most Unix systems will do this by default, but other platforms,
3173+ including Windows and Mac, will not. If you omit this, you may
3174+ encounter problems when moving code to new platforms.
3175+@item Memory allocation requests are handled by
3176+ @code{malloc}/@code{free}.
3177+ At present
3178+ there is no facility for user-defined memory allocators in the file I/O
3179+ functions (could easily be added, though).
3180+@end itemize
3181+
3182+
3183+
3184+@subsection @code{BZ2_bzReadOpen}
3185+@example
3186+ typedef void BZFILE;
3187+
3188+ BZFILE *BZ2_bzReadOpen ( int *bzerror, FILE *f,
3189+ int small, int verbosity,
3190+ void *unused, int nUnused );
3191+@end example
3192+Prepare to read compressed data from file handle @code{f}. @code{f}
3193+should refer to a file which has been opened for reading, and for which
3194+the error indicator (@code{ferror(f)}) is not set. If @code{small} is 1,
3195+the library will try to decompress using less memory, at the expense of
3196+speed.
3197+
3198+For reasons explained below, @code{BZ2_bzRead} will decompress the
3199+@code{nUnused} bytes starting at @code{unused}, before starting to read
3200+from the file @code{f}. At most @code{BZ_MAX_UNUSED} bytes may be
3201+supplied like this. If this facility is not required, you should pass
3202+@code{NULL} and @code{0} for @code{unused} and @code{nUnused}
3203+respectively.
3204+
3205+For the meaning of parameters @code{small} and @code{verbosity},
3206+see @code{BZ2_bzDecompressInit}.
3207+
3208+The amount of memory needed to decompress a file cannot be determined
3209+until the file's header has been read. So it is possible that
3210+@code{BZ2_bzReadOpen} returns @code{BZ_OK} but a subsequent call of
3211+@code{BZ2_bzRead} will return @code{BZ_MEM_ERROR}.
3212+
3213+Possible assignments to @code{bzerror}:
3214+@display
3215+ @code{BZ_CONFIG_ERROR}
3216+ if the library has been mis-compiled
3217+ @code{BZ_PARAM_ERROR}
3218+ if @code{f} is @code{NULL}
3219+ or @code{small} is neither @code{0} nor @code{1}
3220+ or @code{(unused == NULL && nUnused != 0)}
3221+ or @code{(unused != NULL && !(0 <= nUnused <= BZ_MAX_UNUSED))}
3222+ @code{BZ_IO_ERROR}
3223+ if @code{ferror(f)} is nonzero
3224+ @code{BZ_MEM_ERROR}
3225+ if insufficient memory is available
3226+ @code{BZ_OK}
3227+ otherwise.
3228+@end display
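
The @code{nUnused} condition in the list above is written in mathematical shorthand. In C, a chained comparison such as @code{0 <= nUnused <= BZ_MAX_UNUSED} does not test a range, so a caller who wants to pre-validate arguments has to spell it out. The following is a minimal sketch of that check. The helper name @code{unused_args_ok} and its @code{maxUnused} parameter (standing in for @code{BZ_MAX_UNUSED}) are illustrative assumptions and not part of the library.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libbzip2.
   Returns 1 if the unused/nUnused pair satisfies the documented
   conditions, 0 if it would be rejected with BZ_PARAM_ERROR.
   maxUnused stands in for BZ_MAX_UNUSED. */
static int unused_args_ok(const void *unused, int nUnused, int maxUnused)
{
    if (unused == NULL)
        return nUnused == 0;          /* no buffer: count must be zero */
    return nUnused >= 0 && nUnused <= maxUnused;
}
```

A caller would pass its own buffer, the byte count, and @code{BZ_MAX_UNUSED} from @code{bzlib.h}.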
3229+
3230+Possible return values:
3231+@display
3232+ Pointer to an abstract @code{BZFILE}
3233+ if @code{bzerror} is @code{BZ_OK}
3234+ @code{NULL}
3235+ otherwise
3236+@end display
3237+
3238+Allowable next actions:
3239+@display
3240+ @code{BZ2_bzRead}
3241+ if @code{bzerror} is @code{BZ_OK}
3242+ @code{BZ2_bzReadClose}
3243+ otherwise
3244+@end display
3245+
3246+
3247+@subsection @code{BZ2_bzRead}
3248+@example
3249+ int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );
3250+@end example
3251+Reads up to @code{len} (uncompressed) bytes from the compressed file
3252+@code{b} into
3253+the buffer @code{buf}. If the read was successful,
3254+@code{bzerror} is set to @code{BZ_OK}
3255+and the number of bytes read is returned. If the logical end-of-stream
3256+was detected, @code{bzerror} will be set to @code{BZ_STREAM_END},
3257+and the number
3258+of bytes read is returned. All other @code{bzerror} values denote an error.
3259+
3260+@code{BZ2_bzRead} will supply @code{len} bytes,
3261+unless the logical stream end is detected
3262+or an error occurs. Because of this, it is possible to detect the
3263+stream end by observing when the number of bytes returned is
3264+less than the number
3265+requested. Nevertheless, this is regarded as inadvisable; you should
3266+instead check @code{bzerror} after every call and watch out for
3267+@code{BZ_STREAM_END}.
3268+
3269+Internally, @code{BZ2_bzRead} copies data from the compressed file in chunks
3270+of size @code{BZ_MAX_UNUSED} bytes
3271+before decompressing it. If the file contains more bytes than strictly
3272+needed to reach the logical end-of-stream, @code{BZ2_bzRead} will almost certainly
3273+read some of the trailing data before signalling @code{BZ_STREAM_END}.
3274+To collect the read but unused data once @code{BZ_STREAM_END} has
3275+appeared, call @code{BZ2_bzReadGetUnused} immediately before @code{BZ2_bzReadClose}.
3276+
3277+Possible assignments to @code{bzerror}:
3278+@display
3279+ @code{BZ_PARAM_ERROR}
3280+ if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0}
3281+ @code{BZ_SEQUENCE_ERROR}
3282+ if @code{b} was opened with @code{BZ2_bzWriteOpen}
3283+ @code{BZ_IO_ERROR}
3284+ if there is an error reading from the compressed file
3285+ @code{BZ_UNEXPECTED_EOF}
3286+ if the compressed file ended before the logical end-of-stream was detected
3287+ @code{BZ_DATA_ERROR}
3288+ if a data integrity error was detected in the compressed stream
3289+ @code{BZ_DATA_ERROR_MAGIC}
3290+ if the stream does not begin with the requisite header bytes (ie, is not
3291+ a @code{bzip2} data file). This is really a special case of @code{BZ_DATA_ERROR}.
3292+ @code{BZ_MEM_ERROR}
3293+ if insufficient memory was available
3294+ @code{BZ_STREAM_END}
3295+ if the logical end of stream was detected.
3296+ @code{BZ_OK}
3297+ otherwise.
3298+@end display
3299+
3300+Possible return values:
3301+@display
3302+ number of bytes read
3303+ if @code{bzerror} is @code{BZ_OK} or @code{BZ_STREAM_END}
3304+ undefined
3305+ otherwise
3306+@end display
3307+
3308+Allowable next actions:
3309+@display
3310+ collect data from @code{buf}, then @code{BZ2_bzRead} or @code{BZ2_bzReadClose}
3311+ if @code{bzerror} is @code{BZ_OK}
3312+ collect data from @code{buf}, then @code{BZ2_bzReadClose} or @code{BZ2_bzReadGetUnused}
3313+ if @code{bzerror} is @code{BZ_STREAM_END}
3314+ @code{BZ2_bzReadClose}
3315+ otherwise
3316+@end display
3317+
3318+
3319+
3320+@subsection @code{BZ2_bzReadGetUnused}
3321+@example
3322+ void BZ2_bzReadGetUnused ( int* bzerror, BZFILE *b,
3323+ void** unused, int* nUnused );
3324+@end example
3325+Returns data which was read from the compressed file but was not needed
3326+to get to the logical end-of-stream. @code{*unused} is set to the address
3327+of the data, and @code{*nUnused} to the number of bytes. @code{*nUnused} will
3328+be set to a value between @code{0} and @code{BZ_MAX_UNUSED} inclusive.
3329+
3330+This function may only be called once @code{BZ2_bzRead} has signalled
3331+@code{BZ_STREAM_END} but before @code{BZ2_bzReadClose}.
3332+
3333+Possible assignments to @code{bzerror}:
3334+@display
3335+ @code{BZ_PARAM_ERROR}
3336+ if @code{b} is @code{NULL}
3337+ or @code{unused} is @code{NULL} or @code{nUnused} is @code{NULL}
3338+ @code{BZ_SEQUENCE_ERROR}
3339+ if @code{BZ_STREAM_END} has not been signalled
3340+ or if @code{b} was opened with @code{BZ2_bzWriteOpen}
3341+ @code{BZ_OK}
3342+ otherwise
3343+@end display
3344+
3345+Allowable next actions:
3346+@display
3347+ @code{BZ2_bzReadClose}
3348+@end display
3349+
3350+
3351+@subsection @code{BZ2_bzReadClose}
3352+@example
3353+ void BZ2_bzReadClose ( int *bzerror, BZFILE *b );
3354+@end example
3355+Releases all memory pertaining to the compressed file @code{b}.
3356+@code{BZ2_bzReadClose} does not call @code{fclose} on the underlying file
3357+handle, so you should do that yourself if appropriate.
3358+@code{BZ2_bzReadClose} should be called to clean up after all error
3359+situations.
3360+
3361+Possible assignments to @code{bzerror}:
3362+@display
3363+ @code{BZ_SEQUENCE_ERROR}
3364+ if @code{b} was opened with @code{BZ2_bzWriteOpen}
3365+ @code{BZ_OK}
3366+ otherwise
3367+@end display
3368+
3369+Allowable next actions:
3370+@display
3371+ none
3372+@end display
3373+
3374+
3375+
3376+@subsection @code{BZ2_bzWriteOpen}
3377+@example
3378+ BZFILE *BZ2_bzWriteOpen ( int *bzerror, FILE *f,
3379+ int blockSize100k, int verbosity,
3380+ int workFactor );
3381+@end example
3382+Prepare to write compressed data to file handle @code{f}.
3383+@code{f} should refer to
3384+a file which has been opened for writing, and for which the error
3385+indicator (@code{ferror(f)}) is not set.
3386+
3387+For the meaning of parameters @code{blockSize100k},
3388+@code{verbosity} and @code{workFactor}, see
3389+@* @code{BZ2_bzCompressInit}.
3390+
3391+All required memory is allocated at this stage, so if the call
3392+completes successfully, @code{BZ_MEM_ERROR} cannot be signalled by a
3393+subsequent call to @code{BZ2_bzWrite}.
3394+
3395+Possible assignments to @code{bzerror}:
3396+@display
3397+ @code{BZ_CONFIG_ERROR}
3398+ if the library has been mis-compiled
3399+ @code{BZ_PARAM_ERROR}
3400+ if @code{f} is @code{NULL}
3401+ or @code{blockSize100k < 1} or @code{blockSize100k > 9}
3402+ @code{BZ_IO_ERROR}
3403+ if @code{ferror(f)} is nonzero
3404+ @code{BZ_MEM_ERROR}
3405+ if insufficient memory is available
3406+ @code{BZ_OK}
3407+ otherwise
3408+@end display
3409+
3410+Possible return values:
3411+@display
3412+ Pointer to an abstract @code{BZFILE}
3413+ if @code{bzerror} is @code{BZ_OK}
3414+ @code{NULL}
3415+ otherwise
3416+@end display
3417+
3418+Allowable next actions:
3419+@display
3420+ @code{BZ2_bzWrite}
3421+ if @code{bzerror} is @code{BZ_OK}
3422+ (you could go directly to @code{BZ2_bzWriteClose}, but this would be pretty pointless)
3423+ @code{BZ2_bzWriteClose}
3424+ otherwise
3425+@end display
3426+
3427+
3428+
3429+@subsection @code{BZ2_bzWrite}
3430+@example
3431+ void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );
3432+@end example
3433+Absorbs @code{len} bytes from the buffer @code{buf}, eventually to be
3434+compressed and written to the file.
3435+
3436+Possible assignments to @code{bzerror}:
3437+@display
3438+ @code{BZ_PARAM_ERROR}
3439+ if @code{b} is @code{NULL} or @code{buf} is @code{NULL} or @code{len < 0}
3440+ @code{BZ_SEQUENCE_ERROR}
3441+ if b was opened with @code{BZ2_bzReadOpen}
3442+ @code{BZ_IO_ERROR}
3443+ if there is an error writing the compressed file.
3444+ @code{BZ_OK}
3445+ otherwise
3446+@end display
3447+
3448+
3449+
3450+
3451+@subsection @code{BZ2_bzWriteClose}
3452+@example
3453+ void BZ2_bzWriteClose ( int *bzerror, BZFILE* b,
3454+ int abandon,
3455+ unsigned int* nbytes_in,
3456+ unsigned int* nbytes_out );
3457+
3458+ void BZ2_bzWriteClose64 ( int *bzerror, BZFILE* b,
3459+ int abandon,
3460+ unsigned int* nbytes_in_lo32,
3461+ unsigned int* nbytes_in_hi32,
3462+ unsigned int* nbytes_out_lo32,
3463+ unsigned int* nbytes_out_hi32 );
3464+@end example
3465+
3466+Compresses and flushes to the compressed file all data so far supplied
3467+by @code{BZ2_bzWrite}. The logical end-of-stream markers are also written, so
3468+subsequent calls to @code{BZ2_bzWrite} are illegal. All memory associated
3469+with the compressed file @code{b} is released.
3470+@code{fflush} is called on the
3471+compressed file, but it is not @code{fclose}'d.
3472+
3473+If @code{BZ2_bzWriteClose} is called to clean up after an error, the only
3474+action is to release the memory. The library records the error codes
3475+issued by previous calls, so this situation will be detected
3476+automatically. There is no attempt to complete the compression
3477+operation, nor to @code{fflush} the compressed file. You can force this
3478+behaviour to happen even in the case of no error, by passing a nonzero
3479+value to @code{abandon}.
3480+
3481+If @code{nbytes_in} is non-null, @code{*nbytes_in} will be set to be the
3482+total volume of uncompressed data handled. Similarly, @code{nbytes_out}
3483+will be set to the total volume of compressed data written. For
3484+compatibility with older versions of the library, @code{BZ2_bzWriteClose}
3485+only yields the lower 32 bits of these counts. Use
3486+@code{BZ2_bzWriteClose64} if you want the full 64 bit counts. These
3487+two functions are otherwise absolutely identical.
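
As a small illustration of the split counts, the two 32-bit halves reported by @code{BZ2_bzWriteClose64} can be recombined into one 64-bit value. This is a sketch only: the helper name @code{bz_join_count} is hypothetical, and it assumes a C99 compiler providing @code{uint64_t}.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libbzip2: joins the low and high
   32-bit halves of a byte count into a single 64-bit value. */
static uint64_t bz_join_count(unsigned int lo32, unsigned int hi32)
{
    /* Mask in case unsigned int is wider than 32 bits on this platform. */
    uint64_t lo = (uint64_t)lo32 & 0xFFFFFFFFu;
    uint64_t hi = (uint64_t)hi32 & 0xFFFFFFFFu;
    return (hi << 32) | lo;
}
```

After closing the file, @code{bz_join_count(nbytes_in_lo32, nbytes_in_hi32)} would give the total uncompressed size.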
3488+
3489+
3490+Possible assignments to @code{bzerror}:
3491+@display
3492+ @code{BZ_SEQUENCE_ERROR}
3493+ if @code{b} was opened with @code{BZ2_bzReadOpen}
3494+ @code{BZ_IO_ERROR}
3495+ if there is an error writing the compressed file
3496+ @code{BZ_OK}
3497+ otherwise
3498+@end display
3499+
3500+@subsection Handling embedded compressed data streams
3501+
3502+The high-level library facilitates use of
3503+@code{bzip2} data streams which form some part of a surrounding, larger
3504+data stream.
3505+@itemize @bullet
3506+@item For writing, the library takes an open file handle, writes
3507+compressed data to it, @code{fflush}es it but does not @code{fclose} it.
3508+The calling application can write its own data before and after the
3509+compressed data stream, using that same file handle.
3510+@item Reading is more complex, and the facilities are not as general
3511+as they could be since generality is hard to reconcile with efficiency.
3512+@code{BZ2_bzRead} reads from the compressed file in blocks of size
3513+@code{BZ_MAX_UNUSED} bytes, and in doing so probably will overshoot
3514+the logical end of compressed stream.
3515+To recover this data once decompression has
3516+ended, call @code{BZ2_bzReadGetUnused} after the last call of @code{BZ2_bzRead}
3517+(the one returning @code{BZ_STREAM_END}) but before calling
3518+@code{BZ2_bzReadClose}.
3519+@end itemize
3520+
3521+This mechanism makes it easy to decompress multiple @code{bzip2}
3522+streams placed end-to-end. At the end of one stream, when @code{BZ2_bzRead}
3523+returns @code{BZ_STREAM_END}, call @code{BZ2_bzReadGetUnused} to collect the
3524+unused data (copy it into your own buffer somewhere).
3525+That data forms the start of the next compressed stream.
3526+To start uncompressing that next stream, call @code{BZ2_bzReadOpen} again,
3527+feeding in the unused data via the @code{unused}/@code{nUnused}
3528+parameters.
3529+Keep doing this until @code{BZ_STREAM_END} return coincides with the
3530+physical end of file (@code{feof(f)}). In this situation
3531+@code{BZ2_bzReadGetUnused}
3532+will of course return no data.
3533+
3534+This should give some feel for how the high-level interface can be used.
3535+If you require extra flexibility, you'll have to bite the bullet and get
3536+to grips with the low-level interface.
3537+
3538+@subsection Standard file-reading/writing code
3539+Here's how you'd write data to a compressed file:
3540+@example @code
3541+FILE* f;
3542+BZFILE* b;
3543+int nBuf;
3544+char buf[ /* whatever size you like */ ];
3545+int bzerror;
3547+
3548+f = fopen ( "myfile.bz2", "wb" );
3549+if (!f) @{
3550+   /* handle error */
3551+@}
3552+b = BZ2_bzWriteOpen ( &bzerror, f, 9, 0, 30 ); /* blockSize100k, verbosity, workFactor */
3553+if (bzerror != BZ_OK) @{
3554+   BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
3555+   /* handle error */
3556+@}
3557+
3558+while ( /* condition */ ) @{
3559+   /* get data to write into buf, and set nBuf appropriately */
3560+   BZ2_bzWrite ( &bzerror, b, buf, nBuf );
3561+   if (bzerror == BZ_IO_ERROR) @{
3562+      BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
3563+      /* handle error */
3564+   @}
3565+@}
3566+
3567+BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
3568+if (bzerror == BZ_IO_ERROR) @{
3569+ /* handle error */
3570+@}
3571+@end example
3572+And to read from a compressed file:
3573+@example
3574+FILE* f;
3575+BZFILE* b;
3576+int nBuf;
3577+char buf[ /* whatever size you like */ ];
3578+int bzerror;
3580+
3581+f = fopen ( "myfile.bz2", "rb" );
3582+if (!f) @{
3583+   /* handle error */
3584+@}
3585+b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 );
3586+if (bzerror != BZ_OK) @{
3587+   BZ2_bzReadClose ( &bzerror, b );
3588+   /* handle error */
3589+@}
3590+
3591+bzerror = BZ_OK;
3592+while (bzerror == BZ_OK && /* arbitrary other conditions */) @{
3593+   nBuf = BZ2_bzRead ( &bzerror, b, buf, /* size of buf */ );
3594+   if (bzerror == BZ_OK || bzerror == BZ_STREAM_END) @{
3595+      /* do something with buf[0 .. nBuf-1] */
3596+   @}
3597+@}
3598+if (bzerror != BZ_STREAM_END) @{
3599+   BZ2_bzReadClose ( &bzerror, b );
3600+   /* handle error */
3601+@} else @{
3602+   BZ2_bzReadClose ( &bzerror, b );
3603+@}
3604+@end example
3605+
3606+
3607+
3608+@section Utility functions
3609+@subsection @code{BZ2_bzBuffToBuffCompress}
3610+@example
3611+ int BZ2_bzBuffToBuffCompress( char* dest,
3612+ unsigned int* destLen,
3613+ char* source,
3614+ unsigned int sourceLen,
3615+ int blockSize100k,
3616+ int verbosity,
3617+ int workFactor );
3618+@end example
3619+Attempts to compress the data in @code{source[0 .. sourceLen-1]}
3620+into the destination buffer, @code{dest[0 .. *destLen-1]}.
3621+If the destination buffer is big enough, @code{*destLen} is
3622+set to the size of the compressed data, and @code{BZ_OK} is
3623+returned. If the compressed data won't fit, @code{*destLen}
3624+is unchanged, and @code{BZ_OUTBUFF_FULL} is returned.
3625+
3626+Compression in this manner is a one-shot event, done with a single call
3627+to this function. The resulting compressed data is a complete
3628+@code{bzip2} format data stream. There is no mechanism for making
3629+additional calls to provide extra input data. If you want that kind of
3630+mechanism, use the low-level interface.
3631+
3632+For the meaning of parameters @code{blockSize100k}, @code{verbosity}
3633+and @code{workFactor}, @* see @code{BZ2_bzCompressInit}.
3634+
3635+To guarantee that the compressed data will fit in its buffer, allocate
3636+an output buffer of size 1% larger than the uncompressed data, plus
3637+six hundred extra bytes.
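
The sizing rule above can be sketched as a small helper. The name @code{bz_dest_bound} is hypothetical and not part of the library; the sketch rounds the 1% term up and reports failure when the result would not fit in the @code{unsigned int} that @code{destLen} requires.

```c
#include <limits.h>

/* Hypothetical helper, not part of libbzip2.
   Computes sourceLen + 1% (rounded up) + 600 bytes.
   Returns 0 and stores the size in *bound on success, or -1 if the
   size cannot be represented as an unsigned int. */
static int bz_dest_bound(unsigned int sourceLen, unsigned int *bound)
{
    unsigned long long n    = sourceLen;
    unsigned long long need = n + (n + 99u) / 100u + 600u;

    if (need > UINT_MAX)
        return -1;
    *bound = (unsigned int)need;
    return 0;
}
```

The value stored in @code{*bound} would be used both to allocate @code{dest} and as the initial @code{*destLen}.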
3638+
3639+@code{BZ2_bzBuffToBuffCompress} will not write data at or
3640+beyond @code{dest[*destLen]}, even in case of buffer overflow.
3641+
3642+Possible return values:
3643+@display
3644+ @code{BZ_CONFIG_ERROR}
3645+ if the library has been mis-compiled
3646+ @code{BZ_PARAM_ERROR}
3647+ if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL}
3648+ or @code{blockSize100k < 1} or @code{blockSize100k > 9}
3649+ or @code{verbosity < 0} or @code{verbosity > 4}
3650+ or @code{workFactor < 0} or @code{workFactor > 250}
3651+ @code{BZ_MEM_ERROR}
3652+ if insufficient memory is available
3653+ @code{BZ_OUTBUFF_FULL}
3654+ if the size of the compressed data exceeds @code{*destLen}
3655+ @code{BZ_OK}
3656+ otherwise
3657+@end display
3658+
3659+
3660+
3661+@subsection @code{BZ2_bzBuffToBuffDecompress}
3662+@example
3663+ int BZ2_bzBuffToBuffDecompress ( char* dest,
3664+ unsigned int* destLen,
3665+ char* source,
3666+ unsigned int sourceLen,
3667+ int small,
3668+ int verbosity );
3669+@end example
3670+Attempts to decompress the data in @code{source[0 .. sourceLen-1]}
3671+into the destination buffer, @code{dest[0 .. *destLen-1]}.
3672+If the destination buffer is big enough, @code{*destLen} is
3673+set to the size of the uncompressed data, and @code{BZ_OK} is
3674+returned. If the uncompressed data won't fit, @code{*destLen}
3675+is unchanged, and @code{BZ_OUTBUFF_FULL} is returned.
3676+
3677+@code{source} is assumed to hold a complete @code{bzip2} format
3678+data stream. @* @code{BZ2_bzBuffToBuffDecompress} tries to decompress
3679+the entirety of the stream into the output buffer.
3680+
3681+For the meaning of parameters @code{small} and @code{verbosity},
3682+see @code{BZ2_bzDecompressInit}.
3683+
3684+Because the compression ratio of the compressed data cannot be known in
3685+advance, there is no easy way to guarantee that the output buffer will
3686+be big enough. You may of course make arrangements in your code to
3687+record the size of the uncompressed data, but such a mechanism is beyond
3688+the scope of this library.
3689+
3690+@code{BZ2_bzBuffToBuffDecompress} will not write data at or
3691+beyond @code{dest[*destLen]}, even in case of buffer overflow.
3692+
3693+Possible return values:
3694+@display
3695+ @code{BZ_CONFIG_ERROR}
3696+ if the library has been mis-compiled
3697+ @code{BZ_PARAM_ERROR}
3698+ if @code{dest} is @code{NULL} or @code{destLen} is @code{NULL}
3699+ or @code{small != 0 && small != 1}
3700+ or @code{verbosity < 0} or @code{verbosity > 4}
3701+ @code{BZ_MEM_ERROR}
3702+ if insufficient memory is available
3703+ @code{BZ_OUTBUFF_FULL}
3704+ if the size of the uncompressed data exceeds @code{*destLen}
3705+ @code{BZ_DATA_ERROR}
3706+ if a data integrity error was detected in the compressed data
3707+ @code{BZ_DATA_ERROR_MAGIC}
3708+ if the compressed data doesn't begin with the right magic bytes
3709+ @code{BZ_UNEXPECTED_EOF}
3710+ if the compressed data ends unexpectedly
3711+ @code{BZ_OK}
3712+ otherwise
3713+@end display
3714+
3715+
3716+
3717+@section @code{zlib} compatibility functions
3718+Yoshioka Tsuneo has contributed some functions to
3719+give better @code{zlib} compatibility. These functions are
3720+@code{BZ2_bzopen}, @code{BZ2_bzread}, @code{BZ2_bzwrite}, @code{BZ2_bzflush},
3721+@code{BZ2_bzclose},
3722+@code{BZ2_bzerror} and @code{BZ2_bzlibVersion}.
3723+These functions are not (yet) officially part of
3724+the library. If they break, you get to keep all the pieces.
3725+Nevertheless, I think they work ok.
3726+@example
3727+typedef void BZFILE;
3728+
3729+const char * BZ2_bzlibVersion ( void );
3730+@end example
3731+Returns a string indicating the library version.
3732+@example
3733+BZFILE * BZ2_bzopen ( const char *path, const char *mode );
3734+BZFILE * BZ2_bzdopen ( int fd, const char *mode );
3735+@end example
3736+Opens a @code{.bz2} file for reading or writing, using either its name
3737+or a pre-existing file descriptor.
3738+Analogous to @code{fopen} and @code{fdopen}.
3739+@example
3740+int BZ2_bzread ( BZFILE* b, void* buf, int len );
3741+int BZ2_bzwrite ( BZFILE* b, void* buf, int len );
3742+@end example
3743+Reads/writes data from/to a previously opened @code{BZFILE}.
3744+Analogous to @code{fread} and @code{fwrite}.
3745+@example
3746+int BZ2_bzflush ( BZFILE* b );
3747+void BZ2_bzclose ( BZFILE* b );
3748+@end example
3749+Flushes/closes a @code{BZFILE}. @code{BZ2_bzflush} doesn't actually do
3750+anything. Analogous to @code{fflush} and @code{fclose}.
3751+
3752+@example
3753+const char * BZ2_bzerror ( BZFILE *b, int *errnum );
3754+@end example
3755+Returns a string describing the most recent error status of
3756+@code{b}, and also sets @code{*errnum} to its numerical value.
3757+
3758+
3759+@section Using the library in a @code{stdio}-free environment
3760+
3761+@subsection Getting rid of @code{stdio}
3762+
3763+In a deeply embedded application, you might want to use just
3764+the memory-to-memory functions. You can do this conveniently
3765+by compiling the library with preprocessor symbol @code{BZ_NO_STDIO}
3766+defined. Doing this gives you a library containing only the following
3767+eight functions:
3768+
3769+@code{BZ2_bzCompressInit}, @code{BZ2_bzCompress}, @code{BZ2_bzCompressEnd} @*
3770+@code{BZ2_bzDecompressInit}, @code{BZ2_bzDecompress}, @code{BZ2_bzDecompressEnd} @*
3771+@code{BZ2_bzBuffToBuffCompress}, @code{BZ2_bzBuffToBuffDecompress}
3772+
3773+When compiled like this, all functions will ignore @code{verbosity}
3774+settings.
3775+
3776+@subsection Critical error handling
3777+@code{libbzip2} contains a number of internal assertion checks which
3778+should, needless to say, never be activated. Nevertheless, if an
3779+assertion should fail, behaviour depends on whether or not the library
3780+was compiled with @code{BZ_NO_STDIO} set.
3781+
3782+For a normal compile, an assertion failure yields the message
3783+@example
3784+ bzip2/libbzip2: internal error number N.
3785+ This is a bug in bzip2/libbzip2, 1.0 of 21-Mar-2000.
3786+ Please report it to me at: jseward@@acm.org. If this happened
3787+ when you were using some program which uses libbzip2 as a
3788+ component, you should also report this bug to the author(s)
3789+ of that program. Please make an effort to report this bug;
3790+ timely and accurate bug reports eventually lead to higher
3791+ quality software. Thanks. Julian Seward, 21 March 2000.
3792+@end example
3793+where @code{N} is some error code number. @code{exit(3)}
3794+is then called.
3795+
3796+For a @code{stdio}-free library, assertion failures result
3797+in a call to a function declared as:
3798+@example
3799+ extern void bz_internal_error ( int errcode );
3800+@end example
3801+The relevant code is passed as a parameter. You should supply
3802+such a function.
3803+
3804+In either case, once an assertion failure has occurred, any
3805+@code{bz_stream} records involved can be regarded as invalid.
3806+You should not attempt to resume normal operation with them.
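
One possible shape for an application-supplied @code{bz_internal_error} in a @code{stdio}-free build is sketched below. It records the code and jumps back to a recovery point instead of returning into the library. Everything except the name and signature of @code{bz_internal_error} is an assumption made for illustration: @code{bz_panic_env}, @code{bz_panic_code}, @code{bz_run_guarded} and the two demo jobs are hypothetical, and the error code 42 is arbitrary.

```c
#include <setjmp.h>

static jmp_buf bz_panic_env;   /* hypothetical recovery point owned by the application */
static int     bz_panic_code;  /* hypothetical record of the last assertion code */

/* The function the stdio-free library expects the application to supply. */
void bz_internal_error(int errcode)
{
    bz_panic_code = errcode;
    longjmp(bz_panic_env, 1);  /* do not return into the library */
}

/* Hypothetical wrapper: runs job() and returns 0 if it completed,
   or -1 if bz_internal_error was called while it ran. */
int bz_run_guarded(void (*job)(void))
{
    if (setjmp(bz_panic_env) != 0)
        return -1;
    job();
    return 0;
}

/* Stand-ins for real compression work, for demonstration only. */
static void demo_ok_job(void) { }
static void demo_failing_job(void) { bz_internal_error(42); }
```

Jumping out this way abandons whatever the library had allocated for the stream, so this only suits applications that can tolerate that loss or that reset their allocator afterwards.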
3807+
3808+You may, of course, change critical error handling to suit
3809+your needs. As I said above, critical errors indicate bugs
3810+in the library and should not occur. All "normal" error
3811+situations are indicated via error return codes from functions,
3812+and can be recovered from.
3813+
3814+
3815+@section Making a Windows DLL
3816+Everything related to Windows has been contributed by Yoshioka Tsuneo
3817+@* (@code{QWF00133@@niftyserve.or.jp} /
3818+@code{tsuneo-y@@is.aist-nara.ac.jp}), so you should send your queries to
3819+him (but perhaps Cc: me, @code{jseward@@acm.org}).
3820+
3821+My vague understanding of what to do is: using Visual C++ 5.0,
3822+open the project file @code{libbz2.dsp}, and build. That's all.
3823+
3824+If you can't
3825+open the project file for some reason, make a new one, naming these files:
3826+@code{blocksort.c}, @code{bzlib.c}, @code{compress.c},
3827+@code{crctable.c}, @code{decompress.c}, @code{huffman.c}, @*
3828+@code{randtable.c} and @code{libbz2.def}. You will also need
3829+to name the header files @code{bzlib.h} and @code{bzlib_private.h}.
3830+
3831+If you don't use VC++, you may need to define the preprocessor symbol
3832+@code{_WIN32}.
3833+
3834+Finally, @code{dlltest.c} is a sample program using the DLL. It has a
3835+project file, @code{dlltest.dsp}.
3836+
3837+If you just want a makefile for Visual C, have a look at
3838+@code{makefile.msc}.
3839+
3840+Be aware that if you compile @code{bzip2} itself on Win32, you must set
3841+@code{BZ_UNIX} to 0 and @code{BZ_LCCWIN32} to 1, in the file
3842+@code{bzip2.c}, before compiling. Otherwise the resulting binary won't
3843+work correctly.
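In other words, the two settings in bzip2.c would end up looking roughly like this. This is only a sketch of the values named above; the surrounding comments and their exact position in the file are not reproduced here.

```c
/* Settings for building the bzip2 program on Win32, per the text above. */
#define BZ_UNIX      0
#define BZ_LCCWIN32  1
```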
3844+
3845+I haven't tried any of this stuff myself, but it all looks plausible.
3846+
3847+
3848+
3849+@chapter Miscellanea
3850+
3851+These are just some random thoughts of mine. Your mileage may
3852+vary.
3853+
3854+@section Limitations of the compressed file format
3855+@code{bzip2-1.0}, @code{0.9.5} and @code{0.9.0}
3856+use exactly the same file format as the previous
3857+version, @code{bzip2-0.1}. This decision was made in the interests of
3858+stability. Creating yet another incompatible compressed file format
3859+would create further confusion and disruption for users.
3860+
3861+Nevertheless, this is not a painless decision. Development
3862+work since the release of @code{bzip2-0.1} in August 1997
3863+has shown complexities in the file format which slow down
3864+decompression and, in retrospect, are unnecessary. These are:
3865+@itemize @bullet
3866+@item The run-length encoder, which is the first of the
3867+ compression transformations, is entirely irrelevant.
3868+ The original purpose was to protect the sorting algorithm
3869+ from the very worst case input: a string of repeated
3870+ symbols. But algorithm steps Q6a and Q6b in the original
3871+ Burrows-Wheeler technical report (SRC-124) show how
3872+ repeats can be handled without difficulty in block
3873+ sorting.
3874+@item The randomisation mechanism doesn't really need to be
3875+ there. Udi Manber and Gene Myers published a suffix
3876+ array construction algorithm a few years back, which
3877+ can be employed to sort any block, no matter how
3878+ repetitive, in O(N log N) time. Subsequent work by
3879+ Kunihiko Sadakane has produced a derivative O(N (log N)^2)
3880+ algorithm which usually outperforms the Manber-Myers
3881+ algorithm.
3882+
3883+ I could have changed to Sadakane's algorithm, but I find
3884+ it to be slower than @code{bzip2}'s existing algorithm for
3885+ most inputs, and the randomisation mechanism protects
3886+ adequately against bad cases. I didn't think it was
3887+ a good tradeoff to make. Partly this is due to the fact
3888+ that I was not flooded with email complaints about
3889+ @code{bzip2-0.1}'s performance on repetitive data, so
3890+ perhaps it isn't a problem for real inputs.
3891+
3892+ Probably the best long-term solution,
3893+ and the one I have incorporated into 0.9.5 and above,
3894+ is to use the existing sorting
3895+ algorithm initially, and fall back to an O(N (log N)^2)
3896+ algorithm if the standard algorithm gets into difficulties.
3897+@item The compressed file format was never designed to be
3898+ handled by a library, and I have had to jump through
3899+ some hoops to produce an efficient implementation of
3900+ decompression. It's a bit hairy. Try passing
3901+ @code{decompress.c} through the C preprocessor
3902+ and you'll see what I mean. Much of this complexity
3903+ could have been avoided if the compressed size of
3904+ each block of data was recorded in the data stream.
3905+@item An Adler-32 checksum, rather than a CRC32 checksum,
3906+ would be faster to compute.
3907+@end itemize
3908+It would be fair to say that the @code{bzip2} format was frozen
3909+before I properly and fully understood the performance
3910+consequences of doing so.
3911+
3912+Improvements which I was able to incorporate into
3913+0.9.0, despite using the same file format, are:
3914+@itemize @bullet
3915+@item Single array implementation of the inverse BWT. This
3916+ significantly speeds up decompression, presumably
3917+ because it reduces the number of cache misses.
3918+@item Faster inverse MTF transform for large MTF values. The
3919+ new implementation is based on the notion of sliding blocks
3920+ of values.
3921+@item @code{bzip2-0.9.0} now reads and writes files with @code{fread}
3922+ and @code{fwrite}; version 0.1 used @code{putc} and @code{getc}.
3923+ Duh! Well, you live and learn.
3924+
3925+@end itemize
3926+Further ahead, it would be nice
3927+to be able to do random access into files. This will
3928+require some careful design of compressed file formats.
3929+
3930+
3931+
3932+@section Portability issues
3933+After some consideration, I have decided not to use
3934+GNU @code{autoconf} to configure 0.9.5 or 1.0.
3935+
3936+@code{autoconf}, admirable and wonderful though it is,
3937+mainly assists with portability problems between Unix-like
3938+platforms. But @code{bzip2} doesn't have much in the way
3939+of portability problems on Unix; most of the difficulties appear
3940+when porting to the Mac, or to Microsoft's operating systems.
3941+@code{autoconf} doesn't help in those cases, and brings in a
3942+whole load of new complexity.
3943+
3944+Most people should be able to compile the library and program
3945+under Unix straight out-of-the-box, so to speak, especially
3946+if you have a version of GNU C available.
3947+
3948+There are a couple of @code{__inline__} directives in the code. GNU C
3949+(@code{gcc}) should be able to handle them. If you're not using
3950+GNU C, your C compiler shouldn't see them at all.
3951+If your compiler does, for some reason, see them and doesn't
3952+like them, just @code{#define} @code{__inline__} to be @code{/* */}. One
3953+easy way to do this is to compile with the flag @code{-D__inline__=},
3954+which should be understood by most Unix compilers.
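The same workaround can be written directly in the source instead of on the command line. The sketch below assumes a compiler that does not define __GNUC__; the function bump is hypothetical and only shows the directive in use.

```c
/* If the compiler is not GNU C and still trips over __inline__,
   make the keyword expand to nothing (same effect as -D__inline__=). */
#ifndef __GNUC__
#define __inline__ /* */
#endif

/* Hypothetical function, only to show a use of the directive. */
static __inline__ int bump(int x)
{
    return x + 1;
}
```

With GNU C the guard is skipped and __inline__ keeps its usual meaning; elsewhere bump becomes an ordinary static function.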
3955+
3956+If you still have difficulties, try compiling with the macro
3957+@code{BZ_STRICT_ANSI} defined. This should enable you to build the
3958+library in a strictly ANSI compliant environment. Building the program
3959+itself like this is dangerous and not supported, since you remove
3960+@code{bzip2}'s checks against compressing directories, symbolic links,
3961+devices, and other not-really-a-file entities. This could cause
3962+filesystem corruption!
3963+
3964+One other thing: if you create a @code{bzip2} binary for public
3965+distribution, please try and link it statically (@code{gcc -static}). This
3966+avoids all sorts of library-version issues that others may encounter
3967+later on.
3968+
3969+If you build @code{bzip2} on Win32, you must set @code{BZ_UNIX} to 0 and
3970+@code{BZ_LCCWIN32} to 1, in the file @code{bzip2.c}, before compiling.
3971+Otherwise the resulting binary won't work correctly.
3972+
3973+
3974+
3975+@section Reporting bugs
3976+I tried pretty hard to make sure @code{bzip2} is
3977+bug free, both by design and by testing. Hopefully
3978+you'll never need to read this section for real.
3979+
3980+Nevertheless, if @code{bzip2} dies with a segmentation
3981+fault, a bus error or an internal assertion failure, it
3982+will ask you to email me a bug report. Experience with
3983+version 0.1 shows that almost all these problems can
3984+be traced to either compiler bugs or hardware problems.
3985+@itemize @bullet
3986+@item
3987+Recompile the program with no optimisation, and see if it
3988+works. And/or try a different compiler.
3989+I heard all sorts of stories about various flavours
3990+of GNU C (and other compilers) generating bad code for
3991+@code{bzip2}, and I've run across two such examples myself.
3992+
3993+2.7.X versions of GNU C are known to generate bad code from
3994+time to time, at high optimisation levels.
3995+If you get problems, try using the flags
3996+@code{-O2} @code{-fomit-frame-pointer} @code{-fno-strength-reduce}.
3997+You should specifically @emph{not} use @code{-funroll-loops}.
3998+
3999+You may notice that the Makefile runs six tests as part of
4000+the build process. If the program passes all of these, it's
4001+a pretty good (but not 100%) indication that the compiler has
4002+done its job correctly.
4003+@item
4004+If @code{bzip2} crashes randomly, and the crashes are not
4005+repeatable, you may have a flaky memory subsystem. @code{bzip2}
4006+really hammers your memory hierarchy, and if it's a bit marginal,
4007+you may get these problems. Ditto if your disk or I/O subsystem
4008+is slowly failing. Yup, this really does happen.
4009+
4010+Try using a different machine of the same type, and see if
4011+you can repeat the problem.
4012+@item This isn't really a bug, but ... If @code{bzip2} tells
4013+you your file is corrupted on decompression, and you
4014+obtained the file via FTP, there is a possibility that you
4015+forgot to tell FTP to do a binary mode transfer. That absolutely
4016+will cause the file to be non-decompressible. You'll have to transfer
4017+it again.
4018+@end itemize
4019+
4020+If you've incorporated @code{libbzip2} into your own program
4021+and are getting problems, please, please, please, check that the
4022+parameters you are passing in calls to the library are
4023+correct, and in accordance with what the documentation says
4024+is allowable. I have tried to make the library robust against
4025+such problems, but I'm sure I haven't succeeded.
4026+
4027+Finally, if the above comments don't help, you'll have to send
4028+me a bug report. Now, it's just amazing how many people will
4029+send me a bug report saying something like
4030+@display
4031+ bzip2 crashed with segmentation fault on my machine
4032+@end display
4033+and absolutely nothing else. Needless to say, such a report
4034+is @emph{totally, utterly, completely and comprehensively 100% useless;
4035+a waste of your time, my time, and net bandwidth}.
4036+With no details at all, there's no way I can possibly begin
4037+to figure out what the problem is.
4038+
4039+The rules of the game are: facts, facts, facts. Don't omit
4040+them because "oh, they won't be relevant". At the bare
4041+minimum:
4042+@display
4043+ Machine type. Operating system version.
4044+ Exact version of @code{bzip2} (do @code{bzip2 -V}).
4045+ Exact version of the compiler used.
4046+ Flags passed to the compiler.
4047+@end display
4048+However, the most important single thing that will help me is
4049+the file that you were trying to compress or decompress at the
4050+time the problem happened. Without that, my ability to do anything
4051+more than speculate about the cause is limited.
4052+
4053+Please remember that I connect to the Internet with a modem, so
4054+you should contact me before mailing me huge files.
4055+
4056+
4057+@section Did you get the right package?
4058+
4059+@code{bzip2} is a resource hog. It soaks up large amounts of CPU cycles
4060+and memory. Also, it gives very large latencies. In the worst case, you
4061+can feed many megabytes of uncompressed data into the library before
4062+getting any compressed output, so this probably rules out applications
4063+requiring interactive behaviour.
4064+
4065+These aren't faults of my implementation, I hope, but more
4066+an intrinsic property of the Burrows-Wheeler transform (unfortunately).
4067+Maybe this isn't what you want.
4068+
4069+If you want a compressor and/or library which is faster, uses less
4070+memory but gets pretty good compression, and has minimal latency,
4071+consider Jean-loup
4072+Gailly's and Mark Adler's work, @code{zlib-1.1.2} and
4073+@code{gzip-1.2.4}. Look for them at
4074+
4075+@code{http://www.cdrom.com/pub/infozip/zlib} and
4076+@code{http://www.gzip.org} respectively.
4077+
4078+For something faster and lighter still, you might try Markus F X J
4079+Oberhumer's @code{LZO} real-time compression/decompression library, at
4080+@* @code{http://wildsau.idv.uni-linz.ac.at/mfx/lzo.html}.
4081+
4082+If you want to use the @code{bzip2} algorithms to compress small blocks
4083+of data, 64k bytes or smaller, for example on an on-the-fly disk
4084+compressor, you'd be well advised not to use this library. Instead,
4085+I've made a special library tuned for that kind of use. It's part of
4086+@code{e2compr-0.40}, an on-the-fly disk compressor for the Linux
4087+@code{ext2} filesystem. Look at
4088+@code{http://www.netspace.net.au/~reiter/e2compr}.
4089+
4090+
4091+
4092+@section Testing
4093+
4094+A record of the tests I've done.
4095+
4096+First, some data sets:
4097+@itemize @bullet
4098+@item B: a directory containing 6001 files, one for every length in the
4099+ range 0 to 6000 bytes. The files contain random lowercase
4100+ letters. 18.7 megabytes.
4101+@item H: my home directory tree. Documents, source code, mail files,
4102+ compressed data. H contains B, and also a directory of
4103+ files designed as boundary cases for the sorting; mostly very
4104+ repetitive, nasty files. 565 megabytes.
4105+@item A: directory tree holding various applications built from source:
4106+ @code{egcs}, @code{gcc-2.8.1}, KDE, GTK, Octave, etc.
4107+ 2200 megabytes.
4108+@end itemize
4109+The tests conducted are as follows. Each test means compressing
4110+(a copy of) each file in the data set, decompressing it and
4111+comparing it against the original.
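For illustration, the contents of a data set like B could be produced along the following lines. This is a sketch under assumptions: the original tests do not say which random generator was used, and the names fill_lowercase and total_bytes are invented here.

```c
#include <stddef.h>

/* Fill buf with n pseudo-random lowercase letters. A small linear
   congruential generator keeps the sketch deterministic; the choice
   of generator is an assumption, not taken from the original tests. */
void fill_lowercase(char *buf, size_t n, unsigned long seed)
{
    size_t i;
    for (i = 0; i < n; i++) {
        seed = (seed * 1103515245UL + 12345UL) & 0x7fffffffUL;
        buf[i] = (char)('a' + (seed >> 16) % 26);
    }
}

/* Total payload of a set holding one file per length 0..max_len bytes. */
unsigned long total_bytes(unsigned long max_len)
{
    return max_len * (max_len + 1UL) / 2UL;
}
```

For the range 0 to 6000, total_bytes(6000) comes to 18003000 bytes of file contents.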
4112+
4113+First, a bunch of tests with block sizes and internal buffer
4114+sizes set very small,
4115+to detect any problems with the
4116+blocking and buffering mechanisms.
4117+This required modifying the source code so as to try to
4118+break it.
4119+@enumerate
4120+@item Data set H, with
4121+ buffer size of 1 byte, and block size of 23 bytes.
4122+@item Data set B, buffer sizes 1 byte, block size 1 byte.
4123+@item As (2) but small-mode decompression.
4124+@item As (2) with block size 2 bytes.
4125+@item As (2) with block size 3 bytes.
4126+@item As (2) with block size 4 bytes.
4127+@item As (2) with block size 5 bytes.
4128+@item As (2) with block size 6 bytes and small-mode decompression.
4129+@item H with buffer size of 1 byte, but normal block
4130+ size (up to 900000 bytes).
4131+@end enumerate
4132+Then some tests with unmodified source code.
4133+@enumerate
4134+@item H, all settings normal.
4135+@item As (1), with small-mode decompress.
4136+@item H, compress with flag @code{-1}.
4137+@item H, compress with flag @code{-s}, decompress with flag @code{-s}.
4138+@item Forwards compatibility: H, @code{bzip2-0.1pl2} compressing,
4139+ @code{bzip2-0.9.5} decompressing, all settings normal.
4140+@item Backwards compatibility: H, @code{bzip2-0.9.5} compressing,
4141+ @code{bzip2-0.1pl2} decompressing, all settings normal.
4142+@item Bigger tests: A, all settings normal.
4143+@item As (7), using the fallback (Sadakane-like) sorting algorithm.
4144+@item As (8), compress with flag @code{-1}, decompress with flag
4145+ @code{-s}.
4146+@item H, using the fallback sorting algorithm.
4147+@item Forwards compatibility: A, @code{bzip2-0.1pl2} compressing,
4148+ @code{bzip2-0.9.5} decompressing, all settings normal.
4149+@item Backwards compatibility: A, @code{bzip2-0.9.5} compressing,
4150+ @code{bzip2-0.1pl2} decompressing, all settings normal.
4151+@item Misc test: about 400 megabytes of @code{.tar} files with
4152+ @code{bzip2} compiled with Checker (a memory access error
4153+ detector, like Purify).
4154+@item Misc tests to make sure it builds and runs ok on non-Linux/x86
4155+ platforms.
4156+@end enumerate
4157+These tests were conducted on a 225 MHz IDT WinChip machine, running
4158+Linux 2.0.36. They represent nearly a week of continuous computation.
4159+All tests completed successfully.
4160+
4161+
4162+@section Further reading
4163+@code{bzip2} is not research work, in the sense that it doesn't present
4164+any new ideas. Rather, it's an engineering exercise based on existing
4165+ideas.
4166+
4167+Four documents describe essentially all the ideas behind @code{bzip2}:
4168+@example
4169+Michael Burrows and D. J. Wheeler:
4170+ "A block-sorting lossless data compression algorithm"
4171+ 10th May 1994.
4172+ Digital SRC Research Report 124.
4173+ ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
4174+ If you have trouble finding it, try searching at the
4175+ New Zealand Digital Library, http://www.nzdl.org.
4176+
4177+Daniel S. Hirschberg and Debra A. Lelewer
4178+ "Efficient Decoding of Prefix Codes"
4179+ Communications of the ACM, April 1990, Vol 33, Number 4.
4180+ You might be able to get an electronic copy of this
4181+ from the ACM Digital Library.
4182+
4183+David J. Wheeler
4184+ Program bred3.c and accompanying document bred3.ps.
4185+ This contains the idea behind the multi-table Huffman
4186+ coding scheme.
4187+ ftp://ftp.cl.cam.ac.uk/users/djw3/
4188+
4189+Jon L. Bentley and Robert Sedgewick
4190+ "Fast Algorithms for Sorting and Searching Strings"
4191+ Available from Sedgewick's web page,
4192+ www.cs.princeton.edu/~rs
4193+@end example
4194+The following paper gives valuable additional insights into the
4195+algorithm, but is not directly the basis of any code
4196+used in bzip2.
4197+@example
4198+Peter Fenwick:
4199+ Block Sorting Text Compression
4200+ Proceedings of the 19th Australasian Computer Science Conference,
4201+ Melbourne, Australia. Jan 31 - Feb 2, 1996.
4202+ ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
4203+@end example
4204+Kunihiko Sadakane's sorting algorithm, mentioned above,
4205+is available from:
4206+@example
4207+http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
4208+@end example
4209+The Manber-Myers suffix array construction
4210+algorithm is described in a paper
4211+available from:
4212+@example
4213+http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
4214+@end example
4215+Finally, the following paper documents some recent investigations
4216+I made into the performance of sorting algorithms:
4217+@example
4218+Julian Seward:
4219+ On the Performance of BWT Sorting Algorithms
4220+ Proceedings of the IEEE Data Compression Conference 2000
4221+ Snowbird, Utah. 28-30 March 2000.
4222+@end example
4223+
4224+
4225+@contents
4226+
4227+@bye
4228+
4229diff -Nru bzip2-1.0.1/doc/bzip2recover.1 bzip2-1.0.1.new/doc/bzip2recover.1
4230--- bzip2-1.0.1/doc/bzip2recover.1 Thu Jan 1 01:00:00 1970
4231+++ bzip2-1.0.1.new/doc/bzip2recover.1 Sat Jun 24 20:13:06 2000
4232@@ -0,0 +1 @@
4233+.so bzip2.1
4234\ No newline at end of file
4235diff -Nru bzip2-1.0.1/doc/pl/Makefile.am bzip2-1.0.1.new/doc/pl/Makefile.am
4236--- bzip2-1.0.1/doc/pl/Makefile.am Thu Jan 1 01:00:00 1970
4237+++ bzip2-1.0.1.new/doc/pl/Makefile.am Sat Jun 24 20:13:06 2000
4238@@ -0,0 +1,4 @@
4239+
4240+mandir = @mandir@/pl
4241+man_MANS = bzip2.1 bunzip2.1 bzcat.1 bzip2recover.1
4242+
4243diff -Nru bzip2-1.0.1/doc/pl/bunzip2.1 bzip2-1.0.1.new/doc/pl/bunzip2.1
4244--- bzip2-1.0.1/doc/pl/bunzip2.1 Thu Jan 1 01:00:00 1970
4245+++ bzip2-1.0.1.new/doc/pl/bunzip2.1 Sat Jun 24 20:13:06 2000
4246@@ -0,0 +1 @@
4247+.so bzip2.1
4248\ No newline at end of file
4249diff -Nru bzip2-1.0.1/doc/pl/bzcat.1 bzip2-1.0.1.new/doc/pl/bzcat.1
4250--- bzip2-1.0.1/doc/pl/bzcat.1 Thu Jan 1 01:00:00 1970
4251+++ bzip2-1.0.1.new/doc/pl/bzcat.1 Sat Jun 24 20:13:06 2000
4252@@ -0,0 +1 @@
4253+.so bzip2.1
4254\ No newline at end of file
4255diff -Nru bzip2-1.0.1/doc/pl/bzip2.1 bzip2-1.0.1.new/doc/pl/bzip2.1
4256--- bzip2-1.0.1/doc/pl/bzip2.1 Thu Jan 1 01:00:00 1970
4257+++ bzip2-1.0.1.new/doc/pl/bzip2.1 Sat Jun 24 20:13:06 2000
4258@@ -0,0 +1,384 @@
4259