1 --- wget-1.5.0/doc/wget.1.man Wed Apr 22 12:36:21 1998
2 +++ wget-1.5.0/doc/wget.1 Wed Apr 22 12:36:21 1998
7 +.TH wget 1 "1996 Nov 11" Wget
9 +wget \- a utility to retrieve files from the World Wide Web
11 +.B "wget [options] [URL-list]"
13 +The information in this man page is an extract from the full
16 +It is well out of date. Please refer to the info page for full,
17 +up\-to\-date documentation. You can view the info documentation with
18 +the Emacs info subsystem or the standalone info program.
21 +is a utility designed for retrieving binary documents across the Web,
22 +through the use of \fIHTTP\fP (Hyper Text Transfer Protocol) and
23 +\fIFTP\fP (File Transfer Protocol), and saving them to disk.
25 +is non\-interactive, which means it can work in the background while
26 +the user is not logged in, unlike most web browsers (thus you may
27 +start the program and log off, letting it do its work). Analysing
28 +server responses, it distinguishes between correctly and incorrectly
29 +retrieved documents, and retries retrieving them as many times as
30 +necessary, or until a user\-specified limit is reached. \fIREST\fP is
31 +used in \fIFTP\fP on hosts that support it. Proxy servers are
32 +supported to speed up the retrieval and lighten network load.
35 +supports a full-featured recursion mechanism, through which you can
36 +retrieve large parts of the web, creating local copies of remote
37 +directory hierarchies. Of course, the maximum level of recursion and other
38 +parameters can be specified. Infinite recursion loops are always
39 +avoided by hashing the retrieved data. All of this works for both
40 +\fIHTTP\fP and \fIFTP\fP.
42 +The retrieval is conveniently traced by printing dots, each dot
43 +representing one kilobyte of data received. Built\-in features offer
44 +mechanisms to tune which links you wish to follow (cf. -L, -D and -H).
46 +.SH "URL CONVENTIONS"
48 +Most of the URL conventions described in RFC1738 are supported. Two
49 +alternative syntaxes are also supported, which means you can use three
50 +forms of address to specify a file:
52 +Normal URL (recommended form):
54 +http://host[:port]/path
55 +http://fly.cc.fer.hr/
56 +ftp://ftp.xemacs.org/pub/xemacs/xemacs-19.14.tar.gz
57 +ftp://username:password@host/dir/file
60 +\fIFTP\fP only (ncftp-like):
64 +\fIHTTP\fP only (netscape-like):
65 +hostname[:port]/dir/file
68 +You may encode your username and/or password in the URL using the form:
71 +ftp://user:password@host/dir/file
74 +If you do not understand these syntaxes, just use the plain ordinary
75 +syntax with which you would call \fIlynx\fP or \fInetscape\fP. Note
76 +that the alternative forms are deprecated, and may cease being
77 +supported in the future.
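Passwords containing reserved URL characters (such as '@' or '/') must be percent-encoded before being embedded in a URL of the ftp://user:password@host form. A rough sketch of what that encoding involves (the helper below is illustrative, not part of wget):

```shell
# Hypothetical helper: percent-encode a password so that reserved
# characters such as '@' and '/' survive embedding in an FTP URL.
urlencode() {
    # Emit each byte as-is if unreserved, otherwise as %XX (lowercase hex).
    printf '%s' "$1" | od -An -tx1 -v | tr ' ' '\n' | while read -r hex; do
        [ -z "$hex" ] && continue
        c=$(printf "\\$(printf '%03o' "0x$hex")")
        case "$c" in
            [A-Za-z0-9.~_-]) printf '%s' "$c" ;;
            *) printf '%%%s' "$hex" ;;
        esac
    done
}

urlencode 'p@ss/word'    # -> p%40ss%2fword
```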
81 +There are quite a few command\-line options for
83 +Note that you do not have to know or to use them unless you wish to
84 +change the default behaviour of the program. For simple operations you
85 +need no options at all. It is also a good idea to put frequently used
86 +command\-line options in .wgetrc, where they can be stored in a more permanent fashion.
89 +This is the complete list of options with descriptions, sorted in
90 +descending order of importance:
92 +Print a help screen. You will also get help if you do not supply
93 +command\-line arguments.
100 +Verbose output, with all the available data. The default output
101 +consists only of saving updates and error messages. If the output is
102 +stdout, verbose output is the default.
105 +Quiet mode, with no output at all.
108 +Debug output, which will work only if
110 +was compiled with -DDEBUG. Note that even when the program is compiled with
111 +debug support, debug output is not printed unless you specify -d.
113 +.IP "-i \fIfilename\fP --input-file=\fIfilename\fP"
116 +in which case no URL\-s need to be on the command line. If there are
117 +URL\-s both on the command line and in a filename, those on the
118 +command line are first to be retrieved. The filename need not be an
119 +\fIHTML\fP document (though no harm if it is); it is enough if the URL-s
120 +are just listed sequentially.
122 +However, if you specify --force-html, the document will be regarded as
123 +\fIHTML\fP. In that case you may have problems with relative links,
124 +which you can solve either by adding <base href="url"> to the document
125 +or by specifying --base=url on the command\-line.
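The input file for -i can be a bare list of URLs, one per line. A minimal sketch (the file name and URLs here are only illustrative):

```shell
# A plain list of URLs suffices for -i; no HTML markup is required.
cat > urls.txt <<'EOF'
http://fly.cc.fer.hr/
ftp://ftp.xemacs.org/pub/xemacs/xemacs-19.14.tar.gz
EOF

# wget -i urls.txt     # would fetch both URLs (network call, not run here)
wc -l < urls.txt       # two entries to retrieve
```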
127 +.IP "-o \fIlogfile\fP --output-file=\fIlogfile\fP"
128 +Log messages to \fIlogfile\fP, instead of the default stdout. Verbose
129 +output is the default when logging to a file. If you do not wish it, use \-nv
132 +.IP "-a \fIlogfile\fP --append-output=\fIlogfile\fP"
133 +Append to logfile - same as -o, but appends to the logfile (or creates
134 +a new one if it does not exist) instead of rewriting the old log
137 +.IP "-t \fInum\fP --tries=\fInum\fP"
138 +Set number of retries to
140 +Specify 0 for infinite retrying.
143 +Follow \fIFTP\fP links from \fIHTML\fP documents.
145 +.IP "-c --continue-ftp"
146 +Continue retrieval of FTP documents, from where it was left off. If
147 +you specify "wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z", and there
148 +is already a file named ls-lR.Z in the current directory,
150 +continue retrieval from the offset equal to the length of the existing
151 +file. Note that you do not need to specify this option if the only
154 +to continue retrieving where it left off when the connection is lost -
156 +does this by default. You need this option when you want to continue
157 +retrieval of a file already halfway retrieved, saved by other FTP
158 +software, or left by
159 +\fIwget\fP being killed.
161 +.IP "-g \fIon/off\fP --glob=\fIon/off\fP"
162 +Turn FTP globbing on or off. By default, globbing will be turned on if
163 +the URL contains globbing characters (an asterisk, for example). Globbing
164 +means you may use the special characters (wildcards) to retrieve more
165 +files from the same directory at once, like wget
166 +ftp://gnjilux.cc.fer.hr/*.msg. Globbing currently works only on UNIX FTP servers.
169 +.IP "-e \fIcommand\fP --execute=\fIcommand\fP"
170 +Execute \fIcommand\fP, as if it were a part of the .wgetrc file. A
171 +command invoked this way will take precedence over the same command
172 +in .wgetrc, if there is one.
174 +.IP "-N --timestamping"
175 +Use the so\-called time\-stamps to determine whether to retrieve a
176 +file. If the last\-modification date of the remote file is equal to,
177 +or older than that of local file, and the sizes of files are equal,
178 +the remote file will not be retrieved. This option is useful for
183 +mirroring sites, since it will not permit downloading of the same file twice.
185 +.IP "-F --force-html"
186 +When input is read from a file, force it to be \fIHTML\fP. This
187 +enables you to retrieve relative links from existing \fIHTML\fP files
188 +on your local disk, by adding <base href> to \fIHTML\fP, or using the --base option.
191 +.IP "-B \fIbase_href\fP --base=\fIbase_href\fP"
192 +Use \fIbase_href\fP as base reference, as if it were in the file, in
193 +the form <base href="base_href">. Note that the base in the file will
194 +take precedence over the one on the command\-line.
196 +.IP "-r --recursive"
197 +Recursive web\-suck. According to the protocol of the URL, this can
198 +mean two things. Recursive retrieval of a \fIHTTP\fP URL means that
200 +will download the URL you want, parse it as an \fIHTML\fP document (if
201 +an \fIHTML\fP document it is), and retrieve the files this document is
202 +referring to, down to a certain depth (default 5; change it with -l).
204 +will create a hierarchy of directories locally, corresponding to the
205 +one found on the \fIHTTP\fP server.
207 +This option is ideal for presentations, where slow connections should
208 +be bypassed. The results will be especially good if relative links
209 +were used, since the pages will then work on the new location without change.
212 +When using this option with an \fIFTP\fP URL, it will retrieve all the
213 +data from the given directory and subdirectories, similar to
214 +\fIHTTP\fP recursive retrieval.
216 +You should be warned that invoking this option may cause grave
217 +overloading of your connection. The load can be minimized by lowering
218 +the maximal recursion level (see -l) and/or by lowering the number of retries (see -t).
222 +Turn on mirroring options. This will set recursion and time\-stamping,
223 +combining \-r and \-N.
225 +.IP "-l \fIdepth\fP --level=\fIdepth\fP"
226 +Set recursion depth level to the specified level. Default is 5.
227 +After the given recursion level is reached, the sucking will proceed
228 +from the parent. Thus specifying -r -l1 should equal a recursion\-less
229 +retrieval of the file. Setting the level to zero makes recursion depth
230 +(theoretically) unlimited. Note that the number of retrieved documents
231 +will increase exponentially with the depth level.
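The warning about exponential growth can be made concrete: assuming each page links to b previously unseen pages, a crawl to depth l touches on the order of 1 + b + b^2 + ... + b^l documents. A back-of-the-envelope calculation (an illustration of the bound, not a wget feature):

```shell
# With a branching factor of b=4, sum the geometric series up to the
# default recursion depth of 5 to bound the number of pages touched.
b=4
total=0
term=1
for l in 0 1 2 3 4 5; do
    total=$((total + term))   # add b^l
    term=$((term * b))
done
echo "$total"                 # 1365 pages reachable at depth 5
```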
233 +.IP "-H --span-hosts"
234 +Enable spanning across hosts when doing recursive retrieving. See
237 +for a more detailed description.
240 +Follow only relative links. Useful for retrieving a specific homepage
241 +without any distractions, not even those from the same host. Refer to
243 +for a more detailed description.
245 +.IP "-D \fIdomain\-list\fP --domains=\fIdomain\-list\fP"
246 +Set domains to be accepted and DNS looked-up, where domain\-list is a
247 +comma\-separated list. Note that it does not turn on -H. This speeds
248 +things up, even if only one host is spanned. Refer to
250 +for a more detailed description.
252 +.IP "-A \fIacclist\fP / -R \fIrejlist\fP --accept=\fIacclist\fP / --reject=\fIrejlist\fP"
253 +Comma\-separated list of extensions to accept/reject. For example, if
254 +you wish to download only GIFs and JPEGs, you will use -A gif,jpg,jpeg.
255 +If you wish to download everything except cumbersome MPEGs and .AU
256 +files, you will use -R mpg,mpeg,au.
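The accept/reject lists match file name suffixes. A hypothetical sketch of that matching in shell terms (wget's real matching is implemented in C; the function name here is invented):

```shell
# Mimic -A gif,jpg,jpeg with a shell case pattern: a file is accepted
# when its name ends in one of the listed suffixes.
accepts() {
    case "$1" in
        *.gif|*.jpg|*.jpeg) echo yes ;;
        *) echo no ;;
    esac
}

accepts logo.gif     # yes
accepts movie.mpg    # no
```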
257 +.IP "-X \fIlist\fP --exclude-directories=\fIlist\fP"
258 +Comma\-separated list of directories to exclude from FTP fetching.
260 +.IP "-P \fIprefix\fP --directory-prefix=\fIprefix\fP"
261 +Set directory prefix ("." by default) to
262 +\fIprefix\fP. The directory prefix is the directory where all other
263 +files and subdirectories will be saved to.
265 +.IP "-T \fIvalue\fP --timeout=\fIvalue\fP"
266 +Set the read timeout to a specified value. Whenever a read is issued,
267 +the file descriptor is checked for a possible timeout, which could
268 +otherwise leave a pending connection (uninterrupted read). The default
269 +timeout is 900 seconds (fifteen minutes).
271 +.IP "-Y \fIon/off\fP --proxy=\fIon/off\fP"
272 +Turn proxy on or off. The proxy is on by default if the appropriate
273 +environmental variable is defined.
275 +.IP "-Q \fIquota[KM]\fP --quota=\fIquota[KM]\fP"
276 +Specify download quota, in bytes (default), kilobytes or
277 +megabytes. This is more useful for the rc file; see below.
279 +.IP "-O filename --output-document=filename"
280 +The documents will not be written to the appropriate files, but all
281 +will be concatenated into the single file specified by this option. The
282 +number of tries will be automatically set to 1. If this filename is
283 +`-', the documents will be written to stdout, and --quiet will be
284 +turned on. Use this option with caution, since it turns off all the
287 +diagnostics \fIwget\fP can otherwise give about various errors.
289 +.IP "-S --server-response"
290 +Print the headers sent by the \fIHTTP\fP server and/or responses sent
291 +by the \fIFTP\fP server.
293 +.IP "-s --save-headers"
294 +Save the headers sent by the \fIHTTP\fP server to the file, before the actual contents.
297 +.IP "--header=additional-header"
298 +Define an additional header. You can define more than one additional
299 +header. Do not try to terminate the header with CR or LF.
301 +.IP "--http-user --http-passwd"
302 +Use these two options to set username and password
304 +will send to \fIHTTP\fP servers. Wget supports only the basic
305 +WWW authentication scheme.
308 +Do not clobber existing files when saving to directory hierarchy
309 +within recursive retrieval of several files. This option is
311 +useful when you wish to continue where you left off with retrieval.
312 +If the files are .html or (yuck) .htm, they will be loaded from
313 +the disk and parsed as if they had been retrieved from the Web.
316 +Non\-verbose \- turn off verbose without being completely quiet (use
317 +-q for that), which means that error messages and basic information still get printed.
321 +Do not create a hierarchy of directories when retrieving
322 +recursively. With this option turned on, all files will get
323 +saved to the current directory, without clobbering (if
324 +a name shows up more than once, the filenames will get numbered suffixes).
328 +The opposite of \-nd \-\- Force creation of a hierarchy of directories
329 +even if it would not have been done otherwise.
332 +Disable time-consuming DNS lookup of almost all hosts. Refer to
334 +for a more detailed description.
337 +Disable host-prefixed directories. By default, http://fly.cc.fer.hr/
338 +will produce a directory named fly.cc.fer.hr in which everything else
339 +will go. This option disables such behaviour.
342 +Do not ascend to parent directory.
344 +.IP "-k --convert-links"
345 +Convert the non-relative links to relative ones locally.
347 +.SH "FOLLOWING LINKS"
348 +Recursive retrieving has a mechanism that allows you to specify which links will be followed.
352 +.IP "Only relative links"
353 +When only relative links are followed (option -L), recursive
354 +retrieving will never span hosts. \fBgethostbyname\fP
356 +will never get called, and the process will be very fast, with
357 +minimal strain on the network. This will suit your needs most of the
358 +time, especially when mirroring the output of *2html
359 +converters, which generally produce only relative links.
362 +The drawback of following only relative links is that humans
363 +often tend to mix them with absolute links to the very same host,
364 +and the very same page. In this mode (which is the default), all
365 +URL-s that refer to the same host will be retrieved.
367 +The problem with this option is aliases of hosts and domains.
368 +Thus there is no way for
370 +to know that \fBregoc.srce.hr\fP and \fBwww.srce.hr\fP are the same
371 +hosts, or that \fBfly.cc.fer.hr\fP is the same as \fBfly.cc.etf.hr\fP.
372 +Whenever an absolute link is encountered, \fBgethostbyname\fP is
373 +called to check whether we are really on the same host. Although
374 +results of \fBgethostbyname\fP are hashed, so that it will never get
375 +called twice for the same host, it still presents a nuisance e.g. in
376 +the large indexes of different hosts, when each of them has to be
377 +looked up. You can use -nh to prevent such complex checking, and then
379 +will just compare the hostname. Things will run much faster, but
380 +also much less reliably.
382 +.IP "Domain acceptance"
383 +With the -D option you may specify domains that will be followed.
384 +The nice thing about this option is that hosts that are not from
385 +those domains will not get DNS-looked up. Thus you may specify \-Dmit.edu
387 +.B "just to make sure that nothing outside .mit.edu gets looked up".
388 +This is very important and useful. It also means that -D does
389 +\fBnot\fP imply -H (it must be explicitly specified). Feel free to use
390 +this option, since it will speed things up greatly, with almost
391 +the same reliability as full host checking.
393 +Of course, domain acceptance can be used to limit the retrieval to
394 +particular domains while freely spanning hosts within those domains,
395 +but then you must explicitly specify -H.
398 +When -H is specified without -D, all hosts are spanned. It is
399 +useful to set the recursion level to a small value in those cases.
400 +Such an option is rarely useful on its own.
405 +are somewhat specific, since they have to be. To have
409 +documents, you must specify -f (follow_ftp). If you do specify it,
411 +links will be able to span hosts even if span_hosts is not set.
412 +Option relative_only (-L) has no effect on
414 +However, domain acceptance (-D) and suffix rules (-A/-R) still apply.
418 +supports the use of initialization file
420 +First a system-wide init file will be looked for
421 +(/usr/local/lib/wgetrc by default) and loaded. Then the user's file
422 +will be searched for in two places: In the environmental variable
423 +\fIWGETRC\fP (which is presumed to hold the full pathname) and
425 +Note that the settings in the user's startup file may override the system
426 +settings, which includes the quota settings (he he).
428 +The syntax of each line of the startup file is simple:
430 + \fIvariable\fP = \fIvalue\fP
432 +Valid values are different for different variables. The complete set
433 +of commands is listed below, the word after the equals sign denoting
434 +the value the command takes. It is \fBon/off\fP for \fBon\fP or
435 +\fBoff\fP (which can also be \fB1\fP or \fB0\fP), \fBstring\fP for any
436 +string or \fBN\fP for positive integer. For example, you may specify
437 +"use_proxy = off" to disable use of proxy servers by default. You may
438 +use \fBinf\fP for infinite value (the role of \fB0\fP on the command
439 +line), where appropriate. The commands are case\-insensitive and
440 +underscore\-insensitive, thus \fBDIr__Prefix\fP is the same as
441 +\fBdirprefix\fP. Empty lines, lines consisting of spaces, or lines
442 +beginning with '#' are skipped.
444 +Most of the commands have their equivalent command\-line option,
445 +except some more obscure or rarely used ones. A sample init file is
446 +provided in the distribution, named \fIsample.wgetrc\fP.
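A minimal startup file, together with a demonstration of the case- and underscore-insensitive command names described above (the file contents and the normalization pipeline are illustrative; they are not wget's actual parser):

```shell
# A sketch of a .wgetrc; values mirror command-line options described above.
cat > wgetrc.sample <<'EOF'
# Lines starting with '#' are skipped.
use_proxy = off
num_tries = inf
dir_prefix = downloads
EOF

# Commands are case- and underscore-insensitive, so DIr__Prefix is read
# as dirprefix; strip underscores and lowercase to see the canonical name.
printf '%s' 'DIr__Prefix' | tr -d '_' | tr 'A-Z' 'a-z'    # dirprefix
```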
448 +.IP "accept/reject = \fBstring\fP"
450 +.IP "add_hostdir = \fBon/off\fP"
451 +Enable/disable host-prefixed directories. -nH disables it.
452 +.IP "always_rest = \fBon/off\fP"
453 +Enable/disable continuation of the retrieval, the same as -c.
454 +.IP "base = \fBstring\fP"
455 +Set base for relative URL-s, the same as -B.
456 +.IP "convert_links = \fBon/off\fP"
457 +Convert non-relative links locally. The same as -k.
458 +.IP "debug = \fBon/off\fP"
459 +Debug mode, same as -d.
460 +.IP "dir_mode = \fBN\fP"
461 +Set permission modes of created subdirectories (default is 755).
462 +.IP "dir_prefix = \fBstring\fP"
463 +Top of directory tree, the same as -P.
464 +.IP "dirstruct = \fBon/off\fP"
465 +Turn dirstruct on or off, the same as -x or -nd, respectively.
466 +.IP "domains = \fBstring\fP"
468 +.IP "follow_ftp = \fBon/off\fP"
473 +documents, the same as -f.
474 +.IP "force_html = \fBon/off\fP"
475 +If set to on, force the input filename to be regarded as an HTML
476 +document, the same as -F.
477 +.IP "ftp_proxy = \fBstring\fP"
478 +Use the string as \fIFTP\fP proxy, instead of the one specified in the environment.
480 +.IP "glob = \fBon/off\fP"
481 +Turn globbing on/off, the same as -g.
482 +.IP "header = \fBstring\fP"
483 +Define an additional header, like --header.
484 +.IP "http_passwd = \fBstring\fP"
485 +Set \fIHTTP\fP password.
486 +.IP "http_proxy = \fBstring\fP"
487 +Use the string as \fIHTTP\fP proxy, instead of the one specified in the environment.
489 +.IP "http_user = \fBstring\fP"
490 +Set \fIHTTP\fP user.
491 +.IP "input = \fBstring\fP"
492 +Read the URL-s from filename, like -i.
493 +.IP "kill_longer = \fBon/off\fP"
494 +Consider data longer than specified in the content-length header
495 +as invalid (and retry getting it). The default behaviour is to save
496 +as much data as there is, provided it is at least as much as
497 +the value in the content-length header.
498 +.IP "logfile = \fBstring\fP"
499 +Set logfile, the same as -o.
500 +.IP "login = \fBstring\fP"
501 +Your user name on the remote machine, for
503 +Defaults to "anonymous".
504 +.IP "mirror = \fBon/off\fP"
505 +Turn mirroring on/off. The same as -m.
506 +.IP "noclobber = \fBon/off\fP"
508 +.IP "no_parent = \fBon/off\fP"
509 +Same as --no-parent.
510 +.IP "no_proxy = \fBstring\fP"
511 +Use the string as the comma\-separated list of domains to avoid in
512 +proxy loading, instead of the one specified in environment.
513 +.IP "num_tries = \fBN\fP"
514 +Set number of retries per URL, the same as -t.
515 +.IP "output_document = \fBstring\fP"
516 +Set the output filename, the same as -O.
517 +.IP "passwd = \fBstring\fP"
518 +Your password on the remote machine, for
521 +username@hostname.domainname.
522 +.IP "quiet = \fBon/off\fP"
523 +Quiet mode, the same as -q.
524 +.IP "quota = \fBquota\fP"
525 +Specify the download quota, which is useful to put in
526 +/usr/local/lib/wgetrc. When download quota is specified,
528 +will stop retrieving after the download sum has become greater than
529 +quota. The quota can be specified in bytes (default), kbytes ('k'
530 +appended) or mbytes ('m' appended). Thus "quota = 5m" will set the
531 +quota to 5 mbytes. Note that the user's startup file overrides the system wgetrc.
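The quota suffixes can be sketched as simple arithmetic (illustrative only; the function name is invented and wget's actual parsing lives in its C source):

```shell
# Convert a quota string like "5m" or "200k" to bytes; bare numbers
# are already bytes, per the description above.
quota_bytes() {
    case "$1" in
        *k) echo $(( ${1%k} * 1024 )) ;;
        *m) echo $(( ${1%m} * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

quota_bytes 5m     # 5242880
quota_bytes 200k   # 204800
```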
533 +.IP "reclevel = \fBN\fP"
534 +Recursion level, the same as -l.
535 +.IP "recursive = \fBon/off\fP"
536 +Recursive on/off, the same as -r.
537 +.IP "relative_only = \fBon/off\fP"
538 +Follow only relative links (the same as -L). Refer to section
539 +.I "FOLLOWING LINKS"
540 +for a more detailed description.
541 +.IP "robots = \fBon/off\fP"
542 +Use (or not) the \fIrobots.txt\fP file.
543 +.IP "server_response = \fBon/off\fP"
544 +Choose whether or not to print the \fIHTTP\fP and \fIFTP\fP server
545 +responses, the same as -S.
546 +.IP "simple_host_check = \fBon/off\fP"
548 +.IP "span_hosts = \fBon/off\fP"
550 +.IP "timeout = \fBN\fP"
551 +Set timeout value, the same as -T.
552 +.IP "timestamping = \fBon/off\fP"
553 +Turn timestamping on/off. The same as -N.
554 +.IP "use_proxy = \fBon/off\fP"
555 +Turn proxy support on/off. The same as -Y.
556 +.IP "verbose = \fBon/off\fP"
557 +Turn verbose on/off, the same as -v/-nv.
562 +will catch the \fISIGHUP\fP (hangup signal) and ignore it. If the
563 +output was on stdout, it will be redirected to a file named
564 +\fIwget-log\fP. This is also convenient when you wish to redirect
565 +the output of \fIWget\fP interactively.
569 +$ wget http://www.ifi.uio.no/~larsi/gnus.tar.gz &
570 +$ kill -HUP %% # to redirect the output
574 +\fIWget\fP will not try to handle any signals other than
575 +\fISIGHUP\fP. Thus you may interrupt \fIWget\fP using ^C or
580 +Get URL http://fly.cc.fer.hr/:
582 +wget http://fly.cc.fer.hr/
585 +Force non\-verbose output:
587 +wget -nv http://fly.cc.fer.hr/
590 +Set an unlimited number of retries:
592 +wget -t0 http://www.yahoo.com/
595 +Create a mirror image of fly's web (with the same directory structure
596 +the original has), up to six recursion levels, with only one try per
597 +document, saving the verbose output to log file 'log':
600 +wget -r -l6 -t1 -o log http://fly.cc.fer.hr/
603 +Retrieve from yahoo host only (depth 50):
605 +wget -r -l50 http://www.yahoo.com/
616 +.IR /usr/local/lib/wgetrc,
622 +is free; anyone may redistribute copies of
624 +to anyone under the terms stated in the General Public License, a copy
625 +of which accompanies each copy of
634 +Hrvoje Niksic <hniksic@srce.hr> is the author of Wget. Thanks to the
635 +beta testers and all the other people who helped with useful suggestions.
638 --- wget-1.5.0/doc/Makefile.in.man Tue Mar 31 09:19:47 1998
639 +++ wget-1.5.0/doc/Makefile.in Wed Apr 22 12:36:21 1998
643 # install all the documentation
644 -install: install.info install.wgetrc # install.man
645 +install: install.info install.wgetrc install.man
647 # uninstall all the documentation
648 uninstall: uninstall.info # uninstall.man
652 # install man page, creating install directory if necessary
654 -# $(top_srcdir)/mkinstalldirs $(mandir)/man$(manext)
655 -# $(INSTALL_DATA) $(srcdir)/$(MAN) $(mandir)/man$(manext)/$(MAN)
657 + $(top_srcdir)/mkinstalldirs $(mandir)/man$(manext)
658 + $(INSTALL_DATA) $(srcdir)/$(MAN) $(mandir)/man$(manext)/$(MAN)
660 # install sample.wgetrc
663 $(RM) $(infodir)/wget.info*
667 -# $(RM) $(mandir)/man$(manext)/$(MAN)
669 + $(RM) $(mandir)/man$(manext)/$(MAN)
672 # Dependencies for cleanup
673 --- wget-1.5.0/Makefile.in.man Wed Apr 22 12:40:14 1998
674 +++ wget-1.5.0/Makefile.in Wed Apr 22 12:40:55 1998
676 cd $@ && $(MAKE) $(MAKEDEFS)
679 -install: install.bin install.info install.wgetrc install.mo # install.man
680 +install: install.bin install.info install.wgetrc install.mo install.man
682 # install/uninstall the binary
683 install.bin uninstall.bin: