NAME

httpget - download web pages


SYNOPSIS

httpget [options] [URLs...]


DESCRIPTION

httpget downloads or reports the specified URLs with an optional degree of indirection. It is pretty handy for downloading the contents of a directory or reporting the URLs in a web page.


OPTIONS

-#

Strip the #anchor component of reported URLs, if any.
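
As an illustration only (this is not httpget's own code), stripping the anchor can be sketched with Python's standard urllib.parse module. The function name strip_anchor is hypothetical.

```python
from urllib.parse import urldefrag


def strip_anchor(url: str) -> str:
    """Return url without its #anchor component, if it has one."""
    # urldefrag splits a URL into (url-without-fragment, fragment).
    return urldefrag(url).url


if __name__ == "__main__":
    print(strip_anchor("http://example.com/doc.html#section2"))
```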

-a URL

Resolve relative URLs into absolute URLs with respect to supplied base URL.
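
A minimal sketch of this kind of resolution, using Python's urllib.parse.urljoin as a stand-in for whatever httpget does internally. The name absolutise and the example.com URLs are assumptions made for the example.

```python
from urllib.parse import urljoin


def absolutise(base: str, refs: list[str]) -> list[str]:
    """Resolve each reference against base, leaving absolute URLs alone."""
    return [urljoin(base, ref) for ref in refs]


if __name__ == "__main__":
    base = "http://example.com/docs/index.html"
    for url in absolutise(base, ["img/logo.png", "/top.html", "../up.html"]):
        print(url)
```

References that are already absolute pass through unchanged, so the same routine can be applied to every URL found in a page.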

-b baselength

Keep the last baselength components of a URL for use as the local pathname. The default is 1, which means all downloads land in the local directory.
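
The mapping from URL to local pathname might look like the sketch below. It assumes that only the path components of the URL count (not the host name) and that empty components are ignored; neither detail is stated in this page. The name local_pathname is hypothetical.

```python
from urllib.parse import urlsplit


def local_pathname(url: str, baselength: int = 1) -> str:
    """Keep the last baselength path components of url as a relative path."""
    if baselength < 1:
        raise ValueError("baselength must be at least 1")
    # Assumption: only non-empty path components count, the host does not.
    parts = [p for p in urlsplit(url).path.split("/") if p]
    return "/".join(parts[-baselength:])


if __name__ == "__main__":
    url = "http://example.com/a/b/c.html"
    print(local_pathname(url))      # default baselength of 1
    print(local_pathname(url, 2))
```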

-d depth

Indirect depth times. The default is 0, which means the supplied URLs are downloaded. A value of 1 means the URLs those pages reference are downloaded, and so forth.
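
The effect of the depth can be sketched without any network access by treating link extraction as a function from a URL to the URLs it references. The names final_urls and links_of are hypothetical, and dropping duplicates within a level is an assumption rather than documented behaviour.

```python
from typing import Callable, Iterable


def final_urls(start: Iterable[str], depth: int,
               links_of: Callable[[str], Iterable[str]]) -> list[str]:
    """Follow references depth times and return the URLs left to download."""
    current = list(start)
    for _ in range(depth):
        seen = set()
        following = []
        for url in current:
            for ref in links_of(url):
                # Assumption: a URL reached twice at one level is kept once.
                if ref not in seen:
                    seen.add(ref)
                    following.append(ref)
        current = following
    return current


if __name__ == "__main__":
    # A toy link graph standing in for real pages.
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"]}
    for depth in range(3):
        print(depth, final_urls(["A"], depth, lambda u: graph.get(u, [])))
```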

-D delay

Delay delay seconds between a successful download and commencement of the next fetch. The default is 5 seconds.

-f

Force: overwrite existing files. Normally httpget will skip files already present, an optimisation usually useful on recursive fetches. This option forces a fetch and overwrite even if the file exists.

-g regexp

Grep: download only URLs matching the Perl regexp. This is applied only to final URLs, not intermediate URLs used during indirection. The default is the content of the $HTTPGET_PATTERN environment variable.
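
A rough sketch of the selection step follows. It uses Python's re module in place of Perl regexps (the two agree for simple patterns but are not identical), and it assumes an unanchored match and that an absent pattern selects every URL. The name select_urls is hypothetical.

```python
import os
import re


def select_urls(urls, pattern=None):
    """Return the final URLs matching pattern, defaulting to $HTTPGET_PATTERN."""
    if pattern is None:
        pattern = os.environ.get("HTTPGET_PATTERN")
    if not pattern:
        # Assumption: with no pattern at all, every URL is selected.
        return list(urls)
    regexp = re.compile(pattern)
    return [url for url in urls if regexp.search(url)]


if __name__ == "__main__":
    candidates = ["http://example.com/a.gif", "http://example.com/b.html"]
    print(select_urls(candidates, r"\.(gif|jpg)$"))
```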

-h

Include the MIME headers in the downloaded file.

-H

Do a HEAD request for the leaf fetches, thus returning only the headers of the target page. This implies the -h option.

-i

Inline: at the final level of indirection, instead of using URLs attached to HREF= attributes use URLs attached to SRC= and BACKGROUND= attributes. This is typically used to collect inline images.
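
The difference between the two modes can be sketched with Python's standard html.parser module. This is an illustration under assumptions, not httpget's parser; the names LinkCollector and page_urls are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect HREF= URLs, or SRC= and BACKGROUND= URLs when inline is true."""

    def __init__(self, inline=False):
        super().__init__()
        self.wanted = {"src", "background"} if inline else {"href"}
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names before calling this method.
        for name, value in attrs:
            if name in self.wanted and value:
                self.urls.append(value)


def page_urls(html, inline=False):
    collector = LinkCollector(inline)
    collector.feed(html)
    collector.close()
    return collector.urls


if __name__ == "__main__":
    page = '<BODY BACKGROUND="bg.gif"><A HREF="next.html">n</A><IMG SRC="pic.jpg"></BODY>'
    print(page_urls(page))
    print(page_urls(page, inline=True))
```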

-I idepth

Add an ``ur-depth'' idepth to the indirection depth. The reason for this option in addition to the -d option is obscure, but see the pageurls(1) command for an example use.

-l lock

Lock the named lock during each subfetch. This can be used to ensure a link isn't swamped.

-L lock

Lock the named lock for the entire httpget run. This can be used to ensure items are fetched in chunks or to ensure the -D delay code actually causes laid-back fetching with competing httpgets.

-n

No action. Report final download URLs instead of downloading these pages. Generally used for table-of-contents operations.

-o output

Save the downloaded page as the file output. The name ``-'' means standard output.

-q queue

Write actual URLs to be saved to the specified file queue.

-Q queue

Monitor the specified file queue as a queue of pending downloads.

-s

Silent. Omit all warning messages.

-t

Print title strings with reported URLs, separated by a TAB character. Implies the -n option.

-v

Verbose. Report warnings and progress messages.

-W eee,...

Warning suppress. Be quiet about the specified HTTP error codes.

-x

Hexify (percent escape) reported URLs. Implies the -n option.
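
Percent escaping can be sketched with urllib.parse.quote. This page does not say which characters httpget escapes, so the set of characters left alone below is purely an assumption, as is the name hexify.

```python
from urllib.parse import quote

# Assumption: URL delimiters and existing %xx escapes are left untouched.
SAFE = ":/?&=#%"


def hexify(url: str) -> str:
    """Percent escape characters in url that fall outside SAFE."""
    return quote(url, safe=SAFE)


if __name__ == "__main__":
    print(hexify("http://example.com/my file.html?q=a b"))
```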

URLs...

Fetch the named URLs. The URL ``-'' causes httpget to read URLs from standard input, one per line. Also, if no URLs are supplied on the command line then URLs are read from standard input, one per line.
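
The argument handling described above might be sketched as follows. The name expand_url_args is hypothetical, and skipping blank input lines is an assumption.

```python
import io
import sys


def expand_url_args(args, stdin=sys.stdin):
    """Yield URLs from args, reading stdin for "-" or when args is empty."""
    for arg in (args or ["-"]):
        if arg == "-":
            for line in stdin:
                url = line.strip()
                # Assumption: blank lines on standard input are ignored.
                if url:
                    yield url
        else:
            yield arg


if __name__ == "__main__":
    fake_stdin = io.StringIO("http://example.com/b\nhttp://example.com/c\n")
    print(list(expand_url_args(["http://example.com/a", "-"], fake_stdin)))
```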


ENVIRONMENT

HTTPGET_PATTERN Regexp to select download URLs.

WEBPROXY Web proxy setting of the form host:port.
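
A WEBPROXY value of the documented host:port form could be parsed along these lines. The name proxy_from_env is hypothetical, and the handling of an unset or malformed value is an assumption.

```python
import os


def proxy_from_env(environ=os.environ):
    """Return (host, port) from $WEBPROXY, or None if it is unset or empty."""
    value = environ.get("WEBPROXY", "")
    if not value:
        return None
    # Split on the last colon so the port is always the final field.
    host, colon, port = value.rpartition(":")
    if not colon or not host or not port.isdigit():
        raise ValueError("WEBPROXY should be of the form host:port: %r" % value)
    return host, int(port)


if __name__ == "__main__":
    print(proxy_from_env({"WEBPROXY": "proxy.example.com:8080"}))
```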


SEE ALSO

htv(1), urls(1), pageurls(1), getpageurls(1), watchlinkpages(1), wget(1), curl(1), lynx(1)


AUTHOR

Cameron Simpson <cs@zip.com.au>