httpget - download web pages
httpget [options] [URLs...]
httpget downloads or reports the specified URLs, optionally following links through a given number of levels of indirection. It is handy for downloading the contents of a directory or reporting the URLs in a web page.
Strip the #anchor component of reported URLs, if any.
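The anchor stripping described above can be illustrated with a minimal Python sketch. It is not httpget's own code; the helper name strip_anchor is hypothetical, and it assumes the standard library's urldefrag gives the intended result.

```python
from urllib.parse import urldefrag


def strip_anchor(url: str) -> str:
    """Return url without its #anchor (fragment) component, if any.

    Hypothetical helper for illustration; not part of httpget.
    """
    return urldefrag(url).url
```

A URL with no anchor is returned unchanged.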
Resolve relative URLs into absolute URLs with respect to supplied base URL.
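As a rough illustration of what resolving against a base URL means, the sketch below uses Python's standard urljoin. The function name resolve and the example.com URLs are assumptions made for the example, and httpget itself may differ in edge cases.

```python
from urllib.parse import urljoin


def resolve(base: str, ref: str) -> str:
    """Resolve ref against base; absolute refs pass through unchanged.

    Hypothetical helper for illustration; not part of httpget.
    """
    return urljoin(base, ref)


if __name__ == "__main__":
    base = "http://example.com/docs/index.html"
    for ref in ("img/logo.png", "../top.html", "http://other.example/x"):
        print(resolve(base, ref))
```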
Keep the last baselength components of a URL for use as the local pathname. The default is 1, which means all downloads land in the local directory.
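One plausible reading of the baselength rule is sketched below in Python. It assumes that only the path components of the URL count (not the host) and that baselength is at least 1; the function name local_pathname is hypothetical, and the real tool may handle empty paths or trailing slashes differently.

```python
from urllib.parse import urlsplit


def local_pathname(url: str, baselength: int = 1) -> str:
    """Build a local pathname from the last baselength path components.

    Assumption: only path components count, and baselength >= 1.
    """
    if baselength < 1:
        raise ValueError("baselength must be at least 1")
    # Drop empty components produced by leading or doubled slashes.
    parts = [p for p in urlsplit(url).path.split("/") if p]
    return "/".join(parts[-baselength:])
```

With the default of 1 only the final component survives, which matches the statement that all downloads land in one directory.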
Indirect depth times. The default is 0, which means the supplied URLs are downloaded. A value of 1 means the URLs those pages reference are downloaded, and so forth.
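The indirection rule can be sketched as a loop that replaces the current URL list with the URLs those pages reference, once per level. This is an illustrative Python sketch only: final_urls is a hypothetical name, and links_of stands in for whatever fetches a page and extracts its links.

```python
from typing import Callable, Iterable, List


def final_urls(urls: Iterable[str], depth: int,
               links_of: Callable[[str], Iterable[str]]) -> List[str]:
    """Return the URLs that would be downloaded after `depth` indirections.

    links_of(url) is a caller-supplied (hypothetical) function returning
    the URLs referenced by the page at url.
    """
    current = list(urls)
    for _ in range(depth):
        next_level: List[str] = []
        for url in current:
            next_level.extend(links_of(url))
        current = next_level
    return current
```

With a depth of 0 the loop body never runs, so the supplied URLs themselves are the download targets.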
Delay delay seconds between a successful download and commencement of the next fetch. The default is 5 seconds.
Force: overwrite existing files. Normally httpget will skip files already present, an optimisation usually useful on recursive fetches. This option forces a fetch and overwrite even if the file exists.
Grep: download only URLs matching the Perl regexp. This is applied only to final URLs, not intermediate URLs used during indirection. The default is the content of the $HTTPGET_PATTERN environment variable.
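The filtering step might look like the following Python sketch. Several points are assumptions: the name select_urls is hypothetical, Python's re module is used in place of Perl regexps (the two agree for simple patterns), and an unset or empty pattern is taken to mean that every URL passes, which the text above does not state.

```python
import os
import re
from typing import Iterable, List, Optional


def select_urls(urls: Iterable[str], pattern: Optional[str] = None) -> List[str]:
    """Keep only the final URLs that match pattern.

    Falls back to $HTTPGET_PATTERN when no pattern is given.
    Assumption: with no pattern at all, every URL is kept.
    """
    if pattern is None:
        pattern = os.environ.get("HTTPGET_PATTERN")
    if not pattern:
        return list(urls)
    rx = re.compile(pattern)
    return [u for u in urls if rx.search(u)]
```

The filter is meant for the final URL list only; intermediate URLs used during indirection are not passed through it.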
Include the MIME headers in the downloaded file.
Do a HEAD request for the leaf fetches, thus returning only the headers of the target page. This implies the -h option.
Inline: at the final level of indirection, instead of using URLs attached to HREF= attributes use URLs attached to SRC= and BACKGROUND= attributes. This is typically used to collect inline images.
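The difference between normal and inline link collection can be shown with Python's standard HTMLParser. This is a sketch rather than httpget's implementation; the names LinkCollector and page_urls are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect HREF= URLs, or SRC= and BACKGROUND= URLs in inline mode."""

    def __init__(self, inline: bool = False) -> None:
        super().__init__()
        # HTMLParser lowercases attribute names before calling handlers.
        self.wanted = ("src", "background") if inline else ("href",)
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.wanted and value:
                self.urls.append(value)


def page_urls(html: str, inline: bool = False) -> List[str]:
    """Return the URLs found in html, in document order."""
    collector = LinkCollector(inline)
    collector.feed(html)
    collector.close()
    return collector.urls
```

In inline mode a page's anchors are ignored and only its embedded resources, such as images and backgrounds, are reported.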
Add an ``ur-depth'' idepth to the indirection depth. The reason for this option in addition to the -d option is obtuse, but see the pageurls(1) command for an example use.
Lock the named lock during each subfetch. This can be used to ensure a link isn't swamped.
Lock the named lock for the entire httpget run. This can be used to ensure items are fetched in chunks, or to ensure the -D delay code actually causes laid back fetching when there are competing httpgets.
No action. Report final download URLs instead of downloading these pages. Generally used for table-of-contents operations.
Save the downloaded page as the file output. The name ``-'' means standard output.
Write actual URLs to be saved to the specified file queue.
Monitor the specified file queue as a queue of pending downloads.
Silent. Omit all warning messages.
Print title strings with reported URLs, separated by a TAB character. Implies the -n option.
Verbose. Report warnings and progress messages.
Warning suppress. Be quiet about the specified HTTP error codes.
Hexify (percent escape) reported URLs. Implies the -n option.
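Percent escaping can be sketched with Python's urllib.parse.quote. Which characters httpget leaves unescaped is not stated here, so the safe set below is an assumption, as is the function name hexify.

```python
from urllib.parse import quote


def hexify(url: str) -> str:
    """Percent-escape characters that are unsafe in a URL.

    Assumption: URL delimiters and existing %XX escapes are left alone.
    """
    return quote(url, safe=":/?&=#%")
```

Keeping "%" in the safe set avoids escaping a URL twice if it already contains percent escapes.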
Fetch the named URLs. The URL ``-'' causes httpget to read URLs from standard input, one per line. Also, if no URLs are supplied on the command line then URLs are read from standard input, one per line.
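The argument handling described above can be sketched in Python as follows. The name expand_urls is hypothetical, and skipping blank input lines is an assumption; the stdin parameter exists only so the sketch can be exercised without a real terminal.

```python
import sys
from typing import Iterable, List, Optional


def expand_urls(args: List[str],
                stdin: Optional[Iterable[str]] = None) -> List[str]:
    """Expand command line URL arguments, reading stdin for "-".

    With no arguments at all, URLs are read from standard input.
    Assumption: blank input lines are ignored.
    """
    stream = sys.stdin if stdin is None else stdin
    if not args:
        args = ["-"]
    urls: List[str] = []
    for arg in args:
        if arg == "-":
            urls.extend(line.strip() for line in stream if line.strip())
        else:
            urls.append(arg)
    return urls
```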
HTTPGET_PATTERN Regexp to select download URLs.
WEBPROXY Web proxy setting of the form host:port.
htv(1), urls(1), pageurls(1), getpageurls(1), watchlinkpages(1),
wget(1), curl(1), lynx(1)
Cameron Simpson <cs@zip.com.au>