GNU Wget是一个在网络上进行下载的简单而强大的自由软件，其本身也是GNU计划的一部分。它的名字是“World Wide Web”和“Get”的结合，同时也隐含了软件的主要功能。目前它支持通过HTTP、HTTPS，以及FTP这三个最常见的TCP/IP协议协议下载。
1. Introduction to GNU Wget
GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive commandline tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
GNU Wget has many features to make retrieving large files or mirroring entire web or FTP sites easy, including:
- Can resume aborted downloads, using REST and RANGE
- Can use filename wild cards and recursively mirror directories
- NLS-based message files for many different languages
- Optionally converts absolute links in downloaded documents to relative, so that downloaded documents may link to each other locally
- Runs on most UNIX-like operating systems as well as Microsoft Windows
- Supports HTTP proxies
- Supports HTTP cookies
- Supports persistent HTTP connections
- Unattended / background operation
- Uses local file timestamps to determine whether documents need to be re-downloaded when mirroring
- GNU Wget is distributed under the GNU General Public License.
$ wget http://ftp.gnu.org/gnu/wget/wget-1.18.tar.gz $ $ tar xzf wget-1.18.tar.gz $ cd wget-1.18 # yum --enablerepo="epel" install gnutls openssl -y $ ./configure --with-ssl=openssl --with-libssl-prefix=/usr/local/ssl $ make # make install
2.2. Startup File
➜ /tmp cat headers.txt content_disposition = on trust_server_names = on check_certificate = off header = User-Agent: Mozilla/5.0... header = Cookie: SCRIPTSESSID... ➜ /tmp wget --config=headers.txt https://file.li3huo.com/customdown/3057be34-405c-ca91-f0f1-bed04b7deeaf HTTP request sent, awaiting response... 200 OK Length: 568778 (555K) [application/octet-stream] Saving to: '1305004-257.pdf'
3. Using Wget
3.1. debugging & quiet mode
--debug Turn on debug output, meaning various information important to the developers of Wget if it does not work properly. Your system administrator may have chosen to compile Wget without debug support, in which case -d will not work. Please note that compiling with debug support is always safe---Wget compiled with the debug support will not print any debug info unless requested with -d. -q --quiet Turn off Wget's output. -v --verbose Turn on verbose output, with all the available data. The default output is verbose. -nv --no-verbose Turn off verbose without being completely quiet (use -q for that), which means that error messages and basic information still get printed.
3.2. Redirect Output to the Terminal
wget -q -O - "$@" url
--post-data=string --post-file=file Use POST as the method for all HTTP requests and send the specified data in the request body. --post-data sends string as data, whereas --post-file sends the contents of file. Other than that, they work in exactly the same way. In particular, they both expect content of the form "key1=value1&key2=value2", with percent-encoding for special characters; the only difference is that one expects its content as a command-line parameter and the other accepts its content from a file. In particular, --post-file is not for transmitting files as form attachments: those must appear as "key=value" data (with appropriate percent-coding) just like everything else. Wget does not currently support "multipart/form-data" for transmitting POST data; only "application/x-www-form-urlencoded". Only one of --post-data and --post-file should be specified.
3.4. with cookie
wget --save-cookies cookies.txt \ --post-data 'user=foo&password=bar' \ http://server.com/auth.php # Now grab the page or pages we care about. wget --load-cookies cookies.txt \ -p http://server.com/interesting/article.php
3.5. wget a file with correct name
--content-disposition honor the Content-Disposition header when
http_proxy https_proxy If set, the http_proxy and https_proxy variables should contain the URLs of the proxies for HTTP and HTTPS connections respectively. ftp_proxy This variable should contain the URL of the proxy for FTP connections. It is quite common that http_proxy and ftp_proxy are set to the same URL. no_proxy This variable should contain a comma-separated list of domain extensions proxy should not be used for. For instance, if the value of no_proxy is .mit.edu, proxy will not be used to retrieve documents from MIT.