
See Also: GNU Toolchain, HTTP, curl

wget

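GNU Wget is a simple yet powerful free software program for downloading files over the network, and it is part of the GNU Project. Its name combines "World Wide Web" and "Get", which also hints at its main function. It currently supports downloading over HTTP, HTTPS, and FTP, the three most common TCP/IP-based protocols.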

1. Introduction to GNU Wget

GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely used Internet protocols. It is a non-interactive command-line tool, so it can easily be called from scripts, cron jobs, terminals without X Window System support, and so on.
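Because Wget needs no terminal interaction, it can be wrapped in a small function and run unattended. The sketch below is only an illustration: the function name fetch_quietly, the URL, the paths, and the retry and timeout values are hypothetical choices, not anything prescribed by Wget.

```shell
# Hypothetical helper: download one URL into a directory and report the result.
fetch_quietly() {
    url=$1
    dest_dir=$2
    # -nv keeps the log short; --tries and --timeout bound how long a run can hang.
    if wget -nv --tries=3 --timeout=30 -P "$dest_dir" "$url"; then
        echo "ok: $url"
    else
        status=$?   # exit status of the failed wget call
        echo "failed ($status): $url" >&2
        return "$status"
    fi
}

# Example crontab entry (assumes the function is saved in /usr/local/bin/fetch.sh,
# a hypothetical path, and that the script calls fetch_quietly with its arguments):
#   30 2 * * * /usr/local/bin/fetch.sh http://example.com/data.tar.gz /var/backups
```

Wget exits with status 0 on success and a non-zero status on failure, so the calling script or cron wrapper can react to errors without parsing the log.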

GNU Wget has many features that make it easy to retrieve large files or mirror entire web or FTP sites.

2. Documentation

https://www.gnu.org/software/wget/manual/

2.1. Installation

$ wget http://ftp.gnu.org/gnu/wget/wget-1.18.tar.gz
$ tar xzf wget-1.18.tar.gz
$ cd wget-1.18
# yum --enablerepo="epel" install gnutls openssl -y
$ ./configure --with-ssl=openssl --with-libssl-prefix=/usr/local/ssl
$ make
# make install
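After installing, it is worth checking that the new binary was really built with SSL support. The sketch below assumes that `wget --version` lists compiled-in features as "+name" and missing ones as "-name" (for example "+https"); has_feature is a hypothetical helper name, and the exact feature list varies between builds.

```shell
# Succeeds if the given feature appears as "+feature" in the version text
# read from standard input.
has_feature() {
    # Put every whitespace-separated word on its own line, then look for an
    # exact "+feature" word. -F treats the pattern as a fixed string.
    tr -s '[:space:]' '\n' | grep -qxF -- "+$1"
}

# Usage against the freshly installed binary:
#   wget --version | has_feature https && echo "HTTPS support is compiled in"
```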

2.2. Startup File

https://www.gnu.org/software/wget/manual/html_node/Startup-File.html

➜  /tmp cat headers.txt 
content_disposition = on
trust_server_names = on
check_certificate = off
header = User-Agent: Mozilla/5.0...
header = Cookie: SCRIPTSESSID...
➜  /tmp wget --config=headers.txt https://file.li3huo.com/customdown/3057be34-405c-ca91-f0f1-bed04b7deeaf
HTTP request sent, awaiting response... 200 OK
Length: 568778 (555K) [application/octet-stream]
Saving to: '1305004-257.pdf'
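The same idea can be scripted: write a throwaway config file, pass it with --config, and delete it afterwards. This is a minimal sketch; make_wget_config is a hypothetical helper, and the Accept-Language header is only a placeholder for whatever headers the server actually expects.

```shell
# Hypothetical helper: write a per-job Wget config file and print its path.
make_wget_config() {
    config_file=$(mktemp) || return 1
    cat > "$config_file" <<'EOF'
# Same effect as --content-disposition and --trust-server-names.
content_disposition = on
trust_server_names = on
# Placeholder header; replace with the headers the server expects.
header = Accept-Language: en
EOF
    printf '%s\n' "$config_file"
}

# Usage (not run here, it needs network access):
#   cfg=$(make_wget_config)
#   wget --config="$cfg" "$url"
#   rm -f "$cfg"
```

Note that check_certificate = off, as used in the transcript above, disables TLS certificate verification, so it is left out of this sketch.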

3. Using Wget

3.1. Debugging & Quiet Mode

       -d
       --debug
           Turn on debug output, meaning various information important
           to the developers of Wget if it does not work properly.  Your
           system administrator may have chosen to compile Wget without
           debug support, in which case -d will not work.  Please note
           that compiling with debug support is always safe: Wget
           compiled with the debug support will not print any debug info
           unless requested with -d.

       -q
       --quiet
           Turn off Wget's output.

       -v
       --verbose
           Turn on verbose output, with all the available data.  The
           default output is verbose.

       -nv
       --no-verbose
           Turn off verbose without being completely quiet (use -q for
           that), which means that error messages and basic information
           still get printed.
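In a script it can be convenient to map the script's own log level onto these switches. The sketch below is an assumption about how one might do that; the level names (debug, quiet, terse) and the function verbosity_flag are hypothetical, only the Wget flags come from the manual text above.

```shell
# Print the Wget flag for a hypothetical log level; anything else falls back
# to -v, which is Wget's default behaviour anyway.
verbosity_flag() {
    case $1 in
        debug) printf '%s\n' "-d" ;;
        quiet) printf '%s\n' "-q" ;;
        terse) printf '%s\n' "-nv" ;;
        *)     printf '%s\n' "-v" ;;
    esac
}

# Usage:
#   wget "$(verbosity_flag "$LOG_LEVEL")" "$url"
```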

3.2. Redirect Output to the Terminal

wget -q -O - "$@" url
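In the command above, -O - sends the downloaded document to standard output and -q suppresses Wget's own messages, so a pipeline sees only the document body. A minimal sketch of wrapping this in a function follows; fetch_to_stdout and the example URL are hypothetical.

```shell
# Write the body of the given URL(s) to standard output and nothing else.
# Extra arguments are passed through to wget unchanged.
fetch_to_stdout() {
    wget -q -O - "$@"
}

# Usage (needs network access):
#   fetch_to_stdout http://example.com/ | grep -i '<title>'
```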

3.3. POST

       --post-data=string
       --post-file=file
           Use POST as the method for all HTTP requests and send the
           specified data in the request body.  --post-data sends string
           as data, whereas --post-file sends the contents of file.
           Other than that, they work in exactly the same way. In
           particular, they both expect content of the form
           "key1=value1&key2=value2", with percent-encoding for special
           characters; the only difference is that one expects its
           content as a command-line parameter and the other accepts its
           content from a file. In particular, --post-file is not for
           transmitting files as form attachments: those must appear as
           "key=value" data (with appropriate percent-coding) just like
           everything else. Wget does not currently support
           "multipart/form-data" for transmitting POST data; only
           "application/x-www-form-urlencoded". Only one of --post-data
           and --post-file should be specified.

   wget --save-cookies cookies.txt \
        --post-data 'user=foo&password=bar' \
        http://server.com/auth.php

   # Now grab the page or pages we care about.
   wget --load-cookies cookies.txt \
        -p http://server.com/interesting/article.php
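Since --post-data expects percent-encoded values, any value containing characters such as a space, & or = has to be encoded before it is placed in the string. Wget does not do this for you; the helper below is a hypothetical POSIX shell sketch (urlencode is not part of Wget), intended for ASCII input.

```shell
# Percent-encode a string for use as a form value. Unreserved characters
# (letters, digits, ".", "~", "_", "-") pass through unchanged.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        s=$rest
        case $c in
            [A-Za-z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
)

# Usage with the login example above:
#   wget --save-cookies cookies.txt \
#        --post-data "user=$(urlencode "$user")&password=$(urlencode "$pass")" \
#        http://server.com/auth.php
```

The function body runs in a subshell (parentheses instead of braces), so its working variables do not leak into the calling script.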

3.5. Wget a File with the Correct Name

http://superuser.com/questions/301044/how-to-wget-a-file-with-correct-name-when-redirected

       --content-disposition   honor the Content-Disposition header when
                               choosing local file names (EXPERIMENTAL)

3.6. Proxy

       http_proxy
       https_proxy
           If set, the http_proxy and https_proxy variables should
           contain the URLs of the proxies for HTTP and HTTPS
           connections respectively.

       ftp_proxy
           This variable should contain the URL of the proxy for FTP
           connections.  It is quite common that http_proxy and
           ftp_proxy are set to the same URL.

       no_proxy
           This variable should contain a comma-separated list of domain
           extensions proxy should not be used for.  For instance, if
           the value of no_proxy is .mit.edu, proxy will not be used to
           retrieve documents from MIT.
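The variables are ordinary environment variables, so they can be exported in the shell before calling Wget. The function below is a simplified sketch of the suffix matching described for no_proxy, useful for checking a list by hand; it is an approximation, not Wget's actual implementation, and bypasses_proxy and the proxy URL are hypothetical.

```shell
# Typical setup (hypothetical proxy address):
#   export http_proxy=http://proxy.example.com:3128/
#   export https_proxy=$http_proxy
#   export no_proxy=.mit.edu,localhost

# Succeeds if host ends with one of the comma-separated suffixes.
# The list defaults to $no_proxy when no second argument is given.
bypasses_proxy() (
    host=$1
    list=${2-$no_proxy}
    set -f      # no filename expansion while splitting the list
    IFS=,
    for suffix in $list; do
        [ -n "$suffix" ] || continue
        case $host in
            *"$suffix") exit 0 ;;
        esac
    done
    exit 1
)

# Usage:
#   bypasses_proxy www.mit.edu && echo "direct connection"
```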

4. Reference



MainWiki: wget (last edited 2011-03-11 15:33:58 by twotwo)