Linux Command: wget (recursion depth)

Author: 菜鸟追梦旅行 | 2024-03-17 14:43:52

Introduction

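   GNU Wget is a simple yet powerful piece of free software for downloading files over the network, and it is part of the GNU Project. Its name combines "World Wide Web" and "Get", which also hints at what the program does. It currently supports downloading over HTTP, HTTPS, and FTP, the three most common TCP/IP application protocols.
                                                                      -- Wikipedia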

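Features

1. Supports recursive downloading.
2. Converts links in downloaded pages appropriately.
3. Produces page mirrors that can be browsed locally.
4. Supports proxy servers.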

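Drawbacks

1. Supports relatively few protocols, especially compared with cURL. The popular streaming protocols MMS and RTSP are not supported, and none of the widely used P2P protocols are covered either.
2. Limited support for newer web technology. Versions before 1.13 spoke only HTTP/1.0 (HTTP/1.1 is the default from Wget 1.13 on), and files referenced only from JavaScript are not downloaded, because Wget does not run scripts.
3. Limited flexibility and extensibility; it can run into problems with complex mirror sites.
4. The command is complex, with well over a hundred options.
5. Security issues.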

Options in detail

Operating system: Ubuntu 14.04

wget --help
GNU Wget 1.15, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...
Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version           display the version of Wget and exit.
  -h,  --help              print this help.
  -b,  --background        go to background after startup.
  -e,  --execute=COMMAND   execute a `.wgetrc'-style command.

Logging and input file:
  -o,  --output-file=FILE    log messages to FILE.
  -a,  --append-output=FILE  append messages to FILE.
  -d,  --debug               print lots of debugging information.
  -q,  --quiet               quiet (no output).
  -v,  --verbose             be verbose (this is the default).
  -nv, --no-verbose          turn off verboseness, without being quiet.
       --report-speed=TYPE   Output bandwidth as TYPE.  TYPE can be bits.
  -i,  --input-file=FILE     download URLs found in local or external FILE.
  -F,  --force-html          treat input file as HTML.
  -B,  --base=URL            resolves HTML input-file links (-i -F)
                             relative to URL.
       --config=FILE         Specify config file to use.

Download:
  -t,  --tries=NUMBER            set number of retries to NUMBER (0 unlimits).
       --retry-connrefused       retry even if connection is refused.
  -O,  --output-document=FILE    write documents to FILE.
  -nc, --no-clobber              skip downloads that would download to
                                 existing files (overwriting them).
  -c,  --continue                resume getting a partially-downloaded file.
       --progress=TYPE           select progress gauge type.
  -N,  --timestamping            don't re-retrieve files unless newer than
                                 local.
  --no-use-server-timestamps     don't set the local file's timestamp by
                                 the one on the server.
  -S,  --server-response         print server response.
       --spider                  don't download anything.
  -T,  --timeout=SECONDS         set all timeout values to SECONDS.
       --dns-timeout=SECS        set the DNS lookup timeout to SECS.
       --connect-timeout=SECS    set the connect timeout to SECS.
       --read-timeout=SECS       set the read timeout to SECS.
  -w,  --wait=SECONDS            wait SECONDS between retrievals.
       --waitretry=SECONDS       wait 1..SECONDS between retries of a retrieval.
       --random-wait             wait from 0.5*WAIT...1.5*WAIT secs between retrievals.
       --no-proxy                explicitly turn off proxy.
  -Q,  --quota=NUMBER            set retrieval quota to NUMBER.
       --bind-address=ADDRESS    bind to ADDRESS (hostname or IP) on local host.
       --limit-rate=RATE         limit download rate to RATE.
       --no-dns-cache            disable caching DNS lookups.
       --restrict-file-names=OS  restrict chars in file names to ones OS allows.
       --ignore-case             ignore case when matching files/directories.
  -4,  --inet4-only              connect only to IPv4 addresses.
  -6,  --inet6-only              connect only to IPv6 addresses.
       --prefer-family=FAMILY    connect first to addresses of specified family,
                                 one of IPv6, IPv4, or none.
       --user=USER               set both ftp and http user to USER.
       --password=PASS           set both ftp and http password to PASS.
       --ask-password            prompt for passwords.
       --no-iri                  turn off IRI support.
       --local-encoding=ENC      use ENC as the local encoding for IRIs.
       --remote-encoding=ENC     use ENC as the default remote encoding.
       --unlink                  remove file before clobber.

Directories:
  -nd, --no-directories           don't create directories.
  -x,  --force-directories        force creation of directories.
  -nH, --no-host-directories      don't create host directories.
       --protocol-directories     use protocol name in directories.
  -P,  --directory-prefix=PREFIX  save files to PREFIX/...
       --cut-dirs=NUMBER          ignore NUMBER remote directory components.

HTTP options:
       --http-user=USER        set http user to USER.
       --http-password=PASS    set http password to PASS.
       --no-cache              disallow server-cached data.
       --default-page=NAME     Change the default page name (normally
                               this is `index.html'.).
  -E,  --adjust-extension      save HTML/CSS documents with proper extensions.
       --ignore-length         ignore `Content-Length' header field.
       --header=STRING         insert STRING among the headers.
       --max-redirect          maximum redirections allowed per page.
       --proxy-user=USER       set USER as proxy username.
       --proxy-password=PASS   set PASS as proxy password.
       --referer=URL           include `Referer: URL' header in HTTP request.
       --save-headers          save the HTTP headers to file.
  -U,  --user-agent=AGENT      identify as AGENT instead of Wget/VERSION.
       --no-http-keep-alive    disable HTTP keep-alive (persistent connections).
       --no-cookies            don't use cookies.
       --load-cookies=FILE     load cookies from FILE before session.
       --save-cookies=FILE     save cookies to FILE after session.
       --keep-session-cookies  load and save session (non-permanent) cookies.
       --post-data=STRING      use the POST method; send STRING as the data.
       --post-file=FILE        use the POST method; send contents of FILE.
       --method=HTTPMethod     use method "HTTPMethod" in the header.
       --body-data=STRING      Send STRING as data. --method MUST be set.
       --body-file=FILE        Send contents of FILE. --method MUST be set.
       --content-disposition   honor the Content-Disposition header when
                               choosing local file names (EXPERIMENTAL).
       --content-on-error      output the received content on server errors.
       --auth-no-challenge     send Basic HTTP authentication information
                               without first waiting for the server's
                               challenge.

HTTPS (SSL/TLS) options:
       --secure-protocol=PR     choose secure protocol, one of auto, SSLv2,
                                SSLv3, TLSv1 and PFS.
       --https-only             only follow secure HTTPS links
       --no-check-certificate   don't validate the server's certificate.
       --certificate=FILE       client certificate file.
       --certificate-type=TYPE  client certificate type, PEM or DER.
       --private-key=FILE       private key file.
       --private-key-type=TYPE  private key type, PEM or DER.
       --ca-certificate=FILE    file with the bundle of CA's.
       --ca-directory=DIR       directory where hash list of CA's is stored.
       --random-file=FILE       file with random data for seeding the SSL PRNG.
       --egd-file=FILE          file naming the EGD socket with random data.

FTP options:
       --ftp-user=USER         set ftp user to USER.
       --ftp-password=PASS     set ftp password to PASS.
       --no-remove-listing     don't remove `.listing' files.
       --no-glob               turn off FTP file name globbing.
       --no-passive-ftp        disable the "passive" transfer mode.
       --preserve-permissions  preserve remote file permissions.
       --retr-symlinks         when recursing, get linked-to files (not dir).

Recursive download:
  -r,  --recursive          specify recursive download.
  -l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
       --delete-after       delete files locally after downloading them.
  -k,  --convert-links      make links in downloaded HTML or CSS point to
                            local files.
       --backups=N          before writing file X,rotate up to N backup files.
  -K,  --backup-converted   before converting file X, back up as X.orig.
  -m,  --mirror             shortcut for -N -r -l inf --no-remove-listing.
  -p,  --page-requisites    get all images, etc. needed to display HTML page.
       --strict-comments    turn on strict (SGML) handling of HTML comments.

Recursive accept/reject:
  -A,  --accept=LIST               comma-separated list of accepted extensions.
  -R,  --reject=LIST               comma-separated list of rejected extensions.
       --accept-regex=REGEX        regex matching accepted URLs.
       --reject-regex=REGEX        regex matching rejected URLs.
       --regex-type=TYPE           regex type (posix).
  -D,  --domains=LIST              comma-separated list of accepted domains.
       --exclude-domains=LIST      comma-separated list of rejected domains.
       --follow-ftp                follow FTP links from HTML documents.
       --follow-tags=LIST          comma-separated list of followed HTML tags.
       --ignore-tags=LIST          comma-separated list of ignored HTML tags.
  -H,  --span-hosts                go to foreign hosts when recursive.
  -L,  --relative                  follow relative links only.
  -I,  --include-directories=LIST  list of allowed directories.
  --trust-server-names             use the name specified by the redirection
                                   url last component.
  -X,  --exclude-directories=LIST  list of excluded directories.
  -np, --no-parent                 don't ascend to the parent directory.

Mail bug reports and suggestions to <bug-wget@gnu.org>.
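
The --cut-dirs option is easy to misread. It drops the first NUMBER components of the remote directory path when Wget builds local directory names, and it is usually paired with -nH. The Wget manual illustrates this with ftp://ftp.xemacs.org/pub/xemacs/: by default files go to ftp.xemacs.org/pub/xemacs/, with -nH to pub/xemacs/, with -nH --cut-dirs=1 to xemacs/, with -nH --cut-dirs=2 to the current directory, and with --cut-dirs=1 alone to ftp.xemacs.org/xemacs/. The function local_dir below is a hypothetical helper, not part of Wget, that mimics only this one rule (it ignores -P, --protocol-directories and the file name itself); treat it as a sketch for predicting where files will land.

```shell
#!/bin/sh
# local_dir HOST REMOTE_DIR NO_HOST CUT_DIRS
# Hypothetical helper (not part of wget): mimics how -nH (NO_HOST=1)
# and --cut-dirs=CUT_DIRS map a remote directory to a local one.
local_dir() {
    host=$1
    remote_dir=$2
    no_host=$3
    cut_dirs=$4

    # Keep only the path components after the first CUT_DIRS ones.
    kept=$(printf '%s\n' "$remote_dir" | awk -F/ -v n="$cut_dirs" '{
        out = ""; k = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "") continue   # skip empty parts from leading/trailing "/"
            k++
            if (k > n + 0) out = out "/" $i
        }
        print substr(out, 2)
    }')

    # Without -nH, wget nests everything under a directory named after the host.
    if [ "$no_host" -eq 0 ]; then
        kept="$host${kept:+/$kept}"
    fi

    # An empty result means files are saved in the current directory.
    printf '%s\n' "${kept:-.}"
}

# Reproduces the xemacs example from the wget manual.
local_dir ftp.xemacs.org /pub/xemacs/ 0 0   # ftp.xemacs.org/pub/xemacs
local_dir ftp.xemacs.org /pub/xemacs/ 1 0   # pub/xemacs
local_dir ftp.xemacs.org /pub/xemacs/ 1 1   # xemacs
local_dir ftp.xemacs.org /pub/xemacs/ 1 2   # .
local_dir ftp.xemacs.org /pub/xemacs/ 0 1   # ftp.xemacs.org/xemacs
```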

Common examples

Example 1: Download a single file with wget

wget http://www.minjieren.com/wordpress-3.1-zh_CN.zip

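Example 2: Download with wget -O and save under a different file name

wget -O wordpress.zip "http://www.minjieren.com/download.aspx?id=10801"

Without -O, wget would name the local file after the last part of the URL (here download.aspx?id=10801). The URL is quoted so the shell does not treat ? as a glob character.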

Example 3: Limit the download rate with wget --limit-rate

wget --limit-rate=300k http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Example 4: Resume an interrupted download with wget -c

wget -c http://www.minjieren.com/wordpress-3.1-zh_CN.zip
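
Within a single run, Wget already retries after network errors (see --tries) and continues from where the connection dropped; -c matters when an earlier wget process has exited, for example after being interrupted, and you start it again. The sketch below keeps re-running wget -c until it succeeds. fetch_resume, its MAX_ATTEMPTS argument and the RETRY_DELAY variable are hypothetical names for illustration, not Wget features.

```shell
#!/bin/sh
# fetch_resume URL [MAX_ATTEMPTS]
# Hypothetical wrapper (not a wget feature): re-runs `wget -c` until it
# exits successfully or MAX_ATTEMPTS runs have failed. Because of -c, each
# new run continues from the partial file instead of starting over.
fetch_resume() {
    url=$1
    max_attempts=${2:-5}
    attempt=1
    until wget -c "$url"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "fetch_resume: giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        # RETRY_DELAY (seconds) is a hypothetical knob; defaults to 5.
        sleep "${RETRY_DELAY:-5}"
    done
    return 0
}
```

Usage: fetch_resume http://www.minjieren.com/wordpress-3.1-zh_CN.zip 10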

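Example 5: Download in the background with wget -b

wget -b http://www.minjieren.com/wordpress-3.1-zh_CN.zip

With -b and no -o, wget writes its log to wget-log in the current directory, so progress can be followed with tail -f wget-log.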

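Example 6: Download with a spoofed user agent

wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16" \
    http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Some websites reject download requests when the user agent does not look like a browser. The --user-agent option lets wget present a browser's user-agent string instead.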

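Example 7: Increase the number of retries with wget --tries

wget --tries=40 URL

By default wget tries 20 times; fatal errors such as "connection refused" or "404 Not Found" are not retried.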
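
Retries are often combined with the wait and rate options from the Download section of the help output, so that a long batch does not hammer the server. A sketch of one such combination, wrapped in a hypothetical polite_fetch function; the values (40 tries, up to 10 s between retries, about 2 s between files, a 300k rate cap) are arbitrary choices for illustration, not recommendations from Wget.

```shell
#!/bin/sh
# polite_fetch URL...
# Hypothetical wrapper: passes a fixed set of "be gentle" options to wget.
polite_fetch() {
    # --tries=40         retry each file up to 40 times
    # --waitretry=10     back off linearly, 1..10 seconds, between retries
    # --wait=2           pause about 2 seconds between retrievals...
    # --random-wait      ...randomised to 0.5*2 .. 1.5*2 seconds
    # --limit-rate=300k  cap the download rate at roughly 300 KB per second
    wget --tries=40 --waitretry=10 --wait=2 --random-wait \
        --limit-rate=300k "$@"
}
```

Usage: polite_fetch http://www.minjieren.com/wordpress-3.1-zh_CN.zip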

Example 8: Download multiple files with wget -i

First, save the download links to a file, one URL per line:

cat filelist.txt
url1
url2
url3
url4

Then pass this file to -i:

wget -i filelist.txt
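
The list does not have to live in a file: when - is given as the argument to -i, Wget reads the URLs from standard input. A sketch assuming a site that serves numbered files; list_urls, fetch_numbered and the base URL are hypothetical placeholders.

```shell
#!/bin/sh
# list_urls BASE COUNT
# Hypothetical helper: prints BASE/001.jpg ... BASE/COUNT.jpg, one per line.
list_urls() {
    base=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s/%03d.jpg\n' "$base" "$i"
        i=$((i + 1))
    done
}

# fetch_numbered BASE COUNT
# Feeds the generated list to wget; "-i -" means "read URLs from stdin".
fetch_numbered() {
    list_urls "$1" "$2" | wget -i -
}
```

Usage: fetch_numbered http://www.example.com/img 20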

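Example 9: Mirror a website with wget --mirror

wget --mirror -p --convert-links -P ./LOCAL URL
Downloads the entire website to the local machine.
--mirror: enable mirroring
-p: download all files needed for the HTML pages to display properly
--convert-links: after downloading, convert links so they point to the local copies
-P ./LOCAL: save all files and directories under the specified local directory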
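
--mirror implies -l inf, so it follows links to any depth. When only part of a site is wanted, the recursion depth can be capped with -l instead (a plain -r without -l stops at depth 5), and -np keeps Wget from climbing above the starting directory. A minimal sketch; crawl_depth is a hypothetical wrapper, ./LOCAL is reused from the example above, and the default depth of 2 is an arbitrary choice.

```shell
#!/bin/sh
# crawl_depth URL [DEPTH]
# Hypothetical wrapper: recursive download limited to DEPTH levels of links.
crawl_depth() {
    url=$1
    depth=${2:-2}
    # -r          recurse into links
    # -l DEPTH    stop after DEPTH levels (inf or 0 would mean unlimited)
    # -np         never ascend to the parent directory
    # -p -k       fetch page requisites and convert links for local viewing
    # -P ./LOCAL  save everything under ./LOCAL
    wget -r -l "$depth" -np -p -k -P ./LOCAL "$url"
}
```

Usage: crawl_depth http://www.example.com/docs/ 1 fetches the start page plus the pages it links to.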

References:
http://www.cnblogs.com/peida/archive/2013/03/18/2965369.html
http://blog.chinaunix.net/uid-25324849-id-3198560.html

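Disclaimer: This article was contributed by a user and does not represent the position of the wpsshop blog. Copyright belongs to the original author, and this site assumes no legal liability. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/菜鸟追梦旅行/article/detail/256591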