Wget: download all files. Russian-language documentation for Ubuntu


In this article I want to talk about installing and using the GNU Wget console utility on the Windows operating system.

Wget's capabilities are not limited to downloading archives; the utility can create local copies of websites with full preservation of the directory and file structure. In addition, it can convert the saved HTML files for viewing the site offline. By reading file headers and comparing them with previously downloaded files, Wget can fetch new versions of files, allowing you to maintain updated mirrors of sites.

Wget works over the HTTP, HTTPS, and FTP protocols, and also supports working through an HTTP proxy server. The utility was developed for slow connections; more precisely, in those days connections were slow and unstable, so it supports resuming a download when the connection is lost. If the server that the file is being downloaded from also supports resuming, Wget will continue downloading the file from exactly the point where the download was interrupted.

To install Wget, download the installation exe file. Run the exe file and install Wget as a regular program; by default, the utility is installed in C:\Program Files (x86)\GnuWin32.

To launch the utility you need the Windows command line. Open it through the Start menu, or press Win+R, type "cmd" in the window that opens, and press Enter. After launching the command line you will not be able to use Wget right away, as you would on Linux; first you need to tell the shell where wget.exe is located.

The wget.exe file is located in the bin directory inside the installation directory. The path command is used to add this directory to the command search path.
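
For the default installation location, the command looks like this:

path C:\Program Files (x86)\GnuWin32\bin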

Now you can run Wget. To start with, you can view the help output with the list of available options.

The full list of additional options for the Wget utility is given below.

Output of wget -h

GNU Wget 1.11.4, a program for downloading files from the network in offline mode.
Usage: wget [OPTION]... [URL]...

Required arguments for long options are also required for short options.

Startup:
  -V,  --version                 print the Wget version and exit.
  -h,  --help                    display this help.
  -b,  --background              switch to background mode after launch.
  -e,  --execute=COMMAND         execute a command in `.wgetrc' style.

Logging and input file:
  -o,  --output-file=FILE        write messages to FILE.
  -a,  --append-output=FILE      append messages to the end of FILE.
  -d,  --debug                   output a large amount of debugging information.
  -q,  --quiet                   quiet mode (no output).
  -v,  --verbose                 verbose output (default).
  -nv, --no-verbose              disable verbose mode, but not completely.
  -i,  --input-file=FILE         download URLs found in FILE.
  -F,  --force-html              assume that the input file is HTML.
  -B,  --base=URL                prepend URL to relative links in the -F -i file.

Loading:
  -t,  --tries=NUMBER            set the NUMBER of retries (0 unlimited).
       --retry-connrefused       retry even if the connection is refused.
  -O,  --output-document=FILE    write documents to FILE.
  -nc, --no-clobber              skip downloads that would overwrite existing files.
  -c,  --continue                resume downloading a partially downloaded file.
       --progress=TYPE           select the type of progress bar.
  -N,  --timestamping            do not re-download files unless they are newer than local ones.
  -S,  --server-response         output the server response.
       --spider                  do not download anything.
  -T,  --timeout=SECONDS         set all timeouts to SECONDS.
       --dns-timeout=SEC         set the DNS lookup timeout to SEC.
       --connect-timeout=SEC     set the connection timeout to SEC.
       --read-timeout=SEC        set the read timeout to SEC.
  -w,  --wait=SECONDS            pause SECONDS between downloads.
       --waitretry=SECONDS       pause 1..SECONDS between retries of a download.
       --random-wait             pause 0..2*WAIT seconds between downloads.
       --no-proxy                explicitly disable the proxy.
  -Q,  --quota=NUMBER            set the download quota to NUMBER.
       --bind-address=ADDRESS    bind to ADDRESS (hostname or IP) of the local host.
       --limit-rate=SPEED        limit the download SPEED.
       --no-dns-cache            disable caching of DNS lookups.
       --restrict-file-names=OS  restrict characters in file names to those the OS allows.
       --ignore-case             ignore case when matching files and/or directories.
  -4,  --inet4-only              connect to IPv4 addresses only.
  -6,  --inet6-only              connect to IPv6 addresses only.
       --prefer-family=FAMILY    connect first to addresses of the specified family: IPv6, IPv4 or nothing.
       --user=USER               set both the ftp and http user to USER.
       --password=PASSWORD       set both the ftp and http password to PASSWORD.

Directories:
  -nd, --no-directories          do not create directories.
  -x,  --force-directories       force the creation of directories.
  -nH, --no-host-directories     do not create directories as on the host.
       --protocol-directories    use the protocol name in directories.
  -P,  --directory-prefix=PREFIX save files to PREFIX/...
       --cut-dirs=NUMBER         ignore NUMBER remote directory components.

HTTP options:
       --http-user=USER          set the http user to USER.
       --http-password=PASSWORD  set the http password to PASSWORD.
       --no-cache                discard data cached by the server.
  -E,  --html-extension          save HTML documents with the `.html' extension.
       --ignore-length           ignore the `Content-Length' header field.
       --header=LINE             insert LINE among the headers.
       --max-redirect            maximum allowed number of redirects per page.
       --proxy-user=USER         set USER as the username for the proxy.
       --proxy-password=PASSWORD set PASSWORD as the password for the proxy.
       --referer=URL             include the `Referer: URL' header in the HTTP request.
       --save-headers            save HTTP headers to the file.
  -U,  --user-agent=AGENT        identify as AGENT instead of Wget/VERSION.
       --no-http-keep-alive      disable HTTP keep-alive (persistent connections).
       --no-cookies              do not use cookies.
       --load-cookies=FILE       load cookies from FILE before the session.
       --save-cookies=FILE       save cookies to FILE after the session.
       --keep-session-cookies    load and save session (non-persistent) cookies.
       --post-data=STRING        use the POST method; send STRING as data.
       --post-file=FILE          use the POST method; send the contents of FILE.
       --content-disposition     take the Content-Disposition header into account when choosing names for local files (EXPERIMENTAL).
       --auth-no-challenge       send basic HTTP authentication data without waiting for a challenge from the server.

HTTPS options (SSL/TLS):
       --secure-protocol=PR      select a secure protocol: auto, SSLv2, SSLv3 or TLSv1.
       --no-check-certificate    do not check the server certificate.
       --certificate=FILE        user certificate file.
       --certificate-type=TYPE   user certificate type: PEM or DER.
       --private-key=FILE        private key file.
       --private-key-type=TYPE   private key type: PEM or DER.
       --ca-certificate=FILE     file with the CA bundle.
       --ca-directory=DIR        directory where the list of CAs is stored.
       --random-file=FILE        file with random data for the SSL PRNG.
       --egd-file=FILE           file naming an EGD socket with random data.

FTP options:
       --ftp-user=USER           set the ftp user to USER.
       --ftp-password=PASSWORD   set the ftp password to PASSWORD.
       --no-remove-listing       do not remove `.listing' files.
       --no-glob                 disable masks for FTP file names.
       --no-passive-ftp          disable "passive" transfer mode.
       --retr-symlinks           when downloading recursively, retrieve the files symlinks point to (not the directories).
       --preserve-permissions    preserve the access rights of remote files.

Recursive loading:
  -r,  --recursive               enable recursive loading.
  -l,  --level=NUMBER            recursion depth (inf and 0 mean infinity).
       --delete-after            delete local files after loading.
  -k,  --convert-links           make links in the loaded HTML local.
  -K,  --backup-converted        make a backup copy of X.orig before converting file X.
  -m,  --mirror                  a short option equivalent to -N -r -l inf --no-remove-listing.
  -p,  --page-requisites         load all images, etc. needed to display the HTML page.
       --strict-comments         enable strict (SGML) processing of HTML comments.

Recursion allow/deny:
  -A,  --accept=LIST             list of allowed extensions, separated by commas.
  -R,  --reject=LIST             list of banned extensions, separated by commas.
  -D,  --domains=LIST            list of allowed domains, separated by commas.
       --exclude-domains=LIST    list of prohibited domains, separated by commas.
       --follow-ftp              follow FTP links in HTML documents.
       --follow-tags=LIST        list of HTML tags to follow, separated by commas.
       --ignore-tags=LIST        list of ignored HTML tags, separated by commas.
  -H,  --span-hosts              enter foreign hosts during recursion.
  -L,  --relative                follow only relative links.
  -I,  --include-directories=LIST  list of allowed directories.
  -X,  --exclude-directories=LIST  list of excluded directories.
  -np, --no-parent               do not go up to the parent directory.

Normal copy and paste (Ctrl+C, Ctrl+V) in the Windows command line does not work. To copy text from the Windows command line, you just need to highlight the desired piece of text and press Enter.

To insert text into the command line, you need to right-click on the command line window, select the “Edit” submenu and then execute the required command. Copying and pasting makes working with the Windows Command Prompt much easier.

Let's look at some examples of using the Wget utility.

Let's say that we need to download some file, let it be the title image for this article. To do this, you just need to specify the URL (link) of the desired file, for example like this.
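
For example (the URL here is a placeholder; substitute the link to the actual file):

wget https://site/image.jpg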

Sometimes Wget may complain about "self-signed certificates" when using HTTPS links, although the certificates themselves are fine; in that case you need to use the additional "--no-check-certificate" option.

wget https://site/image.jpg --no-check-certificate

When the Windows command line is invoked normally, the file will be saved to the user directory C:\Users\Username. If the file needs to be saved to a specific directory, specify it with the additional "-P" switch; let this be the test directory on drive C (C:\test).

wget -P C:\test https://site/image.jpg --no-check-certificate

If the file must be saved in a specific directory, then there is an easier way to save the file, without specifying additional options. Go to the desired directory through Windows Explorer, hold down the Shift key and right-click on the directory area, and in the menu that opens, select “Open command window.”

The command prompt that opens will already be pointed at that directory.

Run the path command to specify the location of wget.exe.

path C:\Program Files (x86)\GnuWin32\bin

Now everything will be saved in this folder.

Let's imagine that we have a file with a list of links that need to be downloaded. Let this be a test.txt file located in the user directory C:\Users\Username, with the following content.

https://site/wp-content/uploads/2017/04/wp..jpg
https://site/wp-content/uploads/2017/03/MariaDB..jpg

The additional "-i" switch points to the file location (C:\Users\Username\test.txt), and all files from the list will be downloaded.

wget -i C:\Users\Username\test.txt --no-check-certificate

To download an entire site, you just need to specify its address with the minimum required set of options, for example:

wget -r -l 10 -t 5 -k -p https://site/ --no-check-certificate

The options used here are:

-r, --recursive - enable recursive downloading.
-l, --level=NUMBER - recursion depth (inf and 0 mean infinity).
-t, --tries=NUMBER - the number of retries (0 means unlimited).
-k, --convert-links - make links in the downloaded HTML local.
-p, --page-requisites - download all images, etc. needed to display the HTML page.

As a result, we will get a ready-made website mirror that will work autonomously on a computer, thanks to the conversion of links for local use of the site.

But let’s say that we don’t need the entire site, but just this article. In this case, the download command will look like this.

wget -r -l 10 -t 5 -k -p -np https://site/install-wordpress/ --no-check-certificate

Here the -np (--no-parent) option has been added to the previous command so that Wget does not ascend above the starting directory.

By combining additional options in Wget commands, you can achieve different results. So try everything yourself and explore the capabilities of the utility.

We all sometimes download files from the Internet. If you use programs with a graphical interface for this, then everything turns out to be extremely simple. However, when working on the Linux command line, things become somewhat more complicated. Especially for those who are not familiar with the appropriate tools. One such tool is the extremely powerful wget utility, which is suitable for performing all types of downloads. We bring to your attention twelve examples, by analyzing which you can master the basic capabilities of wget.

1. Downloading a single file

$ wget 'https://downloads.sourceforge.net/project/nagios/nagios-4.x/nagios-4.3.1/nagios-4.3.1.tar.gz?r=&ts=1489637334&use_mirror=excellmedia'

Note that the URL is quoted: it contains the characters ? and &, which the shell would otherwise interpret itself.
After entering this command, the download of Nagios Core will begin. During this process, you will be able to see data about the download, for example - information about how much data has already been downloaded, the current speed, and how much time is left until the end of the download.

2. Download the file and save it with a new name

If we want to save the downloaded file under a name different from its original name, we will need the wget command with the -O parameter:

$ wget -O nagios_latest 'https://downloads.sourceforge.net/project/nagios/nagios-4.x/nagios-4.3.1/nagios-4.3.1.tar.gz?r=&ts=1489637334&use_mirror=excellmedia'
With this approach, the downloaded file will be saved under the name nagios_latest.

3. Limiting file download speed

If necessary, the speed of downloading files using wget can be limited. As a result, this operation will not occupy the entire available data channel and will not affect other processes associated with the network. You can do this by using the --limit-rate option and specifying a rate limit expressed in bytes (as a regular number), kilobytes (with a K after the number), or megabytes (M) per second:

$ wget --limit-rate=500K 'https://downloads.sourceforge.net/project/nagios/nagios-4.x/nagios-4.3.1/nagios-4.3.1.tar.gz?r=&ts=1489637334&use_mirror=excellmedia'
The download speed limit here is set to 500 KB/s.

4. Completing an interrupted download

If the operation was interrupted while downloading files, you can resume the download by using the -c option of the wget command:

$ wget -c 'https://downloads.sourceforge.net/project/nagios/nagios-4.x/nagios-4.3.1/nagios-4.3.1.tar.gz?r=&ts=1489637334&use_mirror=excellmedia'
If this parameter is not used, the download of the incompletely downloaded file will start from the beginning.

5. Downloading a file in the background

If you are downloading a huge file and want to perform the operation in the background, you can do so using the -b option:

$ wget -b 'https://downloads.sourceforge.net/project/nagios/nagios-4.x/nagios-4.3.1/nagios-4.3.1.tar.gz?r=&ts=1489637334&use_mirror=excellmedia'

6. Downloading multiple files

If you have a list of URLs for files to download, but you don't want to start each download manually, you can use the -i option. Before starting the download, you need to create a file containing all the addresses. For example, you can do this with the following command:

$ vi url.txt
You need to place the addresses in this file, one per line. Next, all that remains is to run wget, passing the newly created file with the list of downloads to the utility:

$ wget -i url.txt
Executing this command will download all files from the list one by one.

7. Increase the total number of file download attempts

To configure the number of retries to download a file, you can use the --tries option:

$ wget --tries=100 'https://downloads.sourceforge.net/project/nagios/nagios-4.x/nagios-4.3.1/nagios-4.3.1.tar.gz?r=&ts=1489637334&use_mirror=excellmedia'

8. Downloading a file from an FTP server

The command to download a file from an anonymous FTP server using wget looks like this:

$ wget FTP-URL
If a username and password are required to access the file, the command will look like this:

$ wget --ftp-user=dan --ftp-password=********* FTP-URL

9. Create a local copy of the website

If you need to download the contents of an entire website, you can do this using the --mirror option:

$ wget --mirror -p --convert-links -P /home/dan xyz.com
Note the additional command line options:

  • -p: downloads all files necessary for correct display of HTML pages.
  • --convert-links: Links in documents will be converted for local site browsing purposes.
  • -P /home/dan: materials will be saved to the /home/dan folder.

10. Download only files of a certain type from the site

In order to download only files of a certain type from the site, you can use the -r -A parameters:

$ wget -r -A.txt Website_url

11. Skip files of a certain type

If you want to copy an entire website, but don't need a certain type of file, you can disable downloading using the --reject option:

$ wget --reject=png Website_url

12. Downloading using your own .log file

To download a file and use your own .log file, use the -o option and specify the name of the log file:

$ wget -o wgetfile.log 'https://downloads.sourceforge.net/project/nagios/nagios-4.x/nagios-4.3.1/nagios-4.3.1.tar.gz?r=&ts=1489637334&use_mirror=excellmedia'

Results

Wget is a fairly easy to use but very useful Linux utility. And, in fact, what we covered here is only a small part of what it can do. We hope this review will help those new to wget appreciate the program, and perhaps add it to their daily command line tool arsenal.

Dear readers! Do you use Linux command line tools to download files? If yes, please tell us about them.

wget is a console program for downloading files.

If wget's capabilities are not enough, you can use curl.

Examples

Simply downloading a file with wget:

wget ftp://vasya.pupkin.com/film.avi

To continue an interrupted download, we write:

wget -c ftp://vasya.pupkin.com/film.avi

wget --continue ftp://vasya.pupkin.com/film.avi

As in other programs, options have a short and a long form, and you can write --continue instead of -c. Long options are easier to remember but take longer to type. You can freely mix the different forms.

To download files from the list containing direct links:

wget -i pupkinlist.txt

wget --input-file=pupkinlist.txt

Only the file containing the links is specified here. The file can also be an HTML page containing links; they will be downloaded by the command above.
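
For example, to force the list file to be treated as HTML (the file name here is illustrative):

wget --force-html -i pupkinlist.html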

Using wget as a kind of "Teleport Pro for Linux".

When downloading websites there are more options, and therefore more of them are required. Again, you don't have to remember them all; you can make a script (or better yet, several, for different occasions) and call them.

So, suppose there is a website you would like to have a local copy of on your computer, so that when you disconnect from the network you can read it at your leisure.

Mirroring sites to a local machine:

wget -m http://www.vasyapupkin.com/

-m is equivalent to -r -N -l inf -nr; these options are described below.

Copying a site for local viewing (replacing Internet links with local addresses of downloaded pages):

wget -r -l0 -k http://www.vasyapupkin.com/

In this case, recursive downloading is enabled (the -r, --recursive switch), the recursion depth is unlimited (-l0), and links in the downloaded pages are converted to local ones (-k, --convert-links).

Options

Wget has a large number of useful options, more than Teleport has flags. Wrapped in a script named, for example, teleport and placed in a prominent place (specified in PATH), it combines ease of use with a wealth of settings.

-r, --recursive - enable recursive browsing of directories and subdirectories on the remote server.

-l depth, --level=depth - set the maximum recursion depth to depth when browsing directories on the remote server. By default depth=5.

-np, --no-parent - do not go up to the parent directory while downloading. This is a very useful property, because it ensures that only files below a certain hierarchy are copied.

-A acclist, --accept=acclist; -R rejlist, --reject=rejlist - comma-separated lists of file names that should (accept) or should not (reject) be downloaded. File names may be specified by mask.

-k, --convert-links - convert absolute links in an HTML document into relative links. Only links that point to actually downloaded pages are converted; the rest are left unchanged. Note that only at the end of its work can wget know which pages were actually downloaded, so the final conversion is performed only when wget finishes.

--http-user=, --http-passwd= - specify the username and password on the HTTP server.
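
A sketch of how this might look (the username, password, and URL are placeholders):

wget --http-user=vasya --http-passwd=secret http://www.vasyapupkin.com/private/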

-H, --span-hosts - allow visiting any servers to which there is a link.

-p, --page-requisites - download all files needed to display HTML pages, for example: images, sounds, cascading style sheets (CSS). By default such files are not downloaded. The -r and -l options specified together can help, but since wget does not distinguish between external and internal documents, there is no guarantee that everything required will be downloaded.
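
Putting these options together, a typical call for copying a site for offline reading might look like this (the URL is a placeholder):

wget -r -l0 -k -p -np http://www.vasyapupkin.com/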

'--bind-address=ADDRESS'

When making client TCP/IP connections, bind to ADDRESS on the local machine. ADDRESS may be specified as a hostname or IP address. This option can be useful if your machine is bound to multiple IPs.

'--bind-dns-address=ADDRESS'

[libcares only] This address overrides the route for DNS requests. If you ever need to circumvent the standard settings from /etc/resolv.conf, this option together with '--dns-servers' is your friend. ADDRESS must be specified either as an IPv4 or IPv6 address. Wget needs to be built with libcares for this option to be available.

'--dns-servers=ADDRESSES'

[libcares only] The given address(es) override the standard nameserver addresses, e.g. as configured in /etc/resolv.conf. ADDRESSES may be specified either as IPv4 or IPv6 addresses, comma-separated. Wget needs to be built with libcares for this option to be available.
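
Assuming Wget was built with libcares, overriding the nameserver might look like this (the address and URL are illustrative):

wget --dns-servers=8.8.8.8 https://example.com/file.tar.gz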

'-t number'
'--tries=number'

Set number of tries to number. Specify 0 or 'inf' for infinite retrying. The default is to retry 20 times, with the exception of fatal errors like "connection refused" or "not found" (404), which are not retried.

'-O file'
'--output-document=file'

The documents will not be written to the appropriate files, but all will be concatenated together and written to file. If ‘-’ is used as file, documents will be printed to standard output, disabling link conversion. (Use ‘./-’ to print to a file literally named ‘-’.)

Use of '-O' is not intended to mean simply "use the name file instead of the one in the URL"; rather, it is analogous to shell redirection: 'wget -O file http://foo' is intended to work like 'wget -O - http://foo > file'; file will be truncated immediately, and all downloaded content will be written there.

Using "-O" does not simply mean "use a names file instead of a URL", but rather is similar to shell redirection: " wget -O file http://foo» designed to work for example ‘ wget -O - http://foo > file‘; the file will be truncated immediately, and all downloaded content will be written there.

For this reason, ‘-N’ (for timestamp-checking) is not supported in combination with ‘-O’: since file is always newly created, it will always have a very new timestamp. A warning will be issued if this combination is used.

For this reason, "-N" (for timestamp checking) is not supported in combination with "-O": since the file is always created, it will always have a very recent timestamp. A warning will be issued if you use this combination.

Similarly, using '-r' or '-p' with '-O' may not work as you expect: Wget won't just download the first file to file and then download the rest to their normal names: all downloaded content will be placed in file. This was disabled in version 1.11, but has been reinstated (with a warning) in 1.11.2, as there are some cases where this behavior can actually have some use.

A combination with '-nc' is only accepted if the given output file does not exist.

Note that a combination with '-k' is only permitted when downloading a single document, as in that case it will just convert all relative URIs to external ones; '-k' makes no sense for multiple URIs when they're all being downloaded to a single file; '-k' can be used only when the output is a regular file.

'-nc'
'--no-clobber'

If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including '-nc'. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.

When running Wget without '-N', '-nc', '-r', or '-p', downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named 'file.1'. If that file is downloaded yet again, the third copy will be named 'file.2', and so on. (This is also the behavior with '-nd', even if '-r' or '-p' are in effect.)

When '-nc' is specified, this behavior is suppressed, and Wget will refuse to download newer copies of 'file'. Therefore, "no-clobber" is actually a misnomer in this mode: it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.

When running Wget with '-r' or '-p', but without '-N', '-nd', or '-nc', re-downloading a file will result in the new copy simply overwriting the old. Adding '-nc' will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.

When running Wget with '-N', with or without '-r' or '-p', the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file (see Time-Stamping). '-nc' may not be specified at the same time as '-N'.

A combination with '-O'/'--output-document' is only accepted if the given output file does not exist.

Note that when '-nc' is specified, files with the suffixes '.html' or '.htm' will be loaded from the local disk and parsed as if they had been retrieved from the Web.

'--backups=backups'

Before (over)writing a file, back up an existing file by adding a ‘.1’ suffix (‘_1’ on VMS) to the file name. Such backup files are rotated to ‘.2’, ‘.3’, and so on, up to backups (and lost beyond that).
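
For instance, to keep up to three rotated backups of a repeatedly downloaded page (the URL is illustrative):

wget --backups=3 http://www.vasyapupkin.com/index.html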

'-c'
'--continue'

Continue getting a partially-downloaded file. This is useful when you want to finish up a download started by a previous instance of Wget, or by another program. For instance:

wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z

If there is a file named ls-lR.Z in the current directory, Wget will assume that it is the first portion of the remote file, and will ask the server to continue the retrieval from an offset equal to the length of the local file.

Note that you don't need to specify this option if you just want the current invocation of Wget to retry downloading a file should the connection be lost midway through. This is the default behavior. '-c' only affects resumption of downloads started prior to this invocation of Wget, and whose local files are still sitting around.

Without ‘-c’, the previous example would just download the remote file to ls-lR.Z.1, leaving the truncated ls-lR.Z file alone.

If you use '-c' on a non-empty file, and the server does not support continued downloading, Wget will restart the download from scratch and overwrite the existing file entirely.

Beginning with Wget 1.7, if you use '-c' on a file which is of equal size as the one on the server, Wget will refuse to download the file and print an explanatory message. The same happens when the file is smaller on the server than locally (presumably because it was changed on the server since your last download attempt): because "continuing" is not meaningful, no download occurs.

On the other side of the coin, while using '-c', any file that's bigger on the server than locally will be considered an incomplete download and only (length(remote) - length(local)) bytes will be downloaded and tacked onto the end of the local file. This behavior can be desirable in certain cases: for instance, you can use 'wget -c' to download just the new portion that's been appended to a data collection or log file.

However, if the file is bigger on the server because it’s been changed, as opposed to just appended to, you’ll end up with a garbled file. Wget has no way of verifying that the local file is really a valid prefix of the remote file. You need to be especially careful of this when using ‘-c’ in conjunction with ‘-r’, since every file will be considered as an “incomplete download” candidate.

Another instance where you’ll get a garbled file if you try to use ‘-c’ is if you have a lame HTTP proxy that inserts a “transfer interrupted” string into the local file. In the future a “rollback” option may be added to deal with this case.

Note that '-c' only works with FTP servers and with HTTP servers that support the Range header.

'--start-pos=OFFSET'

Start downloading at zero-based position OFFSET. Offset may be expressed in bytes, kilobytes with the ‘k’ suffix, or megabytes with the ‘m’ suffix, etc.

'--start-pos' takes precedence over '--continue'. When '--start-pos' and '--continue' are both specified, wget will emit a warning and then proceed as if '--continue' was absent.

Server support for continued download is required, otherwise '--start-pos' cannot help. See '-c' for details.

'--progress=type'

Select the type of the progress indicator you wish to use. Legal indicators are “dot” and “bar”.

The "bar" indicator is used by default. It draws an ASCII progress bar (a.k.a. a "thermometer" display) indicating the status of retrieval. If the output is not a TTY, the "dot" bar will be used by default.

Use '--progress=dot' to switch to the "dot" display. It traces the retrieval by printing dots on the screen, each dot representing a fixed amount of downloaded data.

The progress type can also take one or more parameters. The parameters vary based on the type selected. Parameters are passed by appending them to the type, separated by a colon (:), like this: '--progress=type:parameter1:parameter2'.

When using the dot retrieval, you may set the style by specifying the type as 'dot:style'. Different styles assign different meaning to one dot. With the default style each dot represents 1K, there are ten dots in a cluster and 50 dots in a line. The binary style has a more "computer"-like orientation: 8K per dot, 16-dot clusters and 48 dots per line (which makes for 384K lines). The mega style is suitable for downloading large files: each dot represents 64K retrieved, there are eight dots in a cluster, and 48 dots on each line (so each line contains 3M). If mega is not enough then you can use the giga style: each dot represents 1M retrieved, there are eight dots in a cluster, and 32 dots on each line (so each line contains 32M).

With '--progress=bar', there are currently two possible parameters, force and noscroll.

When the output is not a TTY, the progress bar always falls back to "dot", even if '--progress=bar' was passed to Wget during invocation. This behavior can be overridden and the "bar" output forced by using the "force" parameter as '--progress=bar:force'.

By default, the 'bar' style progress bar scrolls the name of the file being downloaded from left to right if the filename exceeds the maximum length allotted for its display. In certain cases, such as with '--progress=bar:force', one may not want the scrolling filename in the progress bar. By passing the "noscroll" parameter, Wget can be forced to display as much of the filename as possible without scrolling through it.

Note that you can set the default style using the progress command in .wgetrc. That setting may be overridden from the command line. For example, to force the bar output without scrolling, use '--progress=bar:force:noscroll'.
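
For example, to use the dot indicator with the mega style (the URL is illustrative):

wget --progress=dot:mega https://example.com/big.iso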

'--show-progress'

Force wget to display the progress bar at any verbosity level.

By default, wget only displays the progress bar in verbose mode. One may, however, want wget to display the progress bar on screen in conjunction with any other verbosity modes like '--no-verbose' or '--quiet'. This is often a desired property when invoking wget to download several small/large files. In such a case, wget could simply be invoked with this parameter to get a much cleaner output on the screen.

This option will also force the progress bar to be printed to stderr when used alongside the '--logfile' option.

'-N'
'--timestamping'

Turn on time-stamping. See Time-Stamping, for details.

'--no-if-modified-since'

Do not send the If-Modified-Since header in '-N' mode; send a preliminary HEAD request instead. This only has an effect in '-N' mode.

'--no-use-server-timestamps'

Don’t set the local file’s timestamp by the one on the server.

By default, when a file is downloaded, its timestamps are set to match those from the remote file. This allows the use of '--timestamping' on subsequent invocations of wget. However, it is sometimes useful to base the local file's timestamp on when it was actually downloaded; for that purpose, the '--no-use-server-timestamps' option has been provided.

'-S'
'--server-response'

Print the headers sent by HTTP servers and responses sent by FTP servers.

'--spider'

When invoked with this option, Wget will behave as a Web spider, which means that it will not download the pages, just check that they are there. For example, you can use Wget to check your bookmarks:

wget --spider --force-html -i bookmarks.html

This feature needs much more work for Wget to get close to the functionality of real web spiders.

'-T seconds'
'--timeout=seconds'

Set the network timeout to seconds seconds. This is equivalent to specifying '--dns-timeout', '--connect-timeout', and '--read-timeout', all at the same time.

When interacting with the network, Wget can check for timeout and abort the operation if it takes too long. This prevents anomalies like hanging reads and infinite connects. The only timeout enabled by default is a 900-second read timeout. Setting a timeout to 0 disables it altogether. Unless you know what you are doing, it is best not to change the default timeout settings.

All timeout-related options accept decimal values, as well as subsecond values. For example, ‘0.1’ seconds is a legal (though unwise) choice of timeout. Subsecond timeouts are useful for checking server response times or for testing network latency.
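
For example, a short overall timeout combined with a few retries (the values and URL are illustrative):

wget -T 10 -t 3 https://example.com/file.zip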

'--dns-timeout=seconds'

Set the DNS lookup timeout to seconds seconds. DNS lookups that don’t complete within the specified time will fail. By default, there is no timeout on DNS lookups, other than that implemented by system libraries.

'--connect-timeout=seconds'

Set the connect timeout to seconds seconds. TCP connections that take longer to establish will be aborted. By default, there is no connect timeout, other than that implemented by system libraries.

'--read-timeout=seconds'

Set the read (and write) timeout to seconds seconds. The “time” of this timeout refers to idle time: if, at any point in the download, no data is received for more than the specified number of seconds, reading fails and the download is restarted. This option does not directly affect the duration of the entire download.

Of course, the remote server may choose to terminate the connection sooner than this option requires. The default read timeout is 900 seconds.

'--limit-rate=amount'

Limit the download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the 'k' suffix, or megabytes with the 'm' suffix. For example, '--limit-rate=20k' will limit the retrieval rate to 20KB/s. This is useful when, for whatever reason, you don't want Wget to consume the entire available bandwidth.

This option allows the use of decimal numbers, usually in conjunction with power suffixes; for example, '--limit-rate=2.5k' is a legal value.

Note that Wget implements the limiting by sleeping the appropriate amount of time after a network read that took less time than specified by the rate. This strategy eventually causes the TCP transfer to slow down to approximately the specified rate. However, it may take some time for this balance to be achieved, so don't be surprised if limiting the rate doesn't work well with very small files.

'-w seconds'
'--wait=seconds'

Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the m suffix, in hours using h suffix, or in days using d suffix.

Specifying a large value for this option is useful if the network or the destination host is down, so that Wget can wait long enough to reasonably expect the network error to be fixed before the retry. The waiting interval specified by this option is influenced by '--random-wait' (see below).
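
For example, to pause two minutes between downloads from a list (the file name is illustrative):

wget -w 2m -i pupkinlist.txt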

'--waitretry=seconds'

If you don’t want Wget to wait between every retrieval, but only between retries of failed downloads, you can use this option. Wget will use linear backoff, waiting 1 second after the first failure on a given file, then waiting 2 seconds after the second failure on that file, up to the maximum number of seconds you specify.

By default, Wget will assume a value of 10 seconds.

'--random-wait'

Some web sites may perform log analysis to identify retrieval programs such as Wget by looking for statistically significant similarities in the time between requests. This option causes the time between requests to vary between 0.5 and 1.5 * wait seconds, where wait was specified using the '--wait' option, in order to mask Wget's presence from such analysis.

A 2001 article in a publication devoted to development on a popular consumer platform provided code to perform this analysis on the fly. Its author suggested blocking at the class C address level to ensure automated retrieval programs were blocked despite changing DHCP-supplied addresses.

The '--random-wait' option was inspired by this ill-advised recommendation to block many unrelated users from a website due to the actions of one.

'--no-proxy'

Don’t use proxies, even if the appropriate *_proxy environment variable is defined.

See Proxies, for more information about the use of proxies with Wget.

'-Q quota'
'--quota=quota'

Specify download quota for automatic retrievals. The value can be specified in bytes (default), kilobytes (with ‘k’ suffix), or megabytes (with ‘m’ suffix).

Note that quota will never affect downloading a single file. So if you specify 'wget -Q10k https://example.com/ls-lR.gz', all of the ls-lR.gz will be downloaded. The same goes even when several URLs are specified on the command-line. However, quota is respected when retrieving either recursively, or from an input file. You may safely type 'wget -Q2m -i sites'; the download will be aborted when the quota is exceeded.

Setting quota to 0 or to 'inf' makes the download quota unlimited.

'--no-dns-cache'

Turn off caching of DNS lookups. Normally, Wget remembers the IP addresses it looked up from DNS so it doesn’t have to repeatedly contact the DNS server for the same (typically small) set of hosts it retrieves from. This cache exists in memory only; a new Wget run will contact DNS again.

However, it has been reported that in some situations it is not desirable to cache host names, even for the duration of a short-running application like Wget. With this option Wget issues a new DNS lookup (more precisely, a new call to gethostbyname or getaddrinfo) each time it makes a new connection. Please note that this option will not affect caching that might be performed by the resolving library or by an external caching layer, such as NSCD.

If you don’t understand exactly what this option does, you probably won’t need it.

'--restrict-file-names=modes'

Change which characters found in remote URLs must be escaped during generation of local filenames. Characters that are restricted by this option are escaped, i.e. replaced with '%HH', where 'HH' is the hexadecimal number that corresponds to the restricted character. This option may also be used to force all alphabetical cases to be either lower- or uppercase.

By default, Wget escapes the characters that are not valid or safe as part of file names on your operating system, as well as control characters that are typically unprintable. This option is useful for changing these defaults, perhaps because you are downloading to a non-native partition, or because you want to disable escaping of the control characters, or you want to further restrict characters to only those in the ASCII range of values.

The modes are a comma-separated set of text values. The acceptable values are 'unix', 'windows', 'nocontrol', 'ascii', 'lowercase', and 'uppercase'. The values 'unix' and 'windows' are mutually exclusive (one will override the other), as are 'lowercase' and 'uppercase'. Those last are special cases, as they do not change the set of characters that would be escaped, but rather force local file paths to be converted either to lower- or uppercase.

When “unix” is specified, Wget escapes the character ‘/’ and the control characters in the ranges 0–31 and 128–159. This is the default on Unix-like operating systems.

When "windows" is given, Wget escapes the characters '\', '|', '/', ':', '?', '"', '*', '<', '>', and the control characters in the ranges 0-31 and 128-159. In addition to this, Wget in Windows mode uses '+' instead of ':' to separate host and port in local file names, and uses '@' instead of '?' to separate the query portion of the file name from the rest. Therefore, a URL that would be saved as 'www.xemacs.org:4300/search.pl?input=blah' in Unix mode would be saved as 'www.xemacs.org+4300/search.pl@input=blah' in Windows mode. This mode is the default on Windows.

If you specify 'nocontrol', then the escaping of the control characters is also switched off. This option may make sense when you are downloading URLs whose names contain UTF-8 characters, on a system which can save and display filenames in UTF-8 (some possible byte values used in UTF-8 byte sequences fall in the range of values designated by Wget as "controls").

The 'ascii' mode is used to specify that any bytes whose values are outside the range of ASCII characters (that is, greater than 127) shall be escaped. This can be useful when saving filenames whose encoding does not match the one used locally.

'-4'
'--inet4-only'
'-6'
'--inet6-only'

Force connecting to IPv4 or IPv6 addresses. With '--inet4-only' or '-4', Wget will only connect to IPv4 hosts, ignoring AAAA records in DNS, and refusing to connect to IPv6 addresses specified in URLs. Conversely, with '--inet6-only' or '-6', Wget will only connect to IPv6 hosts and ignore A records and IPv4 addresses.

Neither option should normally be needed. By default, an IPv6-aware Wget will use the address family specified by the host's DNS record. If the DNS responds with both IPv4 and IPv6 addresses, Wget will try them in sequence until it finds one it can connect to. (Also see the '--prefer-family' option described below.)

These options can be used to deliberately force the use of IPv4 or IPv6 address families on dual family systems, usually to aid debugging or to deal with broken network configuration. Only one of '--inet6-only' and '--inet4-only' may be specified at the same time. Neither option is available in Wget compiled without IPv6 support.

'--prefer-family=none/IPv4/IPv6'

When given a choice of several addresses, connect to the addresses with the specified address family first. The address order returned by DNS is used without change by default.

This avoids spurious errors and connect attempts when accessing hosts that resolve to both IPv6 and IPv4 addresses from IPv4 networks. For example, ‘www.kame.net’ resolves to ‘2001:200:0:8002:203:47ff:fea5:3085’ and to ‘203.178.141.194’. When the preferred family is IPv4, the IPv4 address is used first; when the preferred family is IPv6, the IPv6 address is used first; if the specified value is none, the address order returned by DNS is used without change.

Unlike '-4' and '-6', this option doesn't inhibit access to any address family, it only changes the order in which the addresses are accessed. Also note that the reordering performed by this option is stable: it doesn't affect the order of addresses of the same family. That is, the relative order of all IPv4 addresses and of all IPv6 addresses remains intact in all cases.

Unlike "-4" and "-6", this option does not deny access to any family of addresses, it changes the order in which addresses are accessed. Also note that the reordering performed by this option is stable—it does not affect the ordering of addresses in the same family. That is, the relative order of all IPv4 addresses and all IPv6 addresses remains intact in all cases.

'--retry-connrefused'

Consider “connection refused” a transient error and try again. Normally Wget gives up on a URL when it is unable to connect to the site because failure to connect is taken as a sign that the server is not running at all and that retries would not help. This option is for mirroring unreliable sites whose servers tend to disappear for short periods of time.

'--user=user'
'--password=password'

Specify the username user and password password for both FTP and HTTP file retrieval. These parameters can be overridden using the '--ftp-user' and '--ftp-password' options for FTP connections and the '--http-user' and '--http-password' options for HTTP connections.
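
For example (the credentials and URL are placeholders):

wget --user=dan --password=secret ftp://vasya.pupkin.com/film.avi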

'--ask-password'

Prompt for a password for each connection established. Cannot be specified when '--password' is being used, because they are mutually exclusive.

'--no-iri'

Turn off internationalized URI (IRI) support. Use '--iri' to turn it on. IRI support is activated by default.

You can set the default state of IRI support using the iri command in .wgetrc. That setting may be overridden from the command line.

'--local-encoding=encoding'

Force Wget to use encoding as the default system encoding. That affects how Wget converts URLs specified as arguments from locale to UTF-8 for IRI support.

Wget uses the nl_langinfo() function and then the CHARSET environment variable to get the locale. If that fails, ASCII is used.

You can set the default local encoding using the local_encoding command in .wgetrc. That setting may be overridden from the command line.

'--remote-encoding=encoding'

Force Wget to use encoding as the default remote server encoding. That affects how Wget converts URIs found in files from remote encoding to UTF-8 during a recursive fetch. This option is only useful for IRI support, for the interpretation of non-ASCII characters.

For HTTP, the remote encoding can be found in the HTTP Content-Type header and in the HTML Content-Type http-equiv meta tag.

You can set the default encoding using the remoteencoding command in .wgetrc. That setting may be overridden from the command line.

'--unlink'

Force Wget to unlink the file instead of clobbering the existing file. This option is useful for downloading to a directory with hardlinks.

GNU Wget is a small, useful, freely distributed utility for downloading files from the Internet. It supports the HTTP, HTTPS, and FTP protocols, as well as downloading through HTTP proxy servers. Among the program's features it is worth noting:

  • Site crawl: Wget can follow links on HTML pages and create local copies of remote web sites, completely restoring the site's folder structure if desired ("recursive downloading"). During this operation, Wget looks for the robots permission file (/robots.txt). It is also possible to convert links in the downloaded HTML files for further viewing of the site in offline mode ("off-line browsing").
  • Checking file headers: Wget can read file headers (available via HTTP and FTP) and compare them with the headers of previously downloaded files, after which it can download new versions of the files. This allows you to use Wget to mirror sites or a set of files on FTP.
  • Continue downloading: If there is a problem during downloading, Wget will try to continue downloading the file. If the server from which the file is downloaded supports file resuming, then Wget will continue to download the file from exactly the point where the download was interrupted.

Configuration files:

/usr/local/etc/wgetrc - default location of the global settings file.
.wgetrc - per-user settings file (located in that user's home folder).

Syntax:

wget [options] [URL]

Options:

  • -V (--version) - Display the Wget version.
  • -h (--help) - Display Wget command line options.
  • -b (--background) - Go to background after launch. If no log file is specified with the -o option, messages are written to wget-log.
  • -e command (--execute command) - Execute command as if it were part of .wgetrc. The command will be executed after the commands in .wgetrc.

Message options:

  • -o logfile (--output-file=logfile) - Log all messages to logfile. Otherwise they are sent to stderr.
  • -a logfile (--append-output=logfile) - Append to logfile. Like -o, except that logfile is not replaced but appended to. If logfile does not exist, a new file is created.
  • -d (--debug) - Display debug messages - miscellaneous information important to Wget developers.
  • -q (--quiet) - Turn off Wget's messages.
  • -v (--verbose) - Enable verbose messages, with all available data. Enabled by default.
  • -nv (--no-verbose) - Use shortened messages (to turn messages off, see -q). Error messages and basic information are still displayed.
  • -i file (--input-file=file) - Read URLs from file. In this case you do not need to specify URLs on the command line. If URLs are given both on the command line and in the file, the command-line URLs are downloaded first. The file does not have to be in HTML format (though that is fine too) -- it just needs to contain the URLs. (If you specify --force-html, the file will be read as HTML. In that case problems with relative links may occur; they can be prevented by adding <base href="url"> to the file or by giving --base=url on the command line.)
  • -F (--force-html) - When reading URLs from a file, force the file to be read as HTML. To prevent errors with a local HTML file, add <base href="url"> to it or give the --base command line option.
  • -B URL (--base=URL) - When reading URLs from a file (-F), sets the URL that is prepended to the relative addresses in the file given by the -i option.
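
A minimal sketch of these options together (the file name and URL are placeholders): read URLs from a local HTML file, parsing it as HTML and resolving its relative links against a base URL:

wget -F -B http://site/ -i links.html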

Loading

  • --bind-address=ADDRESS - For TCP/IP connections, performs bind() to ADDRESS on the local machine. ADDRESS may be a host name or an IP address. Used if your computer has several IP addresses.
  • -t number (--tries=number) - Sets the number of retries to number. Specify 0 or inf for unlimited retries.
  • -O file (--output-document=file) - The documents will not be written to their respective files; instead they are concatenated and written to file. If file exists, it is replaced. If file is given as -, the documents are written to standard output (stdout). This parameter automatically sets the number of retries to 1. Useful when downloading split files from mail servers via a web interface.
  • -nc (--no-clobber) - If a site download was interrupted, specify this parameter to continue the download without re-fetching the files that already exist.

When Wget is run without the -N, -nc, or -r options, downloading the same file into the same folder creates a copy named file.1. If that also exists, the next copy is called file.2, and so on. With the -nc option this behavior is suppressed: Wget keeps the existing file and prints a warning instead.

When Wget is run with -r but without -N or -nc, a new download of the site replaces the files already downloaded. With -nc, the download continues where it left off, and already-downloaded files are not fetched again (unless they have changed). When Wget is run with -N, with or without -r, a file is only downloaded if it is newer than the existing one, or if its size does not match the existing copy (see Comparison by date). -nc cannot be combined with -N. When -nc is specified, files with the .html or (this is just terrible) .htm extension are loaded from the local disk and parsed as if they had been downloaded from the Internet.

  • -c (--continue) - Resume downloading a file. Used if the download was interrupted, as in this sketch (the host is a placeholder):
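
wget -c ftp://site/ls-lR.Z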

If the current folder already contains a file named ls-lR.Z, Wget will assume that it is the first part of the file being downloaded and will ask the server to continue the download from the offset equal to the length of the local file. Remember that if the connection is lost, Wget retries on its own even without the -c parameter; only when it "gives up" and exits is this parameter needed to resume the download.

Without the -c option, the previous example would download the file again under the name ls-lR.Z.1, leaving the existing ls-lR.Z untouched.

Starting with version 1.7, if the -c parameter is given and the file on the server is the same size as or smaller than the local file, Wget will not download anything and will print a message to that effect.

When using -c, any file on the server that is larger than the local file is considered not fully downloaded. In that case only the "missing" bytes are downloaded and appended to the end of the local file. This can be useful if you need to fetch new entries appended to a log.

Note, however, that if the downloaded file is larger because it has changed, you will end up with a corrupted file (that is, the result may differ completely from the original). Be especially careful when using -c together with -r, since every changed file is a candidate for such a "continued download".

You will also get a corrupted file if your HTTP proxy server misbehaves and writes a "transfer interrupted" message into the file when the connection is lost. Wget will probably fix this itself in future versions.

Remember that -c only works with FTP and HTTP servers that support "Range" headers (i.e. resuming files).

  • --progress=type - Selects the download progress indicator and its type. Possible values: "dot" and "bar".

The default is "bar". The --progress=bar option draws a nice ASCII progress indicator (like a "thermometer"). If standard output is not a TTY, "dot" is used.

Specify --progress=dot to switch to the "dot" type. Download progress is then shown by printing dots, each dot representing the same amount of data.

When using this type, you can also choose a dot style with --progress=dot:style. In the "default" style each dot represents 1 KB, with 10 dots per cluster and 50 per line. The "binary" style has a more "computer" look: 8 KB per dot, 16 dots per cluster, and 48 dots per line (giving a 384 KB line). The "mega" style is intended for large files: each dot represents 64 KB, with 8 dots per cluster and 48 dots per line (3 MB per line).

You can define the default style using the "progress" command in .wgetrc. If you want the "bar" indicator to be used always (and not only when writing to stdout), specify --progress=bar:force.
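
A brief sketch (the URL is a placeholder): download a large file with the "mega" dot style:

wget --progress=dot:mega http://site/big.iso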

  • -N (--timestamping) - Enable comparison by date.
  • -S (--server-response) - Display the headers sent by HTTP servers and the responses sent by FTP servers.
  • --spider - Make Wget behave like a web "spider": it will not download files, only check that they exist. This way you can check your bookmarks and site links. For example:

wget --spider --force-html -i bookmarks.html

Wget does not implement all the features of "real" WWW spiders.

  • -T seconds (--timeout=seconds) - Wait time in seconds. The default timeout is 900 s (15 min). Setting the value to 0 disables the timeout check. Do not lower the timeout unless you know exactly what you are doing.
  • -w seconds (--wait=seconds) - Pause for the given number of seconds between downloads (including retries). This reduces server load. To specify the value in minutes use "m", in hours "h", in days "d" after the number. A large value is useful if the network is unstable (for example, if a modem connection keeps dropping).
  • --waitretry=seconds - Sets the pause only between retries of interrupted downloads. Wget waits 1 second after the first failure, 2 seconds after the second failure on the same file, and so on, up to the maximum given in seconds. For example, with this parameter set to 10, Wget waits a total of (1 + 2 + ... + 10) = 55 seconds per file. This value is set by default in the wgetrc file.
  • --random-wait - Some servers analyze their logs for pauses between file requests in order to detect recursive downloading by robots such as Wget. This parameter varies the pause between requests from 0 to 2*wait seconds, where wait is given by the -w option, to disguise Wget. Remember that Wget's source code is available, so even this disguise can be worked out if someone wants to.
  • -Y on/off (--proxy=on/off) - Proxy server support. Enabled by default if a proxy is defined.
  • -Q quota (--quota=quota) - Quota on the size of downloaded files. Given in bytes (by default), in kilobytes (with a k suffix), or in megabytes (with an m suffix).

When the quota is exhausted, the current file is downloaded to the end, so the quota has no effect when downloading a single file: a single requested ls-lR.gz, for example, will be downloaded in full regardless of the quota. Likewise, all files specified on the command line are downloaded, in contrast to a list of URLs in a file or a recursive download, where the quota is honored.

Specifying 0 or inf will cancel the quota.
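
For example, a sketch with a placeholder URL: cap a recursive download at roughly 5 MB:

wget -r -Q5m http://site/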

Folder Upload Options

  • -nd (--no-directories) - Do not create a folder structure during recursive downloads. With this option all files are saved into one folder. If a file with a given name already exists, it is saved as FileName.n.
  • -x (--force-directories) - The opposite of -nd: always create a folder structure, starting from the server's host name. For example, wget -x http://fly.srk.fer.hr/robots.txt saves the file in the folder fly.srk.fer.hr.
  • -nH (--no-host-directories) - Do not create the host-name folder at the top of the structure. Take, for example, the folder ftp://ftp.site/pub/xemacs/. Downloaded with -r, it is saved locally under ftp.site/pub/xemacs/. The -nH parameter cuts off the initial ftp.site/ part, leaving pub/xemacs. The --cut-dirs=number parameter additionally removes number leading components of the remote path.

If you only want to get rid of the folder structure, you can replace this option with -nd and -P. Unlike -nd, --cut-dirs also works with subdirectories: for example, with -nH --cut-dirs=1, the subdirectory beta/ is saved as xemacs/beta.
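
As a sketch, here is how these options combine for the ftp://ftp.site/pub/xemacs/ example above (the local paths that result):

No options        -> ftp.site/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .
--cut-dirs=1      -> ftp.site/xemacs/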

  • -P prefix (--directory-prefix=prefix) - Defines the starting folder in which the site's folder structure (or just the files) will be saved. By default this parameter is . (the current folder).

HTTP Options

  • -E (--html-extension) - If the type of a downloaded file is text/html and its address does not end in .html or .htm, this parameter appends .html to its name. It can be useful when mirroring .asp pages if you do not want them to interfere with your Apache server. Another use is downloading the response pages of CGI scripts: a page named article.cgi?25 is then saved as article.cgi?25.html. (When updating or otherwise re-downloading pages with this parameter, the latter are re-downloaded in any case, since Wget cannot tell whether the local file X.html corresponds to the remote URL X. To avoid unnecessary re-downloads, use the -k and -K options; the original versions of the files are then also saved as X.orig.)
  • --http-user=user (--http-passwd=password) - Username user and password password for the HTTP server. Depending on the response type, Wget uses "basic" (insecure) or "digest" (secure) authorization. You can also give the username and password in the URL itself.
  • -C on/off (--cache=on/off) - Enables or disables server-side caching. Wget then sends the appropriate request (Pragma: no-cache). Also used to quickly refresh files on a proxy server. Caching is enabled by default.
  • --cookies=on/off - Enables or disables the use of cookies. The server sends a cookie to the client using the "Set-Cookie" header, and the client responds with the same cookie, which lets the server keep visitor statistics. Cookies are used by default, but writing them to disk is disabled.
  • --load-cookies file - Load cookies from file before the first HTTP download. The file is in the text format of Netscape's cookies.txt. This option is used when mirroring: Wget then sends the same cookies your browser would send when connecting to the HTTP server. Just point Wget at the right cookies.txt; different browsers store cookies in different locations.

The --load-cookies option will work with cookies in the Netscape format, which is supported by Wget.

If you cannot use the --load-cookies option, there is still a way out. If your browser has a "cookie manager", you can look up the cookies needed for mirroring; note their names and values and manually tell Wget to send them: wget --cookies=off --header "Cookie: name=value"

  • --save-cookies file - Save cookies to file at the end of the session. Expired cookies are not saved.
  • --ignore-length - Some HTTP servers (CGI scripts, to be precise) send incorrect "Content-Length" headers, which make Wget think that not everything has been downloaded yet, so it downloads the same document several times. With this option, Wget ignores "Content-Length" headers.
  • --header=additional-header - Defines an additional-header sent to the HTTP server. It must contain a colon (:) followed by a value. You can define several additional headers by using --header multiple times.

wget --header="Accept-Charset: iso-8859-2" --header="Accept-Language: hr" http://site/

Specifying an empty string as the header value clears all previously user-defined headers.

  • --proxy-user=user and --proxy-passwd=password - Define the username user and password for authorization on a proxy server. The "basic" authorization type is used.
  • --referer=url - Adds the header `Referer: url' to the HTTP request. Used when downloading pages that are served correctly only when the server knows which page you came from.
  • -s (--save-headers) - Save the headers sent by the HTTP server, placing them in the file before the actual content.
  • -U agent-string (--user-agent=agent-string) - Identify yourself as agent-string in requests to the HTTP server. The HTTP protocol allows clients to identify themselves with an agent header. Wget identifies itself by default as Wget/version, where version is the Wget version. Some servers provide the required information only to browsers identified as "Mozilla" or Microsoft "Internet Explorer". This parameter lets you trick such servers.
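
A small sketch (the URL and agent string are placeholders): pretend to be a browser when a server refuses Wget's default identification:

wget -U "Mozilla/5.0 (compatible)" http://site/page.html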

FTP Settings

  • -nr (--dont-remove-listing) - Do not remove the temporary .listing files generated during FTP downloads. These files contain information about the folders of FTP servers. Keeping them helps you quickly check whether the server's folders have changed (i.e., whether your mirror is still current). If you do not delete .listing, remember about security! For example, a symbolic link with this name could be created pointing to /etc/passwd or elsewhere.
  • -g on/off (--glob=on/off) - Enables or disables the use of wildcard characters (masks) over the FTP protocol. These can be *, ?, [ and ]. For example:

wget ftp://site/*.msg

By default, globbing is enabled if the URL contains wildcard characters. You can also put the URL in quotes to protect it from the shell. Globbing only works on Unix FTP servers (and those emulating Unix "ls" output).

  • --passive-ftp - Enables passive FTP mode, in which the connection is initiated by the client. Used when behind a firewall.
  • --retr-symlinks - By default, when recursively downloading FTP folders, files pointed to by symbolic links are not downloaded; this option makes Wget download them. It currently works only for files, not folders, and note that it has no effect when downloading a single file.
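
A brief sketch (the URL is a placeholder): a passive-mode FTP download from behind a firewall:

wget --passive-ftp ftp://site/pub/file.zip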

Recursive Load Options

  • -r (--recursive) - Enable recursive loading.
  • -l depth (--level=depth) - Maximum recursion depth depth. The default value is 5.
  • --delete-after - Delete each page (locally) after it is downloaded. Used to pre-load fresh versions of frequently requested pages through a proxy. For example:

wget -r -nd --delete-after http://site/~popular/page/

The -r option enables recursive downloading, and -nd disables folder creation. When --delete-after is specified, the --convert-links option is ignored.

  • -k (--convert-links) - After the download is complete, convert the links in the documents for offline viewing. This affects not only visible links to other documents, but links to all external local files. Each link is modified in one of two ways:

Links to files downloaded by Wget are changed to relative links. For example, if the downloaded file is /foo/doc.html, then a link to the also-downloaded file /bar/img.gif becomes ../bar/img.gif. This works because there is a computable relationship between the folders of the two files.

Links to files not downloaded by Wget are changed to the absolute addresses of those files on the remote server. For example, if the downloaded file /foo/doc.html contains a link to /bar/img.gif (or ../bar/img.gif) that was not downloaded, the link in doc.html becomes http://host/bar/img.gif.

Thanks to this, offline viewing of the site and files is possible: if a file to which there is a link is downloaded, the link will point to it, if not, then the link will point to its Internet address (if one exists). When converting, relative links are used, which means you can transfer the downloaded site to another folder without changing its structure. Only after the download is complete does Wget know which files have been downloaded. Therefore, with the -k option, the conversion will occur only after the download is complete.

  • -K (--backup-converted) - Before converting the links in a file, back up the original version with an .orig extension. Affects the behavior of the -N option.
  • -m (--mirror) - Enable the options suitable for mirroring sites. This option is equivalent to several options: -r -N -l inf -nr. It is a convenient shorthand for maintaining mirror copies of sites.
  • -p (--page-requisites) - Download all files needed to display HTML pages, e.g., images, sound, cascading style sheets.

By default such files are not downloaded. The -r and -l options given together can help, but since Wget does not distinguish between external and internal documents, there is no guarantee that everything required will be downloaded.
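
For instance (the URL is a placeholder), one common combination downloads a single page with everything needed to display it, converting links and keeping the original files:

wget -E -H -k -K -p http://site/dir/page.html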

Options to disable/enable recursive loading

  • -A acclist (--accept acclist) - A comma-separated list of file names to download. File names may be given as masks.
  • -R rejlist (--reject rejlist) - A comma-separated list of file names that should not be downloaded. File names may be given as masks.
  • -D domain-list (--domains=domain-list) - A comma-separated list of domains from which files may be downloaded. Note that this option does not enable -H.
  • --exclude-domains domain-list - A list of domains from which files may not be downloaded.
  • --follow-ftp - Follow FTP links from HTML pages. Otherwise links to files over the FTP protocol are ignored.
  • --follow-tags=list - Wget has a built-in table of HTML tags in which it looks for links to other files. You can specify additional tags in the comma-separated list given in this parameter.
  • -G list (--ignore-tags=list) - The opposite of --follow-tags. To skip certain HTML tags when downloading recursively, give them in the comma-separated list.
  • -H (--span-hosts) - Allows visiting any servers to which there is a link.
  • -L (--relative) - Follow relative links only. With this option, files from other servers will definitely not be downloaded.
  • -I list (--include-directories=list) - A comma-separated list of folders from which files may be downloaded. List elements may contain mask characters.
  • -X list (--exclude-directories=list) - A comma-separated list of folders to exclude from downloading (see Folder Restrictions). List elements may contain mask characters.
  • -np (--no-parent) - Do not go above the starting address when downloading recursively.
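
For example (the URL is a placeholder), a sketch that recursively fetches only PDF files without ascending to the parent folder:

wget -r -np -A "*.pdf" http://site/docs/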

USAGE EXAMPLES

  • Download URL:

wget http://site/

  • Download a file, increasing the number of attempts to establish a connection to 60 (the default is 20):

wget --tries=60 http://site/jpg/flyweb.jpg

  • Run Wget in the background and save messages to the file log. (The ampersand at the end tells the shell to continue without waiting for Wget to finish. To make the program retry forever, use -t inf.)

wget -t 45 -o log http://site/jpg/flyweb.jpg &

  • Download a file via FTP (an illustrative URL, in the document's placeholder style):
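
wget ftp://site/pub/file.zip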
  • If you specify the address of a folder, Wget will download the listing of that folder (the files and subfolders it contains) and convert it to HTML format. For example:

wget ftp://site/pub/gnu/
links index.html

  • If you have a file containing the URLs you want to download, use the -i parameter (if you specify - instead of a file name, the URLs are read from standard input (stdin)):

wget -i file

  • Create a five-level-deep copy of the GNU site with the original folder structure, with one download attempt per document, saving messages to gnulog:

wget -r http://www.gnu.org/ -o gnulog

  • As in the example above, but with the links in HTML files converted to local ones for later offline viewing:

wget --convert-links -r http://www.gnu.org/ -o gnulog

  • Download a single HTML page and all the files required to display it (e.g., images, cascading style sheets, etc.), and convert all links to these files:

wget -p --convert-links http://www.server.com/dir/page.html

The HTML page will be saved in www.server.com/dir/page.html, and the images, style sheets, etc. will be saved in the folder www.server.com/, except for files downloaded from other servers.

  • As in the example above, but without the www.server.com/ folder; instead, all files will be saved in the subfolder download/:

wget -p --convert-links -nH -nd -Pdownload http://www.server.com/dir/page.html

  • Load index.html from www.lycos.com, displaying the server headers:

wget -S http://www.lycos.com/

  • Save the headers to a file for later use:

wget -s http://www.lycos.com/
more index.html

  • Load the top two levels of wuarchive.wustl.edu into /tmp.

wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/

  • Download the GIF files from a folder on an HTTP server. The command wget http://www.server.com/dir/*.gif will not work, because wildcard characters are not supported for HTTP downloads. Use:

wget -r -l1 --no-parent -A.gif http://www.server.com/dir/

Here -r -l1 enables recursive downloading with a maximum depth of 1, --no-parent disables following links to the parent folder, and -A.gif allows only files with the .gif extension to be downloaded. -A "*.gif" would also work.

  • Suppose that during a recursive download you urgently needed to shut down or restart your computer. To avoid downloading the files you already have again, use:

wget -nc -r http://www.gnu.org/

  • If you want to provide a username and password for an HTTP or FTP server, use the appropriate URL syntax:

wget ftp://hniksic:password@site/.emacs

  • Do you want downloaded documents to go to standard output rather than to files? For example, to set up a pipeline and download all the sites whose links are listed on one page:

wget -O - http://cool.list.com/ | wget --force-html -i -

  • To maintain a mirror of a page (or an FTP folder), use --mirror (-m), which replaces -r -l inf -N. You can add Wget to your crontab, asking it to check for updates every Sunday:

crontab
0 0 * * 0 wget --mirror http://www.gnu.org/ -o /home/me/weeklog

  • You also want the links to be converted for local viewing. But after reading this manual, you know that timestamp comparison will not work with link conversion, so tell Wget to keep backup copies of the HTML files before converting them. The command:

wget --mirror --convert-links --backup-converted http://www.gnu.org/ -o /home/me/weeklog

  • If local viewing of HTML files with an extension other than .html (for example, index.cgi) does not work, you need to tell Wget to rename all such files (content-type text/html) to name.html:

wget --mirror --convert-links --backup-converted --html-extension -o /home/me/weeklog http://www.gnu.org/

  • The same with short option equivalents:

wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog






