Mastering GNU Wget: A Comprehensive Tutorial
GNU Wget is an incredibly powerful tool used for downloading files from the web. Its flexibility, capability to fetch files over various protocols, and robustness in handling interruptions make it a favorite among developers, sysadmins, and anyone who interacts with the internet through scripts. This comprehensive tutorial will guide you through the essential features of Wget, from basic commands to advanced use cases.
What is GNU Wget?
GNU Wget is a free utility designed for non-interactive downloading of files from the web. It supports HTTP, HTTPS, and FTP protocols, making it versatile for accessing a wide range of resources. Here are some of its key features:
- Resilient Downloads: Wget can resume broken downloads and handle network interruptions gracefully.
- Recursive Downloading: It’s capable of fetching entire websites or directory structures.
- User-Agent Modification: You can set a custom User-Agent string for servers that restrict or vary their responses based on it.
Installing Wget
Linux
Most Linux distributions come with Wget pre-installed, but if you need to install it:
- Debian/Ubuntu:
sudo apt install wget
- Fedora:
sudo dnf install wget
Windows
On Windows, you can download a compiled version of Wget:
- Visit the official GNU Wget website or a trusted source.
- Download the appropriate binary.
- Add Wget to your system PATH so you can run it from the command line.
macOS
For macOS, you can use Homebrew:
brew install wget
Basic Usage of Wget
The simplest way to use Wget is to specify a URL to download:
wget http://example.com/file.zip
Command Breakdown
- wget: The command to run Wget.
- http://example.com/file.zip: The URL of the file you wish to download.
By default, Wget saves the file in the current directory with its original filename.
Downloading Multiple Files
You can download multiple files by listing their URLs, one per line, in a text file and passing that file with the -i option:
wget -i file_list.txt
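As a minimal sketch of what the list file might look like: the name file_list.txt comes from the command above, while the three URLs and the download_list helper are placeholders invented for illustration.

```bash
# One URL per line; wget -i processes them in order.
cat > file_list.txt <<'EOF'
http://example.com/file1.zip
http://example.com/file2.zip
http://example.com/docs/manual.pdf
EOF

# Hypothetical helper: download every URL named in a list file.
download_list() {
    wget -i "$1"
}
```

Calling download_list file_list.txt then fetches each entry in turn.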
Commonly Used Options
Wget comes with several options that enhance its functionality. Below are some commonly used options.
1. Resume Downloads
To resume a partially completed download, use the -c option:
wget -c http://example.com/file.zip
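In scripts, -c is often paired with a retry loop so that a dropped connection does not end the job. The following is only a sketch: the fetch_with_resume name, the default of 5 attempts, and the 10-second pause are assumptions, and resuming works only if the server supports ranged requests.

```bash
# Hypothetical helper: re-run wget -c until the download completes.
# Usage: fetch_with_resume URL [MAX_ATTEMPTS] [DELAY_SECONDS]
fetch_with_resume() {
    url=$1
    max_attempts=${2:-5}
    delay=${3:-10}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -c picks up from the bytes already on disk, if any.
        if wget -c "$url"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
        # Pause only when another attempt is still to come.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

A call such as fetch_with_resume http://example.com/file.zip 3 returns a non-zero status if all three attempts fail, which lets the calling script react.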
2. Mirror a Website
To download an entire website, use the -m option (mirror):
wget -m http://example.com
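This command creates a local copy of the site by following its links recursively. The -m option is shorthand for -r -N -l inf --no-remove-listing; add -p (--page-requisites) and -k (--convert-links) if the pages must display correctly offline.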
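For a copy that is meant to be browsed offline, mirroring is commonly combined with a few more options. The sketch below spells them out in long form; the mirror_site helper name is hypothetical, and whether --no-parent is wanted depends on the site.

```bash
# Hypothetical helper: mirror a site so it can be browsed offline.
mirror_site() {
    # --mirror           recursion with timestamping, no depth limit
    # --page-requisites  also fetch the images, CSS, and scripts each page needs
    # --convert-links    rewrite links to point at the local copies
    # --no-parent        never climb above the starting directory
    wget --mirror --page-requisites --convert-links --no-parent "$1"
}
```

For example, mirror_site http://example.com/docs/ would stay within /docs/ and below.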
3. Set Output Filename
Use the -O option to save the downloaded file with a specified name:
wget -O new_name.zip http://example.com/file.zip
4. Download in the Background
To let Wget run in the background, use:
wget -b http://example.com/file.zip
This lets you keep using the terminal while the download proceeds. Unless you pass -o, Wget writes its progress to a file named wget-log in the current directory.
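A background download is easier to monitor when its log has a predictable name. This sketch assumes a log file called download.log; both that name and the start_background helper are illustrative.

```bash
# Hypothetical helper: start a background download with an explicit log.
# -b detaches wget from the terminal; -o sends its messages to the log file.
start_background() {
    url=$1
    log=${2:-download.log}
    wget -b -o "$log" "$url"
}
```

After start_background http://example.com/file.zip, running tail -f download.log shows the progress as it is written.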
Advanced Features
Recursive Downloads
Wget’s recursive downloading is powerful for scraping entire websites.
wget -r -l 1 http://example.com
- -r: Enables recursive downloading.
- -l 1: Limits the recursion depth to 1.
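Recursive runs can generate many requests, so it may help to bound them further. The sketch below adds two real Wget options to the command above; the crawl_one_level helper name and the one-second pause are assumptions chosen for illustration.

```bash
# Hypothetical helper: fetch a page and what it links to, one level deep.
crawl_one_level() {
    # --level=1     stop after the pages linked from the start URL
    # --no-parent   stay at or below the start URL's directory
    # --wait=1      pause one second between requests
    wget --recursive --level=1 --no-parent --wait=1 "$1"
}
```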
Set User-Agent
Changing the User-Agent can be necessary for accessing certain sites:
wget --user-agent="Mozilla/5.0" http://example.com
Limit Download Speed
To limit the bandwidth used by Wget, you can use:
wget --limit-rate=200k http://example.com/file.zip
This restricts the download speed to 200 KB/s.
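To get a feel for what a cap means in practice, the arithmetic can be sketched in the shell. This assumes the k suffix stands for 1024 bytes and ignores protocol overhead, so treat the result as a rough lower bound; the estimate_seconds helper is hypothetical.

```bash
# Rough transfer-time estimate for a rate-limited download.
# Arguments: file size in bytes, rate limit in units of 1024 bytes per second.
estimate_seconds() {
    size_bytes=$1
    rate_kib=$2
    echo $(( size_bytes / (rate_kib * 1024) ))
}

# A 100 MiB file (104857600 bytes) at --limit-rate=200k:
estimate_seconds 104857600 200   # prints 512
```

So a 100 MiB download capped at 200k needs roughly eight and a half minutes.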
Handling Authentication
Basic Authentication
If you need to download files from a server that requires authentication, you can use:
wget --user=username --password=password http://example.com/file.zip
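A password given on the command line is visible in the process list and ends up in shell history. One alternative, sketched here with placeholder credentials and a made-up file name (wget-auth.rc), is to put the credentials in a startup file and point Wget at it through the WGETRC environment variable.

```bash
# The file name and credentials are placeholders.
rc_file=./wget-auth.rc

# Create the file inside a subshell so the tighter umask does not leak out.
(
    umask 077   # owner-only permissions for the new file
    cat > "$rc_file" <<'EOF'
user = username
password = password
EOF
)

# Hypothetical helper: run wget with that startup file instead of ~/.wgetrc.
fetch_with_rc() {
    WGETRC=$rc_file wget "$1"
}
```

For interactive use, wget --user=username --ask-password followed by the URL prompts for the password instead of storing it anywhere.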
Cookies
Wget can also work with cookies to maintain a session:
wget --load-cookies cookies.txt http://example.com/file.zip
Make sure to export your cookies to a text file in the Netscape cookies.txt format before running this command; that is the format --load-cookies expects.
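Wget can also create the cookie file itself by submitting a login form first. The sketch below follows that two-step pattern; the login_and_fetch helper, the login URL, and the form field names are all hypothetical and depend entirely on the target site.

```bash
# Hypothetical helper: log in once, then reuse the session for a download.
# Usage: login_and_fetch LOGIN_URL FORM_DATA FILE_URL
login_and_fetch() {
    login_url=$1
    form_data=$2
    file_url=$3

    # Step 1: submit the login form and store the cookies it sets.
    # --keep-session-cookies also saves cookies that have no expiry date.
    wget --save-cookies cookies.txt --keep-session-cookies \
         --post-data "$form_data" -O /dev/null "$login_url" || return 1

    # Step 2: send those cookies back when requesting the protected file.
    wget --load-cookies cookies.txt "$file_url"
}
```

A call might look like login_and_fetch http://example.com/login 'user=alice&pass=secret' http://example.com/file.zip, with the field names adjusted to match the site's form.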
Practical Examples
Example 1: Downloading All Images from a Webpage
wget -r -l 1 -A jpeg,jpg,bmp,gif,png http://example.com/gallery
This command follows links one level deep from the gallery page and keeps only the files whose names end in one of the listed extensions.
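By default a recursive download recreates the site's directory tree locally. If a single flat folder of images is preferred, something like the following sketch may help; the grab_images helper and the images directory name are assumptions.

```bash
# Hypothetical helper: collect the images linked from one page into ./images.
grab_images() {
    # -nd         do not recreate the site's directory tree locally
    # -P images   save everything under the images/ directory
    # -A ...      keep only files with these suffixes
    wget -r -l 1 -nd -P images -A jpeg,jpg,bmp,gif,png "$1"
}
```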
Example 2: Downloading Files with a Specific Pattern
wget -r -l 2 -A "*.pdf" http://example.com/documents