wget on Windows

Overview

This post documents the steps I used to download all image (JPG), PDF and regular HTML files from a website with a single wget command, instead of saving them one by one through the web browser.

Installation

Use Chocolatey (https://chocolatey.org/). Follow the installation instructions at https://chocolatey.org/install

Then open a command prompt with administrative rights and install wget:

choco install wget
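After the installation it can be worth confirming that wget is reachable before moving on. Normally running wget --version in a new command prompt is enough; the snippet below is only a minimal sketch of the same check for readers who script their setup, and it assumes Python is installed, which the article itself does not require. The helper name find_wget is made up for illustration.

```python
import shutil


def find_wget():
    """Return the full path of the wget executable, or None if it is not on PATH."""
    return shutil.which("wget")


path = find_wget()
if path:
    print("wget found at:", path)
else:
    # A command prompt opened before the install may not see the updated PATH yet.
    print("wget was not found on PATH; try opening a new command prompt")
```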

Usage

My target website (say abc.com) is protected by HTTP Basic authentication. I am only interested in downloading files with the extensions *.jpg, *.pdf and *.html. First I create a directory to hold the downloaded files, e.g. c:\abc. Then I just run the commands below:

cd c:\abc
wget --user-agent="Googlebot/2.1 (+https://www.googlebot.com/bot.html)" --http-user=user123 --http-password=coder4life -A "*.jpg,*.html,*.pdf" -r -l 0 https://www.abc.com/folder123/

where

--user-agent = User agent string that tells the target web server what kind of client/browser is connecting. If not specified, wget identifies itself as "Wget/<version>", which some web servers block

--http-user = Basic authentication username

--http-password = Basic authentication password (given in plain text on the command line)

-A = Accept list: a comma-separated list of file name suffixes or patterns to download

-r = Tells wget to retrieve files recursively (following links to find all reachable paths/files)

-l = How "deep" wget should go. The default is 5, meaning that wget follows links up to five levels away from the starting URL https://www.abc.com/folder123/, for example as far as /folder123/1/2/3/4/5 when each level links to the next, and then stops looking. The command above uses the value 0, which means "infinite" (until all possible paths are traversed)
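The options described above can also be assembled from a script, which helps if the same download has to be repeated for different folders. The following is a minimal Python sketch, not part of the original steps: the function name build_wget_args and its parameters are made up for illustration, and it only builds and prints the argument list rather than contacting the site.

```python
def build_wget_args(url, user, password,
                    accept=("*.jpg", "*.html", "*.pdf"),
                    depth=0,
                    user_agent="Googlebot/2.1 (+https://www.googlebot.com/bot.html)"):
    """Return the wget command from this article as a list of arguments.

    Hypothetical helper: it mirrors the options explained above and nothing more.
    """
    return [
        "wget",
        f"--user-agent={user_agent}",    # client identification sent to the server
        f"--http-user={user}",           # Basic authentication username
        f"--http-password={password}",   # Basic authentication password (plain text)
        "-A", ",".join(accept),          # accept list: suffixes or patterns to keep
        "-r",                            # recursive retrieval
        "-l", str(depth),                # recursion depth; 0 means unlimited
        url,
    ]


args = build_wget_args("https://www.abc.com/folder123/", "user123", "coder4life")
print(args)

# To actually start the download from the target directory, one option is:
#   import subprocess
#   subprocess.run(args, cwd=r"c:\abc", check=True)
```

Passing the arguments as a list avoids quoting problems with the space inside the user agent string. Note that the password still ends up on the command line, just as in the manual command.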

Published on System Code Geeks with permission by Allen Chee, partner at our SCG program. See the original article here: wget on Windows

Opinions expressed by System Code Geeks contributors are their own.

Allen Chee

Allen is a software developer working in the banking domain. Apart from hacking code and tinkering with technology, he reads a lot about history, so that mistakes of the past need not be repeated if they are remembered.

1 Comment
Rod
5 years ago

Thanks for the article. As a suggested correction, where it reads –user-agent, it should be --user-agent. The same change is also required for --http-user and --http-password.