wget on Windows
Overview
This post documents the steps I use to download all image (JPG), PDF and regular HTML files from a website with a single command (wget) instead of a web browser.
Installation
Use Chocolatey (https://chocolatey.org/). Follow the installation instructions at https://chocolatey.org/install
Then open a command prompt with administrative rights to install wget:
choco install wget
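After the install finishes, it can be worth confirming that wget is reachable from the command prompt before going further. The following is a minimal sketch, assuming Python 3 is available on the machine; the helper name `find_tool` is hypothetical and is not part of wget or Chocolatey.

```python
import shutil


def find_tool(name="wget"):
    """Return the full path of an executable found on PATH, or None."""
    # On Windows, shutil.which also tries the extensions listed in PATHEXT,
    # so the name "wget" can resolve to wget.exe.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("wget was not found on PATH; try opening a new command prompt")
    else:
        print(f"wget found at {path}")
```

If the tool is not found right after installation, opening a fresh command prompt usually picks up the updated PATH.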
Usage
My target website (say, abc.com) is protected by BASIC authentication, and I am only interested in downloading files with the extensions *.jpg, *.pdf and *.html. First, create a directory to hold the downloaded files, e.g. c:\abc. Then run the commands below:
cd c:\abc
wget --user-agent="Googlebot/2.1 (+https://www.googlebot.com/bot.html)" --http-user=user123 --http-password=coder4life -A "*.jpg,*.html,*.pdf" -r https://www.abc.com/folder123/ -l 0
where
--user-agent = User agent string that tells the target web server what kind of client/browser is connecting. If not specified, wget identifies itself as "Wget/<version>", which some web servers may block
--http-user = BASIC username
--http-password = BASIC password (plain text)
-A = Accept list: comma-separated file name suffixes or patterns to download
-r = Tells wget to get files recursively (follow links to find all reachable paths/files)
-l = How "deep" wget should go. The default is 5, meaning that from the URL https://www.abc.com/folder123/, wget can go as far as /folder123/1/2/3/4/5 and then stops looking. The command above uses the value 0, which means "infinite" (until all possible paths are traversed)
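To make the interaction between -r, -l and -A easier to picture, here is a simplified model in Python. It is only a sketch and not wget's actual implementation: the `SITE` link graph, its file names and the `crawl` function are all hypothetical, and depth is counted as the number of links followed from the start URL.

```python
from collections import deque
from fnmatch import fnmatch

# Hypothetical link graph standing in for a website: page -> links found on it.
SITE = {
    "/folder123/": ["/folder123/a.html", "/folder123/cover.jpg"],
    "/folder123/a.html": ["/folder123/b.html", "/folder123/notes.txt"],
    "/folder123/b.html": ["/folder123/manual.pdf", "/folder123/a.html"],
}


def crawl(start, accept, max_depth=5):
    """Return the paths a depth-limited, accept-filtered crawl would keep."""
    kept, seen = [], {start}
    queue = deque([(start, 0)])  # the start URL sits at depth 0
    while queue:
        path, depth = queue.popleft()
        name = path.rsplit("/", 1)[-1]
        # -A: keep a file only if its name matches one of the patterns
        if name and any(fnmatch(name, pattern) for pattern in accept):
            kept.append(path)
        # -l: stop following links once max_depth is reached; 0 means no limit
        if max_depth and depth >= max_depth:
            continue
        # -r: follow the links found on this page (non-pages have none)
        for link in SITE.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return kept


if __name__ == "__main__":
    patterns = ["*.jpg", "*.html", "*.pdf"]
    print(crawl("/folder123/", patterns, max_depth=1))
    print(crawl("/folder123/", patterns, max_depth=0))
```

In this toy graph, a depth of 1 keeps only a.html and cover.jpg, while a depth of 0 (no limit) also reaches b.html and manual.pdf. notes.txt is visited but never kept, because it does not match the accept list.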
Published on System Code Geeks with permission by Allen Chee, partner at our SCG program. See the original article here: wget on Windows. Opinions expressed by System Code Geeks contributors are their own.
Thanks for the article. As a suggested correction, where it reads –user-agent, it should be --user-agent. The same change is also required for --http-user and --http-password.