Download web pages recursively under a URL
wget \
--recursive \
--no-clobber \
--page-requisites \
--adjust-extension \
--convert-links \
--restrict-file-names=windows \
--domains=example.com \
-nH --cut-dirs=1 \
-e robots=off \
--random-wait \
--wait=5 \
--no-parent \
www.example.com/subdirectory/
- Substitute example.com and www.example.com/subdirectory/ with the values relevant to your problem (a filled-in example follows this list).
-r --recursive: download the entire website.
-D --domains=example.com: don't follow links outside example.com.
-np --no-parent: don't follow links above the starting directory (subdirectory/).
-p --page-requisites: get all the elements that compose the page (images, CSS, and so on).
-E --adjust-extension: save files with the proper extension, e.g. append .html to downloaded HTML pages.
-k --convert-links: convert links so that they work locally, offline.
--restrict-file-names=windows: modify filenames so that they also work on Windows.
-nc --no-clobber: don't overwrite any existing files (useful when an interrupted download is resumed).
-e robots=off: force crawling regardless of the site's robots.txt.
-nH --cut-dirs=1: -nH omits the hostname directory and --cut-dirs=1 strips one leading directory component (the subdirectory) from the saved paths (see the path sketch after this list). Note that --cut-dirs takes a number of components to cut, not a directory name.
--random-wait: randomize the time between requests to vary between 0.5 and 1.5 times the waiting time specified by the --wait option.
-w --wait=5: number of seconds to wait between retrievals. (See --random-wait.)
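To see what -nH and --cut-dirs do to the saved paths, here is a sketch for a page such as www.example.com/subdirectory/page.html (the filename is illustrative):

without -nH or --cut-dirs:  www.example.com/subdirectory/page.html
with -nH only:              subdirectory/page.html
with -nH --cut-dirs=1:      page.html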
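As a filled-in example, here is the same command in its short-option form, mirroring a hypothetical manual at docs.example.org/manual/ (both the domain and the path are placeholders):

wget -r -nc -p -E -k --restrict-file-names=windows -D docs.example.org -nH --cut-dirs=1 -e robots=off --random-wait -w 5 -np docs.example.org/manual/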