Is it possible to recursively download files from specific TLDs with wget?
Specifically, I'm trying to download the full text of the Code of Massachusetts Regulations. The text of the regulations is stored in multiple files across multiple domains, so I'd like to start the recursive download from the index page but only follow links to .gov and .us domains.
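With help from the wget documentation on spanning hosts, I was able to make this work with the -H and -D flags. -H lets the recursive download follow links to other hosts, and -D limits which domains are followed: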
wget -r -l5 -H -D.us,.gov http://www.lawlib.state.ma.us/source/mass/cmr/index.html
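For readability, here is the same command spelled out with wget's long options, as a rough sketch rather than a tested recipe. The function name fetch_cmr and the variables START_URL, ALLOWED_DOMAINS, MAX_DEPTH and DRY_RUN are names made up for this example; only the wget options themselves come from wget. As far as I understand, wget compares each -D entry against the end of the host name, which is why .us and .gov behave like TLD filters.

```shell
#!/bin/sh
# Long-option form of the command above; same flags, just spelled out.
START_URL='http://www.lawlib.state.ma.us/source/mass/cmr/index.html'
ALLOWED_DOMAINS='.us,.gov'   # comma-separated list passed to --domains (-D)
MAX_DEPTH=5                  # recursion depth, same as -l5

# fetch_cmr is a hypothetical helper name. With DRY_RUN=1 it only prints
# the command, so the options can be checked before anything is downloaded.
fetch_cmr() {
    set -- wget --recursive --level="$MAX_DEPTH" \
        --span-hosts --domains="$ALLOWED_DOMAINS" "$START_URL"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running `DRY_RUN=1 fetch_cmr` prints the command without touching the network; running `fetch_cmr` on its own starts the download. Note that -D by itself does not turn on host spanning, so --span-hosts (-H) has to stay in the command.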