How to wget recursively on specific TLDs?

Is it possible to recursively download files from specific TLDs with wget?

Specifically, I'm trying to download the full text of the Code of Massachusetts Regulations. The actual text of the regulations is stored in multiple files across multiple domains—so I'd like to start the recursive download from the index page, but only follow links to .gov and .us domains.

Asked by Joe Mornin, Dec 31 '25 18:12


1 Answer

With help from the wget documentation on spanning hosts, I was able to make this work with the -H and -D flags:

wget -r -l5 -H -D.us,.gov http://www.lawlib.state.ma.us/source/mass/cmr/index.html
Answered by Joe Mornin, Jan 02 '26 16:01
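The command above can also be written with wget's long option names, which makes each flag's role explicit. The wget manual notes that -D on its own does not turn on -H, which is why both flags are needed: -H (--span-hosts) lets wget leave the starting host, and -D (--domains) limits which hosts it may go to. -l5 (--level=5) caps the recursion depth. The function below is a minimal sketch; its name `wget_within_domains` and its argument order are hypothetical and not part of wget.

```shell
#!/usr/bin/env bash
# Sketch: the answer's wget command, spelled out with long option names.
# wget_within_domains is a hypothetical helper name, not part of wget.
#   $1 = starting URL
#   $2 = comma-separated domain list (e.g. ".us,.gov")
#   $3 = maximum recursion depth (defaults to 5, as in -l5)
wget_within_domains() {
  local start_url=$1 domains=$2 depth=${3:-5}
  # --recursive  (-r): follow links found in downloaded pages
  # --level      (-l): stop after this many levels of links
  # --span-hosts (-H): allow links to hosts other than the starting one
  # --domains    (-D): ...but only to hosts within the listed domains
  wget \
    --recursive \
    --level="$depth" \
    --span-hosts \
    --domains="$domains" \
    "$start_url"
}

# Example call (performs a real download):
# wget_within_domains http://www.lawlib.state.ma.us/source/mass/cmr/index.html .us,.gov 5
```

To preview the arguments without downloading anything, you can temporarily shadow wget with a shell function such as `wget() { printf '%s\n' "$@"; }` before calling the helper; shell functions take precedence over commands found on PATH.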


