 

Downloading a web page and all of its resource files in Python

I want to be able to download a page and all of its associated resources (images, style sheets, script files, etc.) using Python. I am (somewhat) familiar with urllib2 and know how to download individual URLs, but before I start hacking at BeautifulSoup + urllib2, I wanted to be sure there isn't already a Python equivalent of "wget --page-requisites http://www.google.com".

Specifically I am interested in gathering statistical information about how long it takes to download an entire web page, including all resources.
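To make the goal concrete, the following is roughly what I'd otherwise be writing by hand: a rough, untested sketch using requests and BeautifulSoup instead of urllib2 (note it misses resources referenced from inside CSS files):

    import time
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    def timed_page_download(url):
        """Download a page plus its requisites, returning total seconds."""
        start = time.monotonic()
        response = requests.get(url)
        soup = BeautifulSoup(response.text, "html.parser")

        # Collect resource URLs: images, scripts, and anything pulled in
        # via <link> (stylesheets, icons, ...).
        resources = set()
        for img in soup.find_all("img", src=True):
            resources.add(urljoin(url, img["src"]))
        for link in soup.find_all("link", href=True):
            resources.add(urljoin(url, link["href"]))
        for script in soup.find_all("script", src=True):
            resources.add(urljoin(url, script["src"]))

        for resource in resources:
            requests.get(resource)  # body is discarded; only the time matters

        return time.monotonic() - start

    print(timed_page_download("http://www.google.com"))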

Thanks, Mark

asked May 09 '09 by Mark Ransom


People also ask

Can I use Python to download files from a website?

Requests is a versatile HTTP library in Python with many applications. One of them is downloading a file from the web, given the file's URL.
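For example, a minimal sketch (the URL and filename here are placeholders):

    import requests

    url = "http://www.example.com/file.zip"  # placeholder URL
    response = requests.get(url)
    response.raise_for_status()  # fail loudly on HTTP errors

    # Write the downloaded bytes to a local file.
    with open("file.zip", "wb") as f:
        f.write(response.content)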

How do I download multiple files concurrently in Python?

To download multiple files in parallel with Python, start by creating a function (download_parallel) to handle the parallel download. The function takes one argument: an iterable containing URLs and their associated filenames.
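A minimal sketch of that shape, using a thread pool (the snippet above doesn't specify the mechanism, so the thread pool and the (url, filename) pair format are assumptions):

    from concurrent.futures import ThreadPoolExecutor

    import requests

    def download_one(url, filename):
        """Fetch a single URL and write it to disk."""
        response = requests.get(url)
        response.raise_for_status()
        with open(filename, "wb") as f:
            f.write(response.content)

    def download_parallel(inputs):
        """inputs is an iterable of (url, filename) pairs."""
        with ThreadPoolExecutor(max_workers=8) as pool:
            futures = [pool.submit(download_one, url, name)
                       for url, name in inputs]
            for future in futures:
                future.result()  # re-raise any download errors

    inputs = [
        ("http://www.example.com/a.jpg", "a.jpg"),  # placeholder pairs
        ("http://www.example.com/b.jpg", "b.jpg"),
    ]
    download_parallel(inputs)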


2 Answers

Websucker? See http://effbot.org/zone/websucker.htm

answered by RichieHindle


websucker.py doesn't follow CSS links. HTTrack is not Python (it's C/C++), but it's a good, maintained utility for downloading a website for offline browsing.

http://www.mail-archive.com/[email protected]/msg13523.html [issue1124] Webchecker not parsing css "@import url"

Guido> This is essentially unsupported and unmaintained example code. Feel free to submit a patch though!
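For anyone tempted by that patch, the missing piece is roughly pulling @import URLs out of each downloaded stylesheet. A crude illustration (not the actual webchecker code):

    import re

    # Matches @import url("style.css"), @import url(style.css),
    # and @import "style.css" forms.
    IMPORT_RE = re.compile(
        r"""@import\s+(?:url\(\s*)?["']?([^"'()\s;]+)["']?""",
        re.IGNORECASE,
    )

    def css_imports(css_text):
        """Return the URLs referenced by @import rules in a stylesheet."""
        return IMPORT_RE.findall(css_text)

    print(css_imports('@import url("base.css"); @import "print.css";'))
    # ['base.css', 'print.css']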

answered by jamshid