 

Beautiful Soup to parse a URL to get another URL's data

I need to parse a URL to get a list of URLs that link to detail pages. Then from each detail page I need to get all of its details. I need to do it this way because the detail page URLs are not regularly incremented and they change, but the event list page stays the same.

Basically:

example.com/events/
    <a href="http://example.com/events/1">Event 1</a>
    <a href="http://example.com/events/2">Event 2</a>

example.com/events/1
    ...some detail stuff I need

example.com/events/2
    ...some detail stuff I need
tim asked Dec 16 '10 14:12


People also ask

How do you scrape a URL using BeautifulSoup?

Steps to be followed: create a function that gets the HTML document from the URL by passing the URL to requests.get(); then create a parse tree object (a soup object) with BeautifulSoup(), passing it the HTML document fetched above and Python's built-in HTML parser.
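A minimal sketch of those steps, assuming the requests and beautifulsoup4 packages are installed and using the question's listing page as a placeholder URL:

    import requests
    from bs4 import BeautifulSoup

    def get_soup(url):
        # Fetch the HTML document for the URL.
        html = requests.get(url).text
        # Build a parse tree using Python's built-in HTML parser.
        return BeautifulSoup(html, 'html.parser')

    soup = get_soup('http://example.com/events/')
    print(soup.title)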

Which method in BeautifulSoup is used to check all URLs or images?

Method 1: using descendants and find(). First, import the required modules, then fetch the URL with requests and hand the response to BeautifulSoup for parsing. Then, with the help of BeautifulSoup's find() function, locate the <body> tag and its corresponding <ul> tags.
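A hedged sketch of that approach, again assuming requests and beautifulsoup4, and a hypothetical page whose links and images sit inside a <ul> in the <body>:

    import requests
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(requests.get('http://example.com/events/').text, 'html.parser')

    # Use find() to locate the <body> tag and the <ul> inside it.
    body = soup.find('body')
    ul = body.find('ul')

    # Walk the descendants of the list and collect link and image URLs.
    for child in ul.descendants:
        name = getattr(child, 'name', None)  # plain text nodes have no tag name
        if name == 'a':
            print('link:', child.get('href'))
        elif name == 'img':
            print('image:', child.get('src'))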

What is URL parsing?

URL Parsing. The URL parsing functions focus on splitting a URL string into its components, or on combining URL components into a URL string.
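For example, Python's standard urllib.parse module does exactly this splitting and combining (a small illustration, not specific to the question's site):

    from urllib.parse import urlparse, urljoin

    # Splitting a URL string into its components.
    parts = urlparse('http://example.com/events/1?ref=list')
    print(parts.scheme, parts.netloc, parts.path, parts.query)
    # http example.com /events/1 ref=list

    # Combining: resolve a relative link against the listing page.
    print(urljoin('http://example.com/events/', '2'))
    # http://example.com/events/2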

Is BeautifulSoup a parser?

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (the name comes from the term "tag soup"). It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.
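A tiny illustration of that tolerance for malformed markup (a sketch assuming beautifulsoup4 with the built-in parser):

    from bs4 import BeautifulSoup

    # Neither tag is ever closed, yet a usable tree is still produced.
    soup = BeautifulSoup('<div><p>Event 1', 'html.parser')
    print(soup.find('p').get_text())
    # Event 1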


1 Answer

    import urllib2
    from BeautifulSoup import BeautifulSoup

    # Fetch the page and build the parse tree (Python 2 / BeautifulSoup 3 style).
    page = urllib2.urlopen('http://yahoo.com').read()
    soup = BeautifulSoup(page)
    soup.prettify()  # returns a formatted string; not needed for the loop below

    # Print the target of every anchor that has an href attribute.
    for anchor in soup.findAll('a', href=True):
        print anchor['href']

That will give you the list of URLs. You can then iterate over those URLs and parse the data from each detail page.

  • inner_div = soup.findAll("div", {"id": "y-shade"}) is an example of grabbing one specific element. You can go through the BeautifulSoup tutorials for more; a sketch of the full list-then-detail loop follows below.
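As a rough sketch of the question's two-step flow, assuming a newer requests/beautifulsoup4 setup and hypothetical selectors (the listing lives at example.com/events/ and each detail page keeps its content in a div with class "detail"):

    import requests
    from bs4 import BeautifulSoup

    BASE = 'http://example.com/events/'  # the listing page that never changes

    def get_soup(url):
        return BeautifulSoup(requests.get(url).text, 'html.parser')

    # Step 1: collect the detail-page URLs from the listing page.
    listing = get_soup(BASE)
    event_urls = [a['href'] for a in listing.find_all('a', href=True)]

    # Step 2: visit each detail page and pull out the details.
    for url in event_urls:
        detail = get_soup(url)
        block = detail.find('div', {'class': 'detail'})  # hypothetical selector
        if block is not None:
            print(url, block.get_text(strip=True))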
Tauquir answered Sep 27 '22 17:09