
Using urllib and BeautifulSoup to retrieve info from web with Python

I can fetch an HTML page with urllib and parse it with BeautifulSoup, but it looks like I have to write the page to a file first so that BeautifulSoup can read it.

import urllib
sock = urllib.urlopen("http://SOMEWHERE")
htmlSource = sock.read()
sock.close()
# --> then write htmlSource to a file

Is there a way to call BeautifulSoup without first writing the urllib output to a file?

prosseek asked Apr 15 '10 16:04





1 Answer

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(htmlSource)

No file writing is needed: just pass in the HTML string. You can also pass the file-like object returned by urlopen directly:

f = urllib.urlopen("http://SOMEWHERE") 
soup = BeautifulSoup(f)
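
The snippets above use Python 2's urllib and the BeautifulSoup 3 import path. As a rough sketch of the same idea on Python 3, assuming the bs4 package (BeautifulSoup 4) is installed, both forms can be tried without any network access. The sample HTML and the BytesIO stand-in for the urlopen response are illustrative assumptions, not part of the original answer; with a real URL, the object returned by urllib.request.urlopen would take the place of the stand-in.

```python
import io

# Assumption: BeautifulSoup 4 is installed (pip install beautifulsoup4).
from bs4 import BeautifulSoup

# Illustrative HTML; in practice this would be the body of the fetched page.
html_source = (
    "<html><head><title>Example</title></head>"
    "<body><a href='/a'>A</a></body></html>"
)

# 1) Parse an in-memory string directly; no temporary file is involved.
soup_from_string = BeautifulSoup(html_source, "html.parser")

# 2) Parse a file-like object (anything with a .read() method).
#    BytesIO stands in here for the response object that
#    urllib.request.urlopen("http://SOMEWHERE") would return.
fake_response = io.BytesIO(html_source.encode("utf-8"))
soup_from_file = BeautifulSoup(fake_response, "html.parser")

print(soup_from_string.title.string)
print(soup_from_file.a["href"])
```

Naming the parser explicitly ("html.parser" is the one bundled with Python) keeps the result consistent across machines, since bs4 otherwise picks whichever parser happens to be installed.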
interjay answered Sep 29 '22 17:09
