Script to Extract data from web page

Question

I want to extract parts of the data rendered on a web page. I can already pull the entire page and save it to a text file (raw) using the command below.

curl http://webpage -o "raw.txt"

I am wondering whether there are alternatives, and what advantages they would offer.

sberry · Accepted Answer

I would use a combination of requests and BeautifulSoup.

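from bs4 import BeautifulSoup
import requests

# A session reuses the connection and keeps cookies across requests
session = requests.Session()
req = session.get('http://stackoverflow.com/questions/10807081/script-to-extract-data-from-wbpage')
# Name the parser explicitly; otherwise bs4 picks one and emits a warning
doc = BeautifulSoup(req.content, 'html.parser')
# find_all is the current name for the older findAll
print(doc.find_all('a', {"class": "gp-share"}))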
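If installing third-party packages is not an option, roughly the same link extraction can be sketched with only the standard library. This is a minimal sketch, not a replacement for BeautifulSoup: the class name ClassLinkCollector, the helper extract_links, and the SAMPLE_HTML markup are all made up for illustration, and the sample string stands in for a page you have already downloaded.

```python
from html.parser import HTMLParser


class ClassLinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        if self.css_class in classes:
            self.hrefs.append(attr_map.get("href"))


def extract_links(html, css_class):
    """Return the href of every <a> in html whose class list includes css_class."""
    parser = ClassLinkCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <a class="gp-share" href="https://example.com/share">Share</a>
  <a class="other" href="https://example.com/other">Other</a>
  <a class="btn gp-share" href="/relative">Share again</a>
</body></html>
"""

print(extract_links(SAMPLE_HTML, "gp-share"))
```

The trade-off is that html.parser only reports tags as it streams past them, so anything beyond simple attribute matching (nested selectors, text inside elements) is much easier with BeautifulSoup.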

Gilles Quenot · Answer

cURL is a good start. A better command line would be:

curl -A "Mozilla/5.0" -L -k -b /tmp/c -c /tmp/c -s http://url.tld

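because it sends a browser-like user-agent (-A), follows redirects (-L), skips SSL certificate verification (-k), reads and stores cookies (-b, -c), and suppresses progress output (-s).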

See man curl
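Since the question is tagged python, here is a rough standard-library sketch of what those curl options do, for comparison. It is only an approximation under some assumptions: the helper names make_opener and fetch are invented for this example, the cookie file is expected to be in the Netscape format that curl itself writes, and verify_tls=False mirrors -k, so it should only be used against hosts you trust.

```python
import ssl
import urllib.request
from http.cookiejar import MozillaCookieJar
from pathlib import Path


def make_opener(cookie_file, user_agent="Mozilla/5.0", verify_tls=True):
    """Build an opener that roughly mirrors curl -A, -L, -b/-c and, optionally, -k."""
    jar = MozillaCookieJar(str(cookie_file))
    if Path(cookie_file).exists():
        # Like curl -b: start from cookies saved by an earlier run.
        jar.load(ignore_discard=True)

    context = ssl.create_default_context()
    if not verify_tls:
        # Like curl -k: accept any certificate. check_hostname must be
        # switched off before verify_mode can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar),
        urllib.request.HTTPSHandler(context=context),
    )
    # Like curl -A. Redirects (curl -L) are followed by urllib by default.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(url, opener, jar, timeout=30):
    """Download url, then persist cookies to the jar's file (like curl -c)."""
    with opener.open(url, timeout=timeout) as response:
        body = response.read()
    jar.save(ignore_discard=True)
    return body
```

A call would look like opener, jar = make_opener("/tmp/c", verify_tls=False) followed by fetch("http://url.tld", opener, jar), which returns the page body as bytes.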

Tags: python

Selase
