My task is to crawl Google search results using headless WebKit (PyQt4.QtWebKit) in Python. The module crawled the results fine with PyQt4. I have to run this script on Amazon EC2, so I have to use Xvfb (there is no X server on EC2).
My module also has to be executed in a loop. It works fine for some iterations, but after a while it fails with "xvfb-run: error: Xvfb failed to start".
How can I solve this?
This is my loop:

    for i in range(10):
        try:
            query_dict["start"] = i * 10
            url = base_url + ue(query_dict)
            flag = True
            while flag:
                parsed_dict = main(url)
                time.sleep(8.4)
                flag = False
        except:
            pass

main(url):

    def main(url):
        cmd = "xvfb-run python /home/shan/temp/hg_intcen/lib/webpage_scrapper.py"+" "+str(url)
        print "Cmd EXE:"+ cmd
        proc = subprocess.Popen(cmd,shell=True,stdin=subprocess.PIPE,stdout=subprocess.PIPE)
        proc.wait()
        sys.stdout.flush()
        result = proc.stdout.readlines()
        print "crawled: ",result[1]
        return result

webpage_scrapper fetches all the HTML results using PyQt4. How can I avoid the Xvfb failure when looping?
Xvfb (short for X virtual framebuffer) is an in-memory display server for UNIX-like operating systems (e.g., Linux). It enables you to run graphical applications without a display (e.g., browser tests on a CI server) while also having the ability to take screenshots.
You need to add the --auto-servernum parameter to xvfb-run. Otherwise, it tries to spawn Xvfb on the same display (by default :99), which will fail if you already have one running.
Run it like this:

    xvfb-run --auto-servernum --server-num=1 python webpage_scrapper.py http://google.com
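
Applied to the main(url) function from the question, the change might look roughly like the sketch below. It is only an illustration: build_cmd and the SCRAPER constant are names introduced here, not part of the original code, and the command is passed as a list instead of through the shell, which is a separate design choice from the --auto-servernum fix.

```python
import subprocess

# Path taken from the question; adjust it for your own setup.
SCRAPER = "/home/shan/temp/hg_intcen/lib/webpage_scrapper.py"


def build_cmd(url, scraper=SCRAPER):
    """Build the xvfb-run command line (hypothetical helper).

    --auto-servernum lets xvfb-run look for a free display number
    instead of always trying the default one (:99).
    """
    return ["xvfb-run", "--auto-servernum", "python", scraper, str(url)]


def main(url):
    """Run the scraper under Xvfb and return its output lines."""
    proc = subprocess.Popen(build_cmd(url), stdout=subprocess.PIPE)
    # communicate() reads stdout while waiting for the process to exit,
    # so a large amount of output cannot fill the pipe and block the child.
    out, _ = proc.communicate()
    return out.splitlines()
```

Passing the arguments as a list also means the URL is not interpreted by the shell, which matters if the query string contains characters such as `&`.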