I want to get a web page's resource content from Python via the Chrome Debugging Protocol. In the protocol documentation I noticed the method getResourceContent, which takes the parameters frameId and url, and I think it is what I need. So this is what I did:
1. Start Chrome as a server: .\chrome.exe --remote-debugging-port=9222
2. Write the Python test code:
# coding=utf-8
"""
chrome --remote-debugging api test
"""
import json
import requests
import websocket
import pdb
def send():
    geturl = requests.get('http://localhost:9222/json')
    websocketURL = json.loads(geturl.content)[0]['webSocketDebuggerUrl']
    request = {}
    request['id'] = 1
    request['method'] = 'Page.navigate'
    request['params'] = {"url": 'http://global.bing.com'}
    ws = websocket.create_connection(websocketURL)
    ws.send(json.dumps(request))
    res = ws.recv()
    ws.close()
    print res
    frameId = json.loads(res)['result']['frameId']
    print frameId
    geturl = requests.get('http://localhost:9222/json')
    websocketURL = json.loads(geturl.content)[0]['webSocketDebuggerUrl']
    req = {}
    req['id'] = 1
    req['method'] = 'Page.getResourceContent'
    req['params'] = {"frameId":frameId,"url": 'http://global.bing.com'}
    header = ["User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"]
    pdb.set_trace()
    ws = websocket.create_connection(websocketURL,header=header)
    ws.send(json.dumps(req))
    ress = ws.recv()
    ws.close()
    print ress

if __name__ == '__main__':
    send()
3. Page.navigate works fine. I got something like this: {"id":1,"result":{"frameId":"8504.2"}}
4. When I try the getResourceContent method, an error comes back: {"error":{"code":-32000,"message":"Agent is not enabled."},"id":1}
I tried adding a User-Agent header, but it still does not work.
Thanks.
The error message "Agent is not enabled" has nothing to do with the HTTP User-Agent header. It refers to an agent within Chrome that needs to be enabled before page contents can be retrieved.
The term "agent" is a bit misleading, since the protocol documentation speaks of domains that need to be enabled in order to debug them (I suppose "agent" refers to the way this is implemented inside Chrome).
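To make the distinction concrete, here is a minimal sketch of how such a reply can be inspected. The error arrives as a JSON message on the WebSocket, not as an HTTP response, which is why changing request headers has no effect. The helper name protocol_error is made up for illustration; the sample reply is the one quoted in the question.

```python
import json


def protocol_error(raw_reply):
    """Return (code, message) if the reply is a protocol error, else None.

    `protocol_error` is a hypothetical helper, not part of any library.
    """
    reply = json.loads(raw_reply)
    error = reply.get("error")
    if error is None:
        return None
    return error.get("code"), error.get("message")


# The reply from the question: a protocol-level error, unrelated to HTTP headers.
failed = '{"error":{"code":-32000,"message":"Agent is not enabled."},"id":1}'
print(protocol_error(failed))
```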
So the question is: which domain needs to be enabled in order to access the page contents? In hindsight it is quite obvious: the Page domain, since we are calling a method in this domain. I only found this out after stumbling over this example, though.
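As a rough illustration of that rule, the domain is simply the part of the method name before the dot, so the matching enable command can be derived from the method you want to call. This is only a sketch: enable_command_for is a hypothetical helper, and not every domain in the protocol has an enable method (Page does).

```python
import json


def enable_command_for(method, msg_id):
    """Build the '<Domain>.enable' command for the domain `method` belongs to.

    Hypothetical helper for illustration; assumes the domain has an
    `enable` method, which is true for Page but not for every domain.
    """
    domain = method.split(".", 1)[0]  # "Page.getResourceContent" -> "Page"
    return json.dumps({"id": msg_id, "method": domain + ".enable", "params": {}})


# The command that has to be sent before Page.getResourceContent.
print(enable_command_for("Page.getResourceContent", 2))
```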
Once I added a Page.enable request to the script to activate the Page domain, the error message disappeared. However, I encountered two other problems:
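1. The WebSocket connection to Chrome has to stay open between requests. The script in the question opens a new connection for each request, and Chrome then loses state such as which domains are enabled.
2. Navigating to http://global.bing.com ends up at http://www.bing.com (at least on my machine). This causes Page.getResourceContent to fail to retrieve the resource because the requested resource http://global.bing.com/ is not available.
After fixing these issues I was able to retrieve the page content. This is my code: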
# coding=utf-8
"""
chrome --remote-debugging api test
"""
import json
import requests
import websocket
def send():
    # Setup websocket connection (kept open for all requests):
    geturl = requests.get('http://localhost:9222/json')
    websocketURL = json.loads(geturl.content)[0]['webSocketDebuggerUrl']
    ws = websocket.create_connection(websocketURL)

    # Navigate to global.bing.com:
    request = {}
    request['id'] = 1
    request['method'] = 'Page.navigate'
    request['params'] = {"url": 'http://global.bing.com'}
    ws.send(json.dumps(request))
    result = ws.recv()
    print('Page.navigate: ' + result)
    frameId = json.loads(result)['result']['frameId']

    # Enable page agent:
    request = {}
    request['id'] = 1
    request['method'] = 'Page.enable'
    request['params'] = {}
    ws.send(json.dumps(request))
    print('Page.enable: ' + ws.recv())

    # Retrieve resource contents (note the URL the frame actually ended up on):
    request = {}
    request['id'] = 1
    request['method'] = 'Page.getResourceContent'
    request['params'] = {"frameId": frameId, "url": 'http://www.bing.com'}
    ws.send(json.dumps(request))
    result = ws.recv()
    print('Page.getResourceContent: ' + result)

    # Close websocket connection
    ws.close()

if __name__ == '__main__':
    send()
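One possible refinement, sketched here under a few assumptions: once the Page domain is enabled, Chrome also pushes event notifications (messages without an id) over the same connection, so a plain ws.recv() after a command may return an event instead of the reply. The wrapper below gives every command its own id and skips anything that is not the matching reply. It also reads the frame's current URL from a Page.getResourceTree reply instead of hard-coding it. DevToolsSession and main_frame are hypothetical names, and the reply shape of Page.getResourceTree (result.frameTree.frame with id and url) is taken from the protocol documentation, where the method is marked experimental, so check it against your Chrome version.

```python
import itertools
import json


class DevToolsSession(object):
    """Thin wrapper around one open WebSocket to a Chrome tab (hypothetical helper)."""

    def __init__(self, ws):
        self.ws = ws                    # any object with send() and recv()
        self._ids = itertools.count(1)  # a fresh id for every command
        self.events = []                # notifications seen while waiting

    def call(self, method, params=None):
        """Send one command and return the reply that carries the same id."""
        msg_id = next(self._ids)
        command = {"id": msg_id, "method": method, "params": params or {}}
        self.ws.send(json.dumps(command))
        while True:
            message = json.loads(self.ws.recv())
            if message.get("id") == msg_id:
                return message
            # No matching id: this is an event notification. Keep it for later.
            self.events.append(message)


def main_frame(resource_tree_reply):
    """Return (frameId, url) of the top frame from a Page.getResourceTree reply.

    Assumes a successful reply; an error reply has no "result" key.
    """
    frame = resource_tree_reply["result"]["frameTree"]["frame"]
    return frame["id"], frame["url"]
```

With a real connection this would be used as session = DevToolsSession(websocket.create_connection(websocketURL)), followed by session.call('Page.enable'), session.call('Page.navigate', {...}), and then passing the frame id and URL from main_frame(session.call('Page.getResourceTree')) to Page.getResourceContent. The page still needs time to load before its resources are available, which this sketch does not handle.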