I am using the Python Requests Module to datamine a website. As part of the datamining, I have to HTTP POST a form and check if it succeeded by checking the resulting URL. My question is, after the POST, is it possible to request the server to not send the entire page? I only need to check the URL, yet my program downloads the entire page and consumes unnecessary bandwidth. The code is very simple
import requests
r = requests.post(URL, payload)
if 'keyword' in r.url:
success
fail
An easy solution, if it's implementable for you. Is to go low-level. Use socket library. For example you need to send a POST with some data in its body. I used this in my Crawler for one site.
import socket
from urllib import quote # POST body is escaped. use quote
req_header = "POST /{0} HTTP/1.1\r\nHost: www.yourtarget.com\r\nUser-Agent: For the lulz..\r\nContent-Type: application/x-www-form-urlencoded; charset=UTF-8\r\nContent-Length: {1}"
req_body = quote("data1=yourtestdata&data2=foo&data3=bar=")
req_url = "test.php"
header = req_header.format(req_url,str(len(req_body))) #plug in req_url as {0}
#and length of req_body as Content-length
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM) #create a socket
s.connect(("www.yourtarget.com",80)) #connect it
s.send(header+"\r\n\r\n"+body+"\r\n\r\n") # send header+ two times CR_LF + body + 2 times CR_LF to complete the request
page = ""
while True:
buf = s.recv(1024) #receive first 1024 bytes(in UTF-8 chars), this should be enought to receive the header in one try
if not buf:
break
if "\r\n\r\n" in page: # if we received the whole header(ending with 2x CRLF) break
break
page+=buf
s.close() # close the socket here. which should close the TCP connection even if data is still flowing in
# this should leave you with a header where you should find a 302 redirected and then your target URL in "Location:" header statement.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With