I am using the code below to save an HTML file with a timestamp in its name:
import contextlib
import datetime
import urllib2
import lxml.html
import os
import os.path
timestamp=''
filename=''
for dirs, subdirs, files in os.walk("/home/test/Desktop/"):
    for f in files:
        if "_timestampedfile.html" in f.lower():
            timestamp=f.split('_')[0]
            filename=f
            break
if timestamp is '':
    timestamp=datetime.datetime.now()
with contextlib.closing(urllib2.urlopen(urllib2.Request(
        "http://www.google.com",
        headers={"If-Modified-Since": timestamp}))) as u:
    if u.getcode() != 304:
        myfile="/home/test/Desktop/"+str(datetime.datetime.now())+"_timestampedfile.html"
        file(myfile, "w").write(urllib2.urlopen("http://www.google.com").read())
        if os.path.isfile("/home/test/Desktop/"+filename):
            os.remove("/home/test/Desktop/"+filename)
        html = lxml.html.parse(myfile)
    else:
        html = lxml.html.parse("/home/test/Desktop/"+timestamp+"_timestampedfile.html")
    links=html.xpath("//a/@href")
    print u.getcode()
When I run this code, I get status code 200 every time, even though I send the If-Modified-Since header. What am I doing wrong? My goal is to save an HTML file and reuse it; if the page has been modified since I last fetched it, the saved file should be overwritten.
The problem is that If-Modified-Since is supposed to be a formatted date string:
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
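but you're passing in either a datetime object (when no saved file is found) or the str() of one taken from the file name, which looks like YYYY-MM-DD HH:MM:SS.ffffff and is not in that format.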
Try something like this:
import time
timestamp = time.time()
...
# Format the epoch seconds as an HTTP date in GMT
time.strftime('%a, %d %b %Y %H:%M:%S GMT', time.gmtime(timestamp))
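One caveat with the strftime call above: %a and %b follow the current locale, so if the program has switched locale the day and month names may not be the English ones HTTP expects. A locale-independent sketch using the standard library's email.utils.formatdate is below; http_date is a hypothetical helper name, not something from the question.

```python
from email.utils import formatdate


def http_date(epoch_seconds):
    """Format seconds since the epoch as an HTTP date, e.g. for If-Modified-Since.

    usegmt=True makes formatdate end the string with "GMT" instead of "-0000".
    """
    return formatdate(epoch_seconds, usegmt=True)


# The example date from the header line above, expressed as epoch seconds.
print(http_date(783459811))
```

With this, the timestamp stored in the file name could be the plain epoch value, and the header string is built only when the request is made.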
The second reason your code isn't working as you expect is that http://www.google.com/ does not seem to honor If-Modified-Since. That's allowed per the RFC, which says only that the server SHOULD (not MUST) return a 304, and they may have various reasons for choosing that behavior:
c) If the variant has not been modified since a valid If-Modified-Since date, the server SHOULD return a 304 (Not Modified) response.
If you try http://www.stackoverflow.com/, for example, you'll see a 304. (I just tried it.)
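A minimal sketch of the conditional request, assuming Python 3 (where urllib2 became urllib.request and urllib.error). As far as I know, both urllib2 and urllib.request report a 304 by raising HTTPError rather than returning a response, so a u.getcode() != 304 check would never see it; the sketch catches the exception instead. fetch_if_modified and its opener parameter are hypothetical names, and opener exists only so the function can be tried without network access.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def fetch_if_modified(url, since_epoch, opener=urllib.request.urlopen):
    """Return (status, body); body is None when the server answers 304."""
    request = urllib.request.Request(
        url,
        headers={"If-Modified-Since": formatdate(since_epoch, usegmt=True)},
    )
    try:
        response = opener(request)
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: the caller should keep using its saved copy.
            return 304, None
        raise  # any other HTTP error is passed on unchanged
    try:
        return response.getcode(), response.read()
    finally:
        response.close()
```

For example, fetch_if_modified("http://www.stackoverflow.com/", time.time() - 3600) should return (304, None) if the server decides the page has not changed in the last hour, and (200, body) otherwise.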