
header If-Modified-Since does not give 304 code

Tags: python

I am using the code below to save an HTML file with a timestamp in its name:

import contextlib
import datetime
import urllib2
import lxml.html
import os
import os.path
timestamp=''
filename=''
for dirs, subdirs, files in os.walk("/home/test/Desktop/"):
    for f in files:
        if "_timestampedfile.html" in f.lower():
            timestamp=f.split('_')[0]
            filename=f
            break
if timestamp == '':
    timestamp=datetime.datetime.now()

with contextlib.closing(urllib2.urlopen(urllib2.Request(
        "http://www.google.com",
        headers={"If-Modified-Since": timestamp}))) as u:
    if u.getcode() != 304:
        myfile="/home/test/Desktop/"+str(datetime.datetime.now())+"_timestampedfile.html"
        file(myfile, "w").write(urllib2.urlopen("http://www.google.com").read())
        if os.path.isfile("/home/test/Desktop/"+filename):
            os.remove("/home/test/Desktop/"+filename)
        html = lxml.html.parse(myfile)
    else:
        html = lxml.html.parse("/home/test/Desktop/"+timestamp+"_timestampedfile.html")

links=html.xpath("//a/@href")
print u.getcode()

Every time I run this code I get status code 200, even though I send the If-Modified-Since header. What am I doing wrong? My goal is to save and reuse an HTML file, and to overwrite it only if the page has been modified since I last fetched it.

user2460869 asked Jun 09 '26 13:06


1 Answer

The problem is that If-Modified-Since is supposed to be a formatted date string:

If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT

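but you're passing in a datetime object (or, when a saved file is found, the str() of one taken from the file name, in the form YYYY-MM-DD HH:MM:SS.ffffff). Neither is a valid HTTP date.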

Try something like this:

import time
timestamp = time.time()  # seconds since the epoch
...
# Format as an HTTP date, which is always expressed in GMT
time.strftime('%a, %d %b %Y %H:%M:%S GMT', time.gmtime(timestamp))
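One caveat with strftime is that %a and %b follow the current locale, so the day and month names may not come out in English. As a minimal sketch of an alternative, the standard library's email.utils.formatdate always emits the fixed English names. The helper name http_date is hypothetical, chosen only for illustration:

```python
import email.utils


def http_date(epoch_seconds):
    """Format seconds since the epoch as an HTTP date string in GMT."""
    # usegmt=True emits the literal "GMT" suffix instead of "-0000".
    return email.utils.formatdate(epoch_seconds, usegmt=True)


# 783459811 is the epoch value of the example date shown above.
print(http_date(783459811))  # Sat, 29 Oct 1994 19:43:31 GMT
```

The resulting string can be passed directly as the value of the If-Modified-Since header.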

The second reason your code isn't working as you expect:

http://www.google.com/ does not seem to honor If-Modified-Since. That's allowed by the RFC (RFC 2616, section 14.25), which says only that the server SHOULD, not MUST, return a 304, and they may have various reasons for choosing that behavior.

  c) If the variant has not been modified since a valid If-
     Modified-Since date, the server SHOULD return a 304 (Not
     Modified) response.

If you try http://www.stackoverflow.com/, for example, you'll see a 304. (I just tried it.)
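Putting the pieces together, a conditional GET might look roughly like the sketch below. It assumes Python 3 (urllib.request rather than the question's urllib2), and the function name fetch_if_modified and its opener parameter are hypothetical, added only so the function can be exercised without a network connection. Note that urllib.request reports a 304 by raising HTTPError rather than returning a response, so checking getcode() on the response alone would never see it; as far as I know urllib2 behaves the same way.

```python
import email.utils
import urllib.error
import urllib.request


def fetch_if_modified(url, since_epoch, opener=urllib.request.urlopen):
    """Return (status, body); body is None when the server answers 304."""
    request = urllib.request.Request(
        url,
        headers={
            # The header value must be an HTTP date string, not a datetime.
            "If-Modified-Since": email.utils.formatdate(since_epoch, usegmt=True)
        },
    )
    try:
        with opener(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # urllib surfaces 304 Not Modified as an exception.
        if err.code == 304:
            return 304, None
        raise
```

A caller would write the body to disk only when it is not None, and otherwise keep using the previously saved file. Servers that ignore the header will simply return 200 with a full body every time.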

ron rothman answered Jun 11 '26 02:06


