 

Get the Last-Modified date of a URL


I have three pieces of code. In the first one I get the metadata of a URL, and that metadata includes a Last-Modified date. When I run this class, I get the URL's last modified date as:

key:- Last-Modified value:- 2011-10-21T03:18:28Z 

First one

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;

    import org.apache.tika.Tika;
    import org.apache.tika.exception.TikaException;
    import org.apache.tika.metadata.Metadata;

    public class App {

        public static void main(String[] args) throws IOException, TikaException {
            Tika t = new Tika();
            Metadata md = new Metadata();
            URL u = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");

            // Download and parse the PDF once; Tika extracts the text and fills md
            // with the metadata stored inside the file (such as Last-Modified)
            try (InputStream in = u.openStream()) {
                String content1 = t.parseToString(in, md);
                System.out.println("hello" + content1);
            }

            // Print every metadata key/value pair Tika found
            for (String name : md.names()) {
                System.out.println("key:- " + name);
                System.out.println("value:- " + md.get(name));
            }
        }
    }

But when I run the second example below with the same URL, I get a different Last-Modified date. How can I tell which one is right? I tried opening the PDF in the browser, but instead of opening in the browser it opens in Adobe Reader on my computer, so I can't check the headers through Firebug.

Second Way--

    import java.net.URL;
    import java.net.URLConnection;

    public class LastMod {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");
            System.out.println("URL:- " + url);
            URLConnection connection = url.openConnection();
            // Raw Last-Modified header string, exactly as the server sent it
            System.out.println(connection.getHeaderField("Last-Modified"));
        }
    }

With the above code I get the Last-Modified date as:

Thu, 03 Nov 2011 16:59:41 +0000 

Third Way--

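    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Date;

    public class Main {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");
            HttpURLConnection httpCon = (HttpURLConnection) url.openConnection();
            // Parsed Last-Modified header in milliseconds since the epoch (0 if absent)
            long date = httpCon.getLastModified();
            if (date == 0) {
                System.out.println("No last-modified information.");
            } else {
                // Date.toString() renders the value in the JVM's default time zone
                System.out.println("Last-Modified: " + new Date(date));
            }
        }
    }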

And with the third method I get this:

Last-Modified: Thu Nov 03 09:59:41 PDT 2011 

I am confused about which one is right. I think the first one is. Any suggestions would be appreciated.

asked Nov 03 '11 17:11 by arsenal




2 Answers

The best option is the third one, connection.getLastModified(), because it is the easiest to use and has the highest level of abstraction. The others work at lower levels: the first reads the raw response, and the second reads the raw header. The third reads the header and converts it to a long (milliseconds since the epoch).
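A minimal sketch of the third approach, with two assumptions not in the original: it sends a HEAD request, so the server returns headers without the PDF body, and it queries a throwaway local com.sun.net.httpserver.HttpServer instead of www.xyz.com so it runs offline. The class name, the /doc.pdf path, and the served header value are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.time.Instant;

public class LastModifiedHead {

    /** Sends a HEAD request and returns Last-Modified as epoch millis, or 0 if the header is absent. */
    static long fetchLastModified(URL url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("HEAD"); // ask for headers only, so the body is not downloaded
        try {
            return con.getLastModified();
        } finally {
            con.disconnect();
        }
    }

    /** Starts a hypothetical local server standing in for www.xyz.com, then queries it. */
    static long demo() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/doc.pdf", exchange -> {
            exchange.getResponseHeaders().set("Last-Modified", "Thu, 03 Nov 2011 16:59:41 GMT");
            exchange.sendResponseHeaders(200, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/doc.pdf").toURL();
            return fetchLastModified(url);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        long millis = demo();
        // Instant.toString() always renders in UTC, independent of the JVM default time zone
        System.out.println(millis == 0 ? "No last-modified information." : Instant.ofEpochMilli(millis));
    }
}
```

Printing through Instant rather than new Date(...) keeps the output in UTC on every machine. Some servers may handle HEAD differently from GET, so if getLastModified() returns 0, falling back to a GET request may be necessary.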

The difference between the second and third outputs is only the time zone. Printing new Date(...) formats the value in the JVM's default time zone. Prefer Calendar or, better still, Joda-Time's DateTime, which supports custom time zones.
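To check that the two outputs describe the same moment, the sketch below parses the header text from the question and renders the resulting epoch-millis value in two explicit zones. It uses java.time (Java 8+), the successor to Joda-Time, in place of the Calendar or Joda-Time classes mentioned above; the class and method names are illustrative.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class SameInstant {

    /** Parses the raw header text printed by the second program (RFC 1123 format). */
    static Instant fromHeader(String header) {
        return ZonedDateTime.parse(header, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
    }

    /** Renders an epoch-millis value, as returned by getLastModified(), in an explicit zone. */
    static ZonedDateTime inZone(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone);
    }

    public static void main(String[] args) {
        long millis = fromHeader("Thu, 03 Nov 2011 16:59:41 +0000").toEpochMilli();

        // Same instant, two renderings: only the zone (and so the wall-clock hour) differs
        System.out.println(inZone(millis, ZoneOffset.UTC));
        System.out.println(inZone(millis, ZoneId.of("America/Los_Angeles")));
    }
}
```

The UTC line corresponds to the second program's output and the Los Angeles line to the third's. Passing the zone explicitly, instead of relying on the JVM default, makes the printed value identical on every machine.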

answered Oct 27 '22 15:10 by Bozho


The first piece of code extracts the date from the metadata inside the PDF file, while the other two get it from the HTTP headers returned by the web server. The first is probably more accurate if you want to know when the document itself was created or modified.
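As an illustration of the two sources, the sketch below takes the two values printed in the question, hard-coded rather than fetched, and converts both to Instant so they can be compared directly. It assumes the Tika value is ISO-8601 as shown in the question; the metadata key that carries it may differ between Tika versions, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class CompareSources {

    /** The value Tika read from inside the PDF, e.g. "2011-10-21T03:18:28Z" (ISO-8601). */
    static Instant fromPdfMetadata(String isoValue) {
        return Instant.parse(isoValue);
    }

    /** The server's Last-Modified header, which uses the RFC 1123 date format. */
    static Instant fromHttpHeader(String headerValue) {
        return ZonedDateTime.parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
    }

    public static void main(String[] args) {
        Instant insideFile = fromPdfMetadata("2011-10-21T03:18:28Z");
        Instant onServer = fromHttpHeader("Thu, 03 Nov 2011 16:59:41 +0000");

        System.out.println("PDF metadata:      " + insideFile);
        System.out.println("HTTP Last-Modified: " + onServer);
        // Both are now UTC instants, so the gap between them is meaningful
        System.out.println("Header is later by " + Duration.between(insideFile, onServer).toDays() + " days");
    }
}
```

Neither value is wrong in itself: one is stored in the PDF by whatever program last saved it, while the other is what the web server reports, which for a static file typically reflects when the file was last written on that server.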

answered Oct 27 '22 14:10 by Andreas Veithen