 

Proper way to check for URL equality


I have the following scenario:

    URL u1 = new URL("http://www.yahoo.com/");
    URL u2 = new URL("http://www.yahoo.com");

    if (u1.equals(u2)) {
        System.out.println("yes");
    }
    if (u1.toURI().equals(u2.toURI())) {
        System.out.println("uri equality");
    }
    if (u1.toExternalForm().equals(u2.toExternalForm())) {
        System.out.println("external form equality");
    }
    if (u1.toURI().normalize().equals(u2.toURI().normalize())) {
        System.out.println("uri normalized equality");
    }

None of these checks are succeeding. Only the path differs: u1 has a path of "/" while u2 has a path of "". Are these URLs pointing to the same resource and is there a way for me to check such a thing without opening a connection? Am I misunderstanding something fundamental about URLs?

EDIT: I should state that a non-hacky check is desired. Is it reasonable to say that an empty path is equivalent to "/"? I was hoping to avoid this kind of code.

asked Sep 22 '10 15:09 by NG.





2 Answers

From the 2007 JavaOne:

The second puzzle, aptly titled "More Joys of Sets", has the user create HashMap keys that consist of several URL objects. Again, most of the audience was unable to guess the correct answer.

The important thing the audience learned here is that the URL object's equals() method is, in effect, broken. In this case, two URL objects are equal if they resolve to the same IP address and port, not just if they have equal strings. However, Bloch and Pugh point out an even more severe Achilles' Heel: the equality behavior differs depending on if you're connected to the network, where virtual addresses can resolve to the same host, or if you're not on the net, where the resolve is a blocking operation. So, as far as lessons learned, they recommend:

Don't use URL; use URI instead. URI makes no attempt to compare addresses or ports. In addition, don't use URL as a Set element or a Map key.
For API designers, the equals() method should not depend on the environment. For example, in this case, equality should not change if a computer is connected to the Internet versus standalone.
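Following the "use URI instead" advice, here is a minimal sketch of keying a collection on java.net.URI rather than java.net.URL. URI.equals() and hashCode() work purely on the parsed text (scheme and host compared case-insensitively), so building the set never triggers a DNS lookup. The class name UriKeyDemo, the helper countDistinct and the example URLs are illustrative only.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class UriKeyDemo {
    // Counts distinct URIs using a HashSet<URI>. URI.equals()/hashCode()
    // compare parsed components as text; unlike URL, they never resolve
    // the host name, so this works offline and never blocks on DNS.
    static int countDistinct(String... uris) {
        Set<URI> seen = new HashSet<>();
        for (String s : uris) {
            seen.add(URI.create(s));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Scheme and host are compared case-insensitively, so the first two
        // entries collapse into one key; the different path stays separate.
        int n = countDistinct(
                "http://www.yahoo.com/index.html",
                "HTTP://WWW.Yahoo.com/index.html",
                "http://www.yahoo.com/other.html");
        System.out.println(n); // 2
    }
}
```

Note that this alone does not make the question's two URLs equal: their paths still differ.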


From the URI.equals() documentation:

For two hierarchical URIs to be considered equal, their paths must be equal and their queries must either both be undefined or else be equal.

In your case, the two paths are different: one is "/" and the other is "".
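A small sketch that makes the path difference visible (the class name UriPathDemo is illustrative). It prints both parsed paths and repeats the two URI comparisons from the question; getPath() returns an empty string, not null, for a URI that has an authority but no path, and normalize() does not turn that empty path into "/".

```java
import java.net.URI;

public class UriPathDemo {
    public static void main(String[] args) {
        URI withSlash = URI.create("http://www.yahoo.com/");
        URI withoutSlash = URI.create("http://www.yahoo.com");

        // The parsed paths differ: "/" versus the empty string.
        System.out.println("[" + withSlash.getPath() + "]");    // [/]
        System.out.println("[" + withoutSlash.getPath() + "]"); // []

        // normalize() only removes unnecessary "." and ".." segments; it
        // leaves an empty path empty, so both comparisons are false.
        System.out.println(withSlash.equals(withoutSlash));                         // false
        System.out.println(withSlash.normalize().equals(withoutSlash.normalize())); // false
    }
}
```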


According to the URI RFC (RFC 3986), §6.2.3:

Implementations may use scheme-specific rules, at further processing cost, to reduce the probability of false negatives. For example, because the "http" scheme makes use of an authority component, has a default port of "80", and defines an empty path to be equivalent to "/", the following four URIs are equivalent:

    http://example.com
    http://example.com/
    http://example.com:/
    http://example.com:80/

It seems that java.net.URI does not apply such scheme-specific rules.
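Since java.net.URI does not appear to apply these rules, one way to get the non-hacky check asked for in the question is to apply the http/https rules from the quoted RFC passage yourself and then compare with URI.equals(). This is a sketch under stated assumptions, not a JDK or library API: HttpUriNormalizer, normalizeHttp and sameHttpResource are hypothetical names, only http and https URIs with a server-based host are handled, and the https default port 443 is added on top of what the quote mentions. Anything else is returned after plain normalize().

```java
import java.net.URI;
import java.util.Locale;

public class HttpUriNormalizer {

    // Hypothetical helper (not part of the JDK): applies the http/https
    // scheme-based rules from RFC 3986 section 6.2.3 on top of URI.normalize().
    static URI normalizeHttp(URI uri) {
        URI u = uri.normalize(); // only removes unnecessary "." and ".." segments
        String scheme = u.getScheme() == null ? "" : u.getScheme().toLowerCase(Locale.ROOT);
        boolean http = scheme.equals("http");
        boolean https = scheme.equals("https");
        if ((!http && !https) || u.isOpaque() || u.getHost() == null) {
            return u; // outside the scope of this sketch
        }
        int port = u.getPort();
        // Drop the scheme's default port (80 for http, 443 for https).
        if ((http && port == 80) || (https && port == 443)) {
            port = -1;
        }
        // For http(s), an empty path is equivalent to "/".
        String rawPath = u.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;

        // Rebuild from the raw (still percent-encoded) components so that
        // escaped octets such as %2F are preserved exactly.
        StringBuilder sb = new StringBuilder(scheme).append("://");
        if (u.getRawUserInfo() != null) {
            sb.append(u.getRawUserInfo()).append('@');
        }
        sb.append(u.getHost().toLowerCase(Locale.ROOT));
        if (port != -1) {
            sb.append(':').append(port);
        }
        sb.append(path);
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return URI.create(sb.toString());
    }

    // Compares two URIs after the scheme-based normalization above.
    static boolean sameHttpResource(URI a, URI b) {
        return normalizeHttp(a).equals(normalizeHttp(b));
    }

    public static void main(String[] args) {
        URI u1 = URI.create("http://www.yahoo.com/");
        URI u2 = URI.create("http://www.yahoo.com");
        System.out.println(sameHttpResource(u1, u2)); // true
    }
}
```

Rebuilding from the raw components avoids decoding and re-encoding percent-escapes. Lowercasing the host is not needed for equals(), which already ignores host case, but it gives a canonical toString(). Only the empty path is mapped to "/"; paths such as /foo and /foo/ are left distinct.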


Resources:

  • sun.com - Java Puzzlers Serves Up Brain Benders Galore
  • javadoc - URI.equals()
  • URI RFC
answered Sep 19 '22 15:09 (5 revs)



Strictly speaking, they are not equal. The trailing slash (/) is a common convention, not a requirement. You could serve different pages for

http://www.yahoo.com/foo/ 

and for

http://www.yahoo.com/foo 

I believe this may even be possible for the URLs you provided, since the HTTP request could omit that slash.
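One concrete way to see that /foo/ and /foo are treated as different resources is relative-link resolution. The sketch below (class and method names are illustrative) resolves the same relative reference against both forms with java.net.URI.resolve(): with the trailing slash, "bar" resolves inside foo/; without it, the last segment is replaced.

```java
import java.net.URI;

public class TrailingSlashDemo {
    // Resolves a relative reference against a base URI with URI.resolve().
    static String resolveAgainst(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Trailing slash: "foo/" acts like a directory, so "bar" lands inside it.
        System.out.println(resolveAgainst("http://www.yahoo.com/foo/", "bar")); // http://www.yahoo.com/foo/bar
        // No trailing slash: the last segment "foo" is replaced by "bar".
        System.out.println(resolveAgainst("http://www.yahoo.com/foo", "bar"));  // http://www.yahoo.com/bar
    }
}
```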

answered Sep 18 '22 15:09 by Wernight
