Normalize a String to create a safe URL in Java

I'm writing a Java library that builds URLs from a list of filenames, like this:

final String domain = "http://www.example.com/";

String[] filenames = {"Normal text", "Ich weiß nicht", "L'ho inserito tra i princìpi"};

// normalize(...) is the method I'm looking for; the comments show the desired output
System.out.println(domain + normalize(filenames[0]));
// Should print "http://www.example.com/Normal_text"
System.out.println(domain + normalize(filenames[1]));
// Should print "http://www.example.com/Ich_weib_nicht"
System.out.println(domain + normalize(filenames[2]));
// Should print "http://www.example.com/L_ho_inserito_tra_i_principi"

Is there a Java library that provides a method like the normalize I'm using in the code above?

Related reading:

  • Which special characters are safe to use in url?
  • Safe characters for friendly url
asked Jan 12 '23 05:01 by mat_boy



1 Answer

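Adapting my previous answer here: you can use java.text.Normalizer, which gets you most of the way there. Normalizing to NFD (canonical decomposition) splits an accented letter such as "á" into the base letter "a" followed by a combining accent mark; deleting every non-ASCII character afterwards leaves only the base letters. For example: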

Accent removal:

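import java.text.Normalizer;

String accented = "árvíztűrő tükörfúrógép";
// Decompose: each accented letter becomes base letter + combining mark(s)
String normalized = Normalizer.normalize(accented, Normalizer.Form.NFD);
// Strip everything outside ASCII, which removes the combining marks
normalized = normalized.replaceAll("[^\\p{ASCII}]", "");

System.out.println(normalized);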

Output:

arvizturo tukorfurogep
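The accent-stripping snippet above gets close to the URLs in the question, but on its own it keeps spaces and apostrophes, and it silently deletes characters such as ß that have no canonical decomposition ("Ich weiß nicht" becomes "Ich wei nicht"). Below is a minimal sketch of a complete normalize(String); it is not a library API. The class name UrlSlug, transliterating ß as "ss", and collapsing every other run of non-alphanumeric characters into a single "_" are assumptions chosen to match the question's examples. It also swaps the [^\p{ASCII}] filter for \p{M}, which removes only the combining marks that NFD produces.

```java
import java.text.Normalizer;

// Hypothetical helper class; normalize(String) mirrors the method the question asks for.
public class UrlSlug {

    public static String normalize(String input) {
        // U+00DF (German sharp s) has no NFD decomposition, so map it explicitly.
        // Assumption: "ss" is the desired transliteration.
        String s = input.replace("\u00DF", "ss");
        // Decompose accented letters into base letter + combining marks.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove the combining marks (Unicode category M).
        s = s.replaceAll("\\p{M}", "");
        // Replace each run of characters outside [A-Za-z0-9] with one underscore.
        s = s.replaceAll("[^A-Za-z0-9]+", "_");
        // Drop leading and trailing underscores.
        return s.replaceAll("^_+|_+$", "");
    }

    public static void main(String[] args) {
        final String domain = "http://www.example.com/";
        // Unicode escapes keep the source independent of the compiler's file encoding.
        String[] filenames = {
            "Normal text",
            "Ich wei\u00DF nicht",
            "L'ho inserito tra i princ\u00ECpi"
        };
        for (String filename : filenames) {
            System.out.println(domain + normalize(filename));
        }
    }
}
```

Running main prints http://www.example.com/Normal_text, http://www.example.com/Ich_weiss_nicht (rather than the Ich_weib_nicht in the question) and http://www.example.com/L_ho_inserito_tra_i_principi. Input written entirely in a non-Latin script would reduce to an empty string, so a real library would need a fallback for that case.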
answered Jan 21 '23 11:01 by StoopidDonut
