I'm trying to extract SEO-friendly URLs from strings that can contain special characters, letters with accents, Chinese-like characters, etc.
Stack Overflow does this; it translates this post's title into
java-and-seo-friendly-urls-reate--a-valid-http-url-from-a-string-composed-by-s
I'm trying to do this in Java.
I'm using the solution from this post, together with URLEncoder.encode, to translate Chinese characters and other symbols into valid URL characters.
Have you ever implemented something like this? Is there a better way?
In your Java program, you can create a URL object from a String containing the URL text: URL myURL = new URL("http://example.com/"); The URL object created above represents an absolute URL, which contains all of the information necessary to reach the resource in question.
Use lowercase letters and standard characters. SEO-friendly URLs follow Google's guidelines for readability, which is why building URLs from lowercase letters and standard characters is considered a best practice for search engine rankings.
This might be an overly simplistic approach to the problem, but you could just use regular expressions to remove all non-standard characters. After converting your string to lowercase, remove every character that is not a lowercase letter or whitespace, and then replace the whitespace with the '-' character.
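private static String encodeForUrl(String input) {
    // Lowercase, drop everything except a-z and whitespace, trim the ends,
    // then collapse each run of whitespace into a single hyphen.
    return input.toLowerCase().replaceAll("[^a-z\\s]", "").trim().replaceAll("\\s+", "-");
}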
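One limitation of the regex approach is that it drops accented letters entirely ("café" becomes "caf"). A possible refinement, sketched below, is to decompose the text with java.text.Normalizer first so that accents are stripped and the base letters are kept. The class name SlugSketch and the method toSlug are made up for illustration and are not part of the original answer. Keeping digits is also an assumption on my part. Characters with no Latin base letter, such as Chinese, are still removed here, so they would need transliteration or URLEncoder.encode if they have to survive.

```java
import java.text.Normalizer;
import java.util.Locale;

public class SlugSketch {

    // Hypothetical helper: turns arbitrary text into a lowercase, hyphen-separated slug.
    static String toSlug(String input) {
        // NFD splits an accented letter into its base letter plus combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove the combining marks, so "e" + acute accent becomes plain "e".
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        return withoutMarks.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")   // each run of other characters becomes one hyphen
                .replaceAll("^-+|-+$", "");      // drop leading and trailing hyphens
    }

    public static void main(String[] args) {
        // Unicode escapes spell "Creme brulee & Cafe" with its accents,
        // so the source compiles regardless of file encoding.
        System.out.println(toSlug("Cr\u00e8me br\u00fbl\u00e9e & Caf\u00e9"));
        // prints creme-brulee-cafe
    }
}
```

Replacing runs of characters with a single hyphen, rather than deleting them, avoids the double hyphens that appear when punctuation sits between two spaces.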