This is an easy question, but I just don't get it. I want to detect URLs in a string and replace each one with a shortened version.
I found this expression on Stack Overflow, but the result is just "http":
```java
Pattern p = Pattern.compile("\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
boolean result = m.find();
while (result) {
    for (int i = 1; i <= m.groupCount(); i++) {
        String url = m.group(i);
        str = str.replace(url, shorten(url));
    }
    result = m.find();
}
return html;
```
Is there any better idea?
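For reference, the loop above returns only `http` because group 1 of that pattern is the scheme alternation `(https?|ftp|file)`, and `groupCount()` is 1, so the inner loop only ever sees the scheme. The whole URL is group 0, `m.group()`. The sketch below is one way to fix it, using `appendReplacement`/`appendTail` so each match is replaced where it was found instead of calling `String.replace` on the whole string. `shorten` is a hypothetical placeholder for the asker's shortener, and the class name `UrlReplacer` is made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlReplacer {
    // The pattern from the question; group 1 is only the scheme (http, https, ftp, file)
    private static final Pattern URL = Pattern.compile(
            "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical stand-in for shorten(url); a real version would call a shortening service
    static String shorten(String url) {
        return "[link:" + url.length() + "]";
    }

    static String replaceUrls(String str) {
        Matcher m = URL.matcher(str);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group(); // group 0 is the whole match, not just the scheme
            // quoteReplacement keeps '$' or '\' in the result from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(shorten(url)));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceUrls("see http://example.com/a?b=1 and ftp://files.example.org/x"));
    }
}
```

Running `main` should print `see [link:24] and [link:25]`, since each URL is passed to the placeholder `shorten` in full.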
In PHP, the VStelmakh\UrlHighlight library can both extract URLs and turn them into links:

```php
use VStelmakh\UrlHighlight\UrlHighlight;

$urlHighlight = new UrlHighlight();

// Extract URLs
$urlHighlight->getUrls("This is example http://example.com.");
// return: ['http://example.com']

// Turn URLs into hyperlinks
$urlHighlight->highlightUrls('Hello, http://example.com.');
```
In Python, URLs can be extracted from a text file with a regular expression: the expression returns every piece of text that matches the pattern, and only the standard `re` module is needed.
In Java, this can be done with `Pattern.matcher()`: for each match, take the substring from the match's start index to its end index and add it to a list.
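A minimal sketch of that approach, reusing the pattern from the question (any URL regex could be substituted). The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlExtractor {
    // Same pattern as in the question
    private static final Pattern URL = Pattern.compile(
            "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]",
            Pattern.CASE_INSENSITIVE);

    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            // start() is the first index of the match, end() is one past its last index
            urls.add(text.substring(m.start(), m.end()));
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(extractUrls("Visit https://example.com/docs, or ftp://example.org."));
    }
}
```

Because the pattern's final character class excludes `.` and `,`, trailing sentence punctuation is left out: the example should print `[https://example.com/docs, ftp://example.org]`.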
Let me preface this by saying that I'm not a huge advocate of regex for complex cases; writing the perfect expression for something like this is very difficult. That said, I do happen to have one for detecting URLs, and it's backed by a 350-line unit test class that passes. Someone started with a simple regex, and over the years we've grown the expression and the test cases to handle the issues we've found. It's definitely not trivial:
```java
// Pattern for recognizing a URL, based off RFC 3986
private static final Pattern urlPattern = Pattern.compile(
        "(?:^|[\\W])((ht|f)tp(s?):\\/\\/|www\\.)"
                + "(([\\w\\-]+\\.){1,}?([\\w\\-.~]+\\/?)*"
                + "[\\p{Alnum}.,%_=?&#\\-+()\\[\\]\\*$~@!:/{};']*)",
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
```
Here's an example of using it:
```java
Matcher matcher = urlPattern.matcher("foo bar http://example.com baz");
while (matcher.find()) {
    int matchStart = matcher.start(1);
    int matchEnd = matcher.end();
    // now you have the offsets of a URL match
}
```
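Since the question is about replacing URLs rather than just locating them, here is a sketch of how those offsets could drive a replacement. It copies `urlPattern` from the answer; `replaceUrls`, the class name, and the lambda passed as a shortener are hypothetical names for illustration. Text before `start(1)` is copied unchanged because group 0 can begin with the non-word character that precedes the URL.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlPatternReplacer {
    // Copied from the answer's pattern
    private static final Pattern urlPattern = Pattern.compile(
            "(?:^|[\\W])((ht|f)tp(s?):\\/\\/|www\\.)"
                    + "(([\\w\\-]+\\.){1,}?([\\w\\-.~]+\\/?)*"
                    + "[\\p{Alnum}.,%_=?&#\\-+()\\[\\]\\*$~@!:/{};']*)",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

    // Rebuilds the text, passing each URL span [start(1), end()) through 'shorten'
    static String replaceUrls(String text, UnaryOperator<String> shorten) {
        Matcher matcher = urlPattern.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            int matchStart = matcher.start(1); // skip the leading non-word char group 0 may include
            int matchEnd = matcher.end();
            out.append(text, last, matchStart);
            out.append(shorten.apply(text.substring(matchStart, matchEnd)));
            last = matchEnd;
        }
        out.append(text, last, text.length()); // text after the last URL
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical "shortener" that just wraps the URL; swap in a real service call
        System.out.println(replaceUrls("foo bar http://example.com baz", url -> "<" + url + ">"));
    }
}
```

This should print `foo bar <http://example.com> baz`. Note that this pattern's final character class does allow `.` and `,`, so a URL followed directly by punctuation may keep that punctuation in the match.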