I currently do automatic detection of hyperlinks within text in my program. I made it very simple and only look for http:// or www.
However, a user suggested to me that I extend it to other forms, e.g.: https:// or .com
Then I realized it might not stop there because there's ftp and mailto and file, all the other top level domains, and even email addresses and file paths.
What I think is best is to limit it to what is practical by following some often-used standard set of hyperlink detection rules that are currently in use. Maybe how Microsoft Word does it, or maybe how RichEdit does it or maybe you know of a better standard.
So my question is:
Is there a built in function that I can call from Delphi to do the detection, and if so, what would the call look like? (I plan in the future to go to FireMonkey, so I would prefer something that will work beyond Windows.)
If there isn't a function available, is there some place I can find a documented set of rules of what is detected in Word, in RichEdit, or any other set of rules of what should be detected? That would then allow me to write the detection code myself.
Create new rule and provide alert details. With the query in the query editor, select Create detection rule and specify the following alert details: Detection name —name of the detection rule Frequency —interval for running the query and taking action.
Since the least frequent run is every 24 hours, filtering for the past day will cover all new data. 2. Create new rule and provide alert details. With the query in the query editor, select Create detection rule and specify the following alert details: Frequency —interval for running the query and taking action. See additional guidance below
If you want the rule to consider only active records while detecting duplicates, select the Exclude inactive matching recordscheck box. You should also select this check box if your duplicate detection rule criteria are based on a status field. If you want the rule to be case-sensitive, select the Case-sensitivecheck box.
Elastic endpoint exceptions (optional): Adds all Elastic Endpoint Security rule exceptions to this rule (see Rule exceptions and value lists ). If you select this option, you can add Endpoint exceptions on the Rule details page. Additionally, all future exceptions added to the Elastic Endpoint Security rule also affect this rule.
Try the PathIsURL
function which is declarated in the ShLwApi
unit.
Following regex taken from RegexBuddy's library might get you started (I can't make any claims about performance).
Regex
Match; JGsoft; case insensitive:
\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]
Explanation
URL: Find in full text The final character class makes sure that if an URL is part of some text, punctuation such as a comma or full stop after the URL is not interpreted as part of the URL.
Matches (whole or partial)
http://regexbuddy.com
http://www.regexbuddy.com
http://www.regexbuddy.com/
http://www.regexbuddy.com/index.html
http://www.regexbuddy.com/index.html?source=library
You can download RegexBuddy at http://www.regexbuddy.com/download.html.
Does not match
regexbuddy.com
www.regexbuddy.com
"www.domain.com/quoted URL with spaces"
[email protected]
For a set of rules you might look into RFC 3986
A Uniform Resource Identifier (URI) is a compact sequence of
characters that identifies an abstract or physical resource. This
specification defines the generic URI syntax and a process for
resolving URI references that might be in relative form, along with
guidelines and security considerations for the use of URIs on the
Internet
A regex that validates a URL as specified in RFC 3986 would be
^
(# Scheme
[a-z][a-z0-9+\-.]*:
(# Authority & path
//
([a-z0-9\-._~%!$&'()*+,;=]+@)? # User
([a-z0-9\-._~%]+ # Named host
|\[[a-f0-9:.]+\] # IPv6 host
|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\]) # IPvFuture host
(:[0-9]+)? # Port
(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/? # Path
|# Path without authority
(/?[a-z0-9\-._~%!$&'()*+,;=:@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?
)
|# Relative URL (no scheme or authority)
([a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/? # Relative path
|(/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/?) # Absolute path
)
# Query
(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
# Fragment
(\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
$
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With