How Can I Implement a Standard Set of Hyperlink Detection Rules in Delphi?

My program currently detects hyperlinks within text automatically. I kept it very simple: it only looks for http:// or www.

However, a user suggested to me that I extend it to other forms, e.g.: https:// or .com

Then I realized it might not stop there because there's ftp and mailto and file, all the other top level domains, and even email addresses and file paths.

I think the best approach is to limit it to what is practical by following some widely used, standard set of hyperlink detection rules. Maybe how Microsoft Word does it, or how RichEdit does it, or maybe you know of a better standard.

So my question is:

Is there a built-in function that I can call from Delphi to do the detection, and if so, what would the call look like? (I plan to move to FireMonkey in the future, so I would prefer something that will work beyond Windows.)

If there isn't a function available, is there some place I can find a documented set of rules of what is detected in Word, in RichEdit, or any other set of rules of what should be detected? That would then allow me to write the detection code myself.

lkessler asked Jan 23 '12 03:01


2 Answers

Try the PathIsURL function, which is declared in the ShLwApi unit.
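
A minimal sketch of what such a call might look like, assuming Delphi XE2 or later (older versions name the units Windows and ShLwApi, without the Winapi. prefix). LooksLikeUrl and Check are hypothetical helper names. Keep in mind that PathIsURL is a Windows shell API exported by shlwapi.dll, so it would not be available on non-Windows FireMonkey targets, and that it tests whether a whole string is in URL format rather than locating URLs inside longer text.

```delphi
program PathIsUrlDemo;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, Winapi.ShLwApi;

// Hypothetical helper: wraps the Windows-only PathIsURL call.
function LooksLikeUrl(const S: string): Boolean;
begin
  Result := PathIsURL(PChar(S));
end;

// Hypothetical helper: prints the verdict for one candidate string.
procedure Check(const S: string);
begin
  if LooksLikeUrl(S) then
    Writeln(S, ' -> URL format')
  else
    Writeln(S, ' -> not URL format');
end;

begin
  Check('http://www.example.com');
  Check('www.example.com');
end.
```

To use it for detection inside running text, you would first have to split the text into candidate tokens yourself and then pass each token to the helper.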

RRUZ answered Sep 29 '22 10:09


The following regex, taken from RegexBuddy's library, might get you started (I can't make any claims about performance).

Regex

Match; JGsoft; case insensitive:  
\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]

Explanation

URL: Find in full text. The final character class makes sure that if a URL is part of some text, punctuation such as a comma or full stop after the URL is not interpreted as part of the URL.

Matches (whole or partial)

http://regexbuddy.com
http://www.regexbuddy.com 
http://www.regexbuddy.com/ 
http://www.regexbuddy.com/index.html 
http://www.regexbuddy.com/index.html?source=library 
You can download RegexBuddy at http://www.regexbuddy.com/download.html.

Does not match

regexbuddy.com
www.regexbuddy.com
"www.domain.com/quoted URL with spaces"
[email protected]
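
As a rough sketch of how this pattern could be applied from Delphi, the following assumes the System.RegularExpressions unit (TRegEx, shipped since Delphi XE), which wraps PCRE rather than the JGsoft engine; the pattern appears to use only syntax both flavors share, but that is an assumption. SampleText is made up from the examples above.

```delphi
program FindUrlsDemo;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  // The pattern quoted above, unchanged.
  UrlPattern =
    '\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]';
  // Illustrative input built from the examples in this answer.
  SampleText =
    'You can download RegexBuddy at http://www.regexbuddy.com/download.html. ' +
    'A bare www.regexbuddy.com has no scheme.';

var
  M: TMatch;
begin
  // roIgnoreCase mirrors the "case insensitive" setting listed with the pattern.
  for M in TRegEx.Matches(SampleText, UrlPattern, [roIgnoreCase]) do
    // M.Index is the 1-based start position, M.Length the length of the match.
    Writeln(M.Index, ' ', M.Length, ' ', M.Value);
end.
```

M.Index and M.Length give the span of each hit, which is what you would need to mark that range of the text as a hyperlink.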

For a set of rules, you might look into RFC 3986:

A Uniform Resource Identifier (URI) is a compact sequence of
characters that identifies an abstract or physical resource. This
specification defines the generic URI syntax and a process for
resolving URI references that might be in relative form, along with
guidelines and security considerations for the use of URIs on the
Internet.

A regex that validates a URL as specified in RFC 3986 (written in free-spacing mode and meant to be applied case-insensitively) would be:

^
(# Scheme
 [a-z][a-z0-9+\-.]*:
 (# Authority & path
  //
  ([a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
  ([a-z0-9\-._~%]+                            # Named host
  |\[[a-f0-9:.]+\]                            # IPv6 host
  |\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
  (:[0-9]+)?                                  # Port
  (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?          # Path
 |# Path without authority
  (/?[a-z0-9\-._~%!$&'()*+,;=:@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?
 )
|# Relative URL (no scheme or authority)
 ([a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?  # Relative path
 |(/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/?)                            # Absolute path
)
# Query
(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
# Fragment
(\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
$
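
The expression above contains whitespace and # comments and lists only lower-case letters, so it presumably has to be run with free-spacing and case-insensitive options. A sketch of how that might look with TRegEx follows. To keep it short it uses a deliberately cut-down pattern (scheme, //, named host, optional port, simple path), not the full expression; SimplePattern and IsSimpleUrl are illustrative names. If you paste the full expression into Delphi source, each apostrophe in the character classes has to be doubled.

```delphi
program FreeSpacingDemo;

{$APPTYPE CONSOLE}

uses
  System.RegularExpressions;

const
  NL = #13#10; // line break that ends each in-pattern comment

  // Cut-down illustration only: NOT the full RFC 3986 pattern.
  SimplePattern =
    '^' + NL +
    '[a-z][a-z0-9+\-.]*:      # Scheme' + NL +
    '//' + NL +
    '[a-z0-9\-._~%]+          # Named host' + NL +
    '(:[0-9]+)?               # Port' + NL +
    '(/[a-z0-9\-._~%!$&''()*+,;=:@]+)*/?  # Path' + NL +
    '$';

// Hypothetical helper: validates a whole string against SimplePattern.
function IsSimpleUrl(const S: string): Boolean;
begin
  // roIgnorePatternSpace makes the engine skip whitespace and # comments.
  Result := TRegEx.IsMatch(S, SimplePattern,
    [roIgnoreCase, roIgnorePatternSpace]);
end;

begin
  Writeln(IsSimpleUrl('HTTP://www.regexbuddy.com:80/index.html'));
  Writeln(IsSimpleUrl('not a url'));
end.
```

Because the pattern is anchored with ^ and $, this validates a single candidate string; it does not search for URLs inside longer text the way the first regex does.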

Lieven Keersmaekers answered Sep 29 '22 11:09