What is the regular expression for validating a Jabber ID?

For now I'm using this regexp:

^\A([a-z0-9\.\-_\+]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z$

I don't think it is very good. What is the best regular expression you have written or seen for validating JIDs?

For reference, Section 3 of the XMPP core standard defines a JID in Augmented Backus-Naur Form as

jid             = [ node "@" ] domain [ "/" resource ]
domain          = fqdn / address-literal
fqdn            = (sub-domain 1*("." sub-domain))
sub-domain      = (internationalized domain label)
address-literal = IPv4address / IPv6address
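A minimal Python sketch of the `domain` rule above. It assumes the standard-library `ipaddress` module is enough to recognize an `address-literal`, and it treats each `sub-domain` as any non-empty label, so it does not apply the IDNA rules that "internationalized domain label" implies. The function name `classify_domain` is hypothetical.

```python
import ipaddress
from typing import Optional


def classify_domain(domain: str) -> Optional[str]:
    """Classify a JID domain per the ABNF: fqdn / address-literal.

    Hypothetical helper; a simplification, not a full implementation.
    """
    # address-literal = IPv4address / IPv6address
    try:
        ipaddress.ip_address(domain)
        return "address-literal"
    except ValueError:
        pass
    # fqdn = (sub-domain 1*("." sub-domain)): at least two dot-separated labels.
    labels = domain.split(".")
    # Assumption: any non-empty label stands in for an internationalized
    # domain label; IDNA validity is NOT checked here.
    if len(labels) >= 2 and all(labels):
        return "fqdn"
    return None


for d in ["example.com", "exämple.com", "192.0.2.1", "::1", "localhost", "example..com"]:
    print(d, classify_domain(d))
```

As quoted, the `fqdn` rule needs at least two labels, so a single-label name such as `localhost` is rejected by this sketch.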
Asked Dec 14 '22 at 02:12 by Anton Mironov


2 Answers

Try this:

^(?:([^@/<>'\"]+)@)?([^@/<>'\"]+)(?:/([^<>'\"]*))?$

It's not quite right: many strings that match it are not valid JIDs, particularly in the domain portion. However, it should allow and parse all valid JIDs, with group 1 being the node, group 2 being the domain, and group 3 being the resource.
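A quick Python check of both halves of that caveat, using the standard `re` module and made-up sample inputs: a full JID splits into the three groups, but a domain part that is clearly not a hostname is accepted too.

```python
import re

# The pattern from this answer, written as a Python raw string.
JID_RE = re.compile(r"^(?:([^@/<>'\"]+)@)?([^@/<>'\"]+)(?:/([^<>'\"]*))?$")

# A well-formed JID splits into node, domain and resource.
print(JID_RE.match("foo@example.com/bar").groups())

# The domain group accepts spaces and punctuation, so this also "matches".
print(JID_RE.match("foo@not a domain!").groups())
```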


Test data (each input followed by the captured node, domain and resource groups):

foo                 (None,  'foo', None)
foo@example.com     ('foo', 'example.com', None)
foo@example.com/bar ('foo', 'example.com', 'bar')
example.com/bar     (None,  'example.com', 'bar')
example.com/bar@baz (None,  'example.com', 'bar@baz')
example.com/bar/baz (None,  'example.com', 'bar/baz')
bär@exämple.com/bäz ('bär', 'exämple.com', 'bäz')
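The test data can be reproduced with a short Python loop. This is a sketch; `EXPECTED` simply restates the rows of the table as (node, domain, resource) tuples.

```python
import re

JID_RE = re.compile(r"^(?:([^@/<>'\"]+)@)?([^@/<>'\"]+)(?:/([^<>'\"]*))?$")

# Each input paired with the (node, domain, resource) groups from the table.
EXPECTED = {
    "foo": (None, "foo", None),
    "foo@example.com": ("foo", "example.com", None),
    "foo@example.com/bar": ("foo", "example.com", "bar"),
    "example.com/bar": (None, "example.com", "bar"),
    "example.com/bar@baz": (None, "example.com", "bar@baz"),
    "example.com/bar/baz": (None, "example.com", "bar/baz"),
    "bär@exämple.com/bäz": ("bär", "exämple.com", "bäz"),
}

for jid, expected in EXPECTED.items():
    groups = JID_RE.match(jid).groups()
    print(f"{jid:<20}{groups}")
    assert groups == expected, (jid, groups)
```

Because the node group cannot contain '/', an '@' that appears after the first '/' always ends up in the resource, as in `example.com/bar@baz`.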

Aside: if you aren't familiar with the (?:...) construct, it is a non-capturing group: a set of parentheses that groups part of the pattern without adding a numbered group to the output.
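A small Python illustration of the difference, using `Pattern.groups` from the standard `re` module (the number of capturing groups in a compiled pattern):

```python
import re

# Plain parentheses: (a) and (b) are both numbered capturing groups.
print(re.compile(r"(a)(b)").groups)  # 2
# (?:a) groups "a" without capturing it, so (b) becomes group 1.
print(re.compile(r"(?:a)(b)").groups)  # 1
print(re.match(r"(?:a)(b)", "ab").group(1))  # b

# The JID pattern has five pairs of parentheses but only three capture.
JID_PATTERN = r"^(?:([^@/<>'\"]+)@)?([^@/<>'\"]+)(?:/([^<>'\"]*))?$"
print(re.compile(JID_PATTERN).groups)  # 3
```

That is why node, domain and resource land in groups 1, 2 and 3 of the JID pattern.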

Answered Jan 11 '23 at 15:01 by Joe Hildebrand


Your regexp is wrong in at least the following ways:

  1. It requires the JID to contain an '@', though JIDs without an '@' (such as a bare domain) may also be valid.
  2. It doesn't check the maximum length, although the standard you cited says "Each allowable portion of a JID MUST NOT be more than 1023 bytes in length".
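Both points are easy to confirm in Python, assuming the pattern behaves under Python's `re` engine as it does in whatever engine the question used (in Python, `\A` and `\Z` anchor the start and end of the string):

```python
import re

# The regexp from the question, as a Python raw string.
QUESTION_RE = re.compile(r"^\A([a-z0-9\.\-_\+]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z$")

print(bool(QUESTION_RE.match("foo@example.com")))  # True
# Point 1: a bare domain (no '@') is rejected.
print(bool(QUESTION_RE.match("example.com")))  # False
# A resource is rejected as well, since \Z must follow the domain.
print(bool(QUESTION_RE.match("foo@example.com/bar")))  # False
# Point 2: nothing limits length, so a 2000-byte node is accepted.
print(bool(QUESTION_RE.match("a" * 2000 + "@example.com")))  # True
```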

I think one huge regexp is the wrong way to go. You would do better to write a little more code that splits the JID into its parts (node, domain, resource) and then checks each part separately. That is better in several ways:

  • easier testing (you can unit test each of the parts independently)
  • better performance
  • simpler code
  • reusability
  • etc.
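A minimal Python sketch of that approach. All function names are hypothetical, and the checks are deliberately partial: the 1023-byte limit comes from the specification quoted above, the forbidden node characters are an assumption based on the nodeprep profile, and no stringprep or IDNA processing is done.

```python
import ipaddress
from typing import Optional, Tuple

# Limit quoted in the answer: each portion MUST NOT exceed 1023 bytes.
MAX_PART_BYTES = 1023
# Assumption: the ASCII characters nodeprep forbids in a node, plus space.
NODE_FORBIDDEN = set("\"&'/:<>@ ")


def split_jid(jid: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split into (node, domain, resource).

    The resource is everything after the first '/'; the node is whatever
    precedes the first '@' in the remaining bare JID.
    """
    bare, slash, resource = jid.partition("/")
    node, at, domain = bare.partition("@")
    if not at:
        # No '@': the whole bare JID is the domain.
        return None, bare, (resource if slash else None)
    return node, domain, (resource if slash else None)


def part_ok(part: str) -> bool:
    """Non-empty and at most MAX_PART_BYTES when encoded as UTF-8."""
    return 0 < len(part.encode("utf-8")) <= MAX_PART_BYTES


def valid_node(node: str) -> bool:
    return part_ok(node) and not (set(node) & NODE_FORBIDDEN)


def valid_domain(domain: str) -> bool:
    if not part_ok(domain):
        return False
    try:
        ipaddress.ip_address(domain)  # IPv4 or IPv6 address literal
        return True
    except ValueError:
        pass
    # Assumption: labels of letters, digits and '-' (Unicode letters allowed);
    # real IDNA rules are not applied, and single-label names are accepted.
    return all(
        label and all(ch.isalnum() or ch == "-" for ch in label)
        for label in domain.split(".")
    )


def valid_resource(resource: str) -> bool:
    return part_ok(resource)


def valid_jid(jid: str) -> bool:
    node, domain, resource = split_jid(jid)
    return (
        (node is None or valid_node(node))
        and valid_domain(domain)
        and (resource is None or valid_resource(resource))
    )
```

Because each check is its own function, each can be unit-tested independently, for example `assert not valid_node("a@b")`, and a stricter domain or resource check can be swapped in without touching the others.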
Answered Jan 11 '23 at 16:01 by Olexiy