Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using REGEXP_EXTRACT to get domain and subdomains

I have only managed to extract the TLD of the list of websites that I have using

REGEXP_EXTRACT(Domain_name, r'(\.[^.:]*)]\.?:?[0-9]*$') AS web_tld

Example:

I have

www.example1.abc.com
www.example2.efg.123.net

I want the result

Subdomain

example1
efg

Domain

abc
123

TLD

.com
.net

EDIT: Encountered an error in my query 'Exactly one capturing group must be specified' when I use (.?([^.:]+).([^.:]+).([^.:]+):?[0-9]*$) as regex

SELECT
REGEXP_EXTRACT(Domain, r'(\.?([^.:]+)\.([^.:]+)\.([^.:]+):?[0-9]*$)'),
FROM [weblist.domain]
ORDER BY 1
LIMIT 250;
like image 397
Jeugasce Avatar asked Jan 22 '14 02:01

Jeugasce


2 Answers

As you can only use one capturing group, I think you can actually use 3 separate regular expressions to get the values you want:

SELECT
REGEXP_EXTRACT(Domain, r'([^.:]+):?[0-9]*$'),
REGEXP_EXTRACT(Domain, r'([^.:]+).[^.:]+:?[0-9]*$'),
REGEXP_EXTRACT(Domain, r'([^.:]+).[^.:]+.[^.:]+:?[0-9]*$')
FROM [weblist.domain]
ORDER BY 1
LIMIT 250;
like image 168
Szymon Avatar answered Oct 15 '22 22:10

Szymon


Note you may be better off using the HOST, DOMAIN, and TLD rather than custom regular expressions.

like image 20
Jordan Tigani Avatar answered Oct 15 '22 23:10

Jordan Tigani