I have only managed to extract the TLD of the list of websites that I have using
REGEXP_EXTRACT(Domain_name, r'(\.[^.:]*)]\.?:?[0-9]*$') AS web_tld
Example:
I have
www.example1.abc.com
www.example2.efg.123.net
I want the result
Subdomain
example1
efg
Domain
abc
123
TLD
.com
.net
EDIT: Encountered an error in my query 'Exactly one capturing group must be specified' when I use (.?([^.:]+).([^.:]+).([^.:]+):?[0-9]*$) as regex
SELECT
REGEXP_EXTRACT(Domain, r'(\.?([^.:]+)\.([^.:]+)\.([^.:]+):?[0-9]*$)'),
FROM [weblist.domain]
ORDER BY 1
LIMIT 250;
As you can only use one capturing group, I think you can actually use 3 separate regular expressions to get the values you want:
SELECT
REGEXP_EXTRACT(Domain, r'([^.:]+):?[0-9]*$'),
REGEXP_EXTRACT(Domain, r'([^.:]+).[^.:]+:?[0-9]*$'),
REGEXP_EXTRACT(Domain, r'([^.:]+).[^.:]+.[^.:]+:?[0-9]*$')
FROM [weblist.domain]
ORDER BY 1
LIMIT 250;
Note you may be better off using the HOST, DOMAIN, and TLD rather than custom regular expressions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With