Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex for PostgreSQL for getting domain with sub-domain from URL/Website

Basically, I need to get those rows which contain domain and subdomain name from a URL or the whole website name excluding www.

My DB table looks like this:

+----------+------------------------+
|    id    |    website             |
+----------+------------------------+
| 1        | https://www.google.com |
+----------+------------------------+
| 2        | http://www.google.co.in|
+----------+------------------------+
| 3        | www.google.com         |
+----------+------------------------+
| 4        | www.google.co.in       |
+----------+------------------------+
| 5        | google.com             |
+----------+------------------------+
| 6        | google.co.in           |
+----------+------------------------+
| 7        | http://google.co.in    |
+----------+------------------------+

Expected output:

google.com
google.co.in
google.com
google.co.in
google.com
google.co.in
google.co.in

My Postgres Query looks like this:

select id, substring(website from '.*://([^/]*)') as website_domain from contacts

But above query give blank websites. So, how I can get the desired output?

like image 520
Shubham Srivastava Avatar asked Nov 28 '17 10:11

Shubham Srivastava


2 Answers

You must use the "non capturing" match ?: to cope with the non "http://" websites.

like

select 
  id, 
  substring(website from '(?:.*://)?(?:www\.)?([^/?]*)') as website_domain     
from contacts;

SQL Fiddle: http://sqlfiddle.com/#!17/f890c/2/0

PostgreSQL's regular expressions: https://www.postgresql.org/docs/9.3/functions-matching.html#POSIX-ATOMS-TABLE

like image 52
eMerzh Avatar answered Nov 14 '22 03:11

eMerzh


You may use

SELECT REGEXP_REPLACE(website, '^(https?://)?(www\.)?', '') from tbl;

See the regex demo.

Details

  • ^ - start of string
  • (https?://)? - 1 or 0 occurrences of http:// or https://
  • (www\.)? - 1 or 0 occurrences of www.

See the PostgreSQL demo:

CREATE TABLE tb1
    (website character varying)
;

INSERT INTO tb1
    (website)
VALUES
    ('https://www.google.com'),
    ('http://www.google.co.in'),
    ('www.google.com'),
    ('www.google.co.in'),
    ('google.com'),
    ('google.co.in'),
    ('http://google.co.in')
;

SELECT REGEXP_REPLACE(website, '^(https?://)?(www\.)?', '') from tb1;

Result:

enter image description here

like image 21
Wiktor Stribiżew Avatar answered Nov 14 '22 05:11

Wiktor Stribiżew