
GROUP domain from url in MySql

Tags: regex, sql, mysql

I have a large database that contains many URLs. Many domains repeat, and I'm trying to get only the distinct domains. For example:

http://example.com/someurl.html
http://example.com/someurl_on_the_same_domain.html
http://example.net/myurl.php
http://example.org/anotherurl.php

and I want to get only the domains, e.g.:

http://example.com
http://example.net
http://example.org

My query is:

SELECT id, site FROM table GROUP BY site ORDER BY id DESC LIMIT 50

I think I need to use a regex, but I'm no MySQL guru.

eben asked Dec 02 '22 05:12


2 Answers

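SELECT
    -- Search from position 9 so the slashes in both http:// and https:// are skipped
    SUBSTR(site, 1, LOCATE('/', site, 9) - 1)
        AS OnlyDomain
    FROM tablename
    GROUP BY OnlyDomain
    -- id is not in the GROUP BY, so order by an aggregate of it
    ORDER BY MAX(id) DESC LIMIT 50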

[EDIT]: At the OP's request, here is an updated answer that returns correct results even when the URL has no slash after the domain name:

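SELECT
    -- If no slash follows the host (LOCATE returns 0), keep the whole string
    SUBSTR(site, 1, IF(LOCATE('/', site, 9), LOCATE('/', site, 9) - 1, LENGTH(site)))
        AS OnlyDomain
    FROM tablename
    GROUP BY OnlyDomain
    -- id is not in the GROUP BY, so order by an aggregate of it
    ORDER BY MAX(id) DESC LIMIT 50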
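
To check how the SUBSTR/LOCATE expression behaves on edge cases without touching the database, here is a minimal Python sketch that mirrors its logic. The helper names `locate` and `only_domain` and the sample URLs are assumptions for illustration only; they are not part of MySQL or of the query above.

```python
def locate(substr: str, s: str, pos: int = 1) -> int:
    """Mimic MySQL LOCATE: 1-based position of substr in s at or after pos, 0 if absent."""
    return s.find(substr, pos - 1) + 1


def only_domain(site: str, start: int) -> str:
    """Mirror SUBSTR(site, 1, IF(LOCATE('/', site, start), LOCATE(...) - 1, LENGTH(site)))."""
    slash = locate("/", site, start)
    length = slash - 1 if slash else len(site)
    return site[:length]


# Hypothetical sample rows, modelled on the URLs in the question.
urls = [
    "http://example.com/someurl.html",
    "http://example.com/someurl_on_the_same_domain.html",
    "http://example.net/myurl.php",
    "http://example.org",  # no slash after the host
]

# Equivalent of GROUP BY OnlyDomain: one entry per distinct prefix.
domains = sorted({only_domain(u, 8) for u in urls})
print(domains)

# Start position 8 cuts an https:// URL at its second slash; 9 handles both schemes.
print(only_domain("https://example.com/page", 8))
print(only_domain("https://example.com/page", 9))
```

The start position matters because `http://` is 7 characters long and `https://` is 8. Searching from position 9 skips the scheme slashes in both cases, assuming the host is at least one character long.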
shamittomar answered Dec 03 '22 23:12


SELECT 
COUNT(*) AS nCount,
SUBSTRING_INDEX(REPLACE(REPLACE(REPLACE(site,'http://',''),'https://',''),'www.',''),'/',1) AS sDomain 
FROM tbl_table
GROUP BY sDomain 
ORDER BY 1 DESC

This is an add-on to JQman's solution: it also strips the www. prefix and adds the GROUP BY with a COUNT per domain.
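
As a rough way to sanity-check what this query returns, the sketch below mirrors the nested REPLACE calls and SUBSTRING_INDEX in Python and counts rows per domain. The function name `s_domain` and the sample URLs are assumptions made for illustration; only the string logic follows the query.

```python
from collections import Counter


def s_domain(site: str) -> str:
    """Mirror SUBSTRING_INDEX(REPLACE(REPLACE(REPLACE(site, ...))), '/', 1)."""
    # REPLACE removes every occurrence of each token, not only a leading one.
    for token in ("http://", "https://", "www."):
        site = site.replace(token, "")
    # SUBSTRING_INDEX(str, '/', 1): everything before the first slash,
    # or the whole string when there is no slash.
    return site.split("/", 1)[0]


# Hypothetical sample rows.
urls = [
    "http://example.com/someurl.html",
    "http://www.example.com/someurl_on_the_same_domain.html",
    "https://example.net/myurl.php",
    "http://example.org",
]

# Equivalent of GROUP BY sDomain with COUNT(*), ordered by the count descending.
counts = Counter(s_domain(u) for u in urls)
for domain, n_count in counts.most_common():
    print(n_count, domain)
```

One caveat that follows from how REPLACE works: because it removes `www.` wherever it occurs, a host such as `notwww.example.com` would come out as `notexample.com`. If that matters for your data, strip the prefix only when it starts the host.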

Dennis de Jong answered Dec 03 '22 23:12