Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to parse host out of a string in Redshift?

I'm looking for a Postgres (actually Redshift) equivalent to Hive's parse_url(..., 'HOST').

Postgres docs say it has a URL parser as part of its full text search. This blog post has a regex which may or may not be bulletproof. What is best?

like image 983
dfrankow Avatar asked Dec 01 '22 03:12

dfrankow


2 Answers

Redshift now has a REGEXP_SUBSTR function:

It searches for the regular expression in the string and returns the first substring that matches. One example of a regex to extract the host:

select REGEXP_SUBSTR(url, '[^/]+\\.[^/:]+') from my_table;
like image 50
Moemars Avatar answered Dec 20 '22 19:12

Moemars


Until Redshift starts supporting the regular expression functions of PostgreSQL, if you want to get the host out of an HTTP/S URL in Redshift SQL you'll have to do something like:

select split_part(url, '/', 3) as host from my_table
like image 38
Sim Avatar answered Dec 20 '22 19:12

Sim