I'm looking for a Postgres (actually Redshift) equivalent to Hive's parse_url(..., 'HOST').
Postgres docs say it has a URL parser as part of its full text search. This blog post has a regex which may or may not be bulletproof. What is best?
Redshift now has a REGEXP_SUBSTR function:
It searches for the regular expression in the string and returns the first substring that matches. One example of a regex to extract the host:
select REGEXP_SUBSTR(url, '[^/]+\\.[^/:]+') from my_table;
Until Redshift starts supporting the regular expression functions of PostgreSQL, if you want to get the host out of an HTTP/S URL in Redshift SQL you'll have to do something like:
select split_part(url, '/', 3) as host from my_table
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With