Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

StringUtil indexOf() equivalent postgreSQL query

I need to implement stringUtils Class indexOf() method in postgresql.

Lets say I have a table in which url is one of the column.

url : "http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit"

My requirement is to find the index of the 3rd occurence of '/' in the above url and do substring and take only paypal-info.com host name in Postgresql Query

Any idea on implementing this would be grateful. Thanks

like image 547
Sri Avatar asked Aug 22 '14 08:08

Sri


3 Answers

Have you tried split_part method?

SELECT split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 3)

Result:

split_part
paypal-info.com

For other string functions try this doc: http://www.postgresql.org/docs/9.1/static/functions-string.html

Edit: as for indexOf itself I don't know any built-in postgres solution. But using two string functions You can achieve it like this:

SELECT strpos('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 4)) - 1 as index_of;
like image 128
Michał Schielmann Avatar answered Oct 31 '22 09:10

Michał Schielmann


The string functions and operators section of the manual is the equivalent of String.indexOf, e.g.

select position('/' in 'http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');

however it doesn't offer the option to get the n'th occurrence.

You're really approaching this all wrong. You should use proper URL parsing code to extract the host portion, not attempt to roll your own or use regex / splitting / string mangling.

PostgreSQL doesn't have a native URL/URI type, but its procedural languages do and it's trivial to wrap suitable functions. e.g. with PL/Python:

create language plpythonu;
create or replace function urlhost(url text) returns text 
language plpythonu 
immutable strict
as $$
import urlparse
return urlparse.urlparse(url).netloc
$$;

then:

regress=# select urlhost('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');
     urlhost     
-----------------
 paypal-info.com
(1 row)

If you'd prefer to use PL/Perl, PL/V8, or whatever, that's fine.

For best performance, you could write a simple C function and expose that as an extension.

like image 36
Craig Ringer Avatar answered Oct 31 '22 10:10

Craig Ringer


Just replace 3 with N to get the index of the Nth '/' in a given string

SELECT length(substring('http://asd/asd', '(([^/]*/){3})')) - 1

To extract the host name from url you can use

SELECT substring('http://asd.com:234/qwe', 'http://([^:]+).*/')

Tested here: SQLFiddle

like image 42
Vlad Avatar answered Oct 31 '22 11:10

Vlad