Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres: order data by part of string

Tags:

sql

postgresql

I have a column name that represents a person's name in the following format:

firstname [middlename] lastname [, Sr.|Jr.]

For, example:

John Smith
John J. Smith
John J. Smith, Sr.

How can I order items by lastname?

like image 961
Dziamid Avatar asked Jan 24 '12 15:01

Dziamid


People also ask

Can we use Substr in ORDER BY clause?

You can order by a substring instead of by the entire length of a character, BYTE, or TEXT column, or of an expression returning a character string. The database server uses the substring to sort the result set.

WHERE is substring in PostgreSQL?

POSITION() function The PostgreSQL position function is used to find the location of a substring within a specified string. The substring which is to be searched. The string in which the position of the substring will be detected.

Does the order of WHERE clause matter in PostgreSQL?

Does the order of the WHERE clause matter? In PostgreSQL it is not essential, and it makes no difference.


3 Answers

A correct and faster version could look like this:

SELECT *
FROM   tbl
ORDER  BY substring(name, '([^[:space:]]+)(?:,|$)')

Or:

ORDER  BY substring(name, E'([^\\s]+)(?:,|$)')

Or even:

ORDER  BY substring(name, E'([^\\s]+)(,|$)')

Explain

[^[:space:]]+ .. first (and longest) string consisting of one or more non-whitespace characters.
(,|$) .. terminated by a comma or the end of the string.

The last two examples use escape-string syntax and the class-shorthand \s instead of the long form [[:space:]] (which loses the outer level of brackets when inside a character class).

We don't actually have to use non-capturing parenthesis (?:) after the part we want to extract, because (quoting the manual):

.. if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned.

Test

SELECT substring(name, '([^[:space:]]+)(?:,|$)')
FROM  (VALUES 
  ('John Smith')
 ,('John J. Smith')
 ,('John J. Smith, Sr.')
 ,('foo bar Smith, Jr.')
) x(name)
like image 196
Erwin Brandstetter Avatar answered Oct 21 '22 21:10

Erwin Brandstetter


SELECT *
FROM t
ORDER BY substring(name, E'^.*\\s([^\\s]+)(?=,|$)') ASC

While this should provide the sorting you are looking for, it would be a lot cheaper to store the name in multiple columns and index them based on which parts of the name you need to sort by.

like image 26
Matt Avatar answered Oct 21 '22 20:10

Matt


You should use functional index for this purpose http://www.postgresql.org/docs/7.3/static/indexes-functional.html

In your case somehow....

CREATE INDEX test1_lastname_col1_idx ON test1 (split_part(col1, ' ', 3));
SELECT * FROM test1 ORDER BY split_part(col1, ' ', 3);
like image 2
n0nSmoker Avatar answered Oct 21 '22 19:10

n0nSmoker