Trim trailing spaces with PostgreSQL

Tags:

I have a column eventDate which contains trailing spaces. I am trying to remove them with the PostgreSQL function TRIM(). More specifically, I am running:

SELECT TRIM(both ' ' from eventDate)  FROM EventDates;

However, the trailing spaces don't go away. Furthermore, when I try and trim another character from the date (such as a number), it doesn't trim either. If I'm reading the manual correctly this should work. Any thoughts?

939

asked Mar 27 '14 21:03

zgall1

1 Answers

There are many different invisible characters. Many of them have the property WSpace=Y ("whitespace") in Unicode. But some special characters are not considered "whitespace" and still have no visible representation. The excellent Wikipedia articles about space (punctuation) and whitespace characters should give you an idea.

<rant>Unicode sucks in this regard: introducing lots of exotic characters that mainly serve to confuse people.</rant>

The standard SQL trim() function by default only trims the basic Latin space character (Unicode: U+0020 / ASCII 32). Same with the rtrim() and ltrim() variants. Your call also only targets that particular character.

Use regular expressions with regexp_replace() instead.

Trailing

To remove all trailing white space (but not white space inside the string):

SELECT regexp_replace(eventdate, '\s+$', '') FROM eventdates;

The regular expression explained:
\s ... regular expression class shorthand for [[:space:]]
- which is the set of white-space characters - see limitations below
+ ... 1 or more consecutive matches
$ ... end of string

Demo:

SELECT regexp_replace('inner white   ', '\s+$', '') || '|'

Returns:

inner white|

Yes, that's a single backslash (\). Details in this related answer:

SQL select where column begins with \

Leading

To remove all leading white space (but not white space inside the string):

regexp_replace(eventdate, '^\s+', '')

^ .. start of string

Both

To remove both, you can chain above function calls:

regexp_replace(regexp_replace(eventdate, '^\s+', ''), '\s+$', '')

Or you can combine both in a single call with two branches.
Add 'g' as 4th parameter to replace all matches, not just the first:

regexp_replace(eventdate, '^\s+|\s+$', '', 'g')

But that should typically be faster with substring():

substring(eventdate, '\S(?:.*\S)*')

\S ... everything but white space
(?:re) ... non-capturing set of parentheses
.* ... any string of 0-n characters

Or one of these:

substring(eventdate, '^\s*(.*\S)') substring(eventdate, '(\S.*\S)')  -- only works for 2+ printing characters

(re) ... Capturing set of parentheses

Effectively takes the first non-whitespace character and everything up to the last non-whitespace character if available.

Whitespace?

There are a few more related characters which are not classified as "whitespace" in Unicode - so not contained in the character class [[:space:]].

These print as invisible glyphs in pgAdmin for me: "mongolian vowel", "zero width space", "zero width non-joiner", "zero width joiner":

SELECT E'\u180e', E'\u200B', E'\u200C', E'\u200D';  '᠎' | '' | '‌' | '‍'

Two more, printing as visible glyphs in pgAdmin, but invisible in my browser: "word joiner", "zero width non-breaking space":

SELECT E'\u2060', E'\uFEFF'; '⁠' | ''

Ultimately, whether characters are rendered invisible or not also depends on the font used for display.

To remove all of these as well, replace '\s' with '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]' or '[\s᠎‌‍⁠]' (note trailing invisible characters!).
Example, instead of:

regexp_replace(eventdate, '\s+$', '')

use:

regexp_replace(eventdate, '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]+$', '')

or:

regexp_replace(eventdate, '[\s᠎‌‍⁠]+$', '')  -- note invisible characters

Limitations

There is also the Posix character class [[:graph:]] supposed to represent "visible characters". Example:

substring(eventdate, '([[:graph:]].*[[:graph:]])')

It works reliably for ASCII characters in every setup (where it boils down to [\x21-\x7E]), but beyond that you currently (incl. pg 10) depend on information provided by the underlying OS (to define ctype) and possibly locale settings.

Strictly speaking, that's the case for every reference to a character class, but there seems to be more disagreement with the less commonly used ones like graph. But you may have to add more characters to the character class [[:space:]] (shorthand \s) to catch all whitespace characters. Like: \u2007, \u202f and \u00a0 seem to also be missing for @XiCoN JFS.

The manual:

Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. Standard character class names are: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. These stand for the character classes defined in ctype. A locale can provide others.

Bold emphasis mine.

Also note this limitation that was fixed with Postgres 10:

Fix regular expressions' character class handling for large character codes, particularly Unicode characters above U+7FF (Tom Lane)

Previously, such characters were never recognized as belonging to locale-dependent character classes such as [[:alpha:]].

answered Oct 02 '22 11:10

Erwin Brandstetter

Related questions
                            
                                How do I execute a stored procedure in a SQL Agent job?
                            
                                How to run a stored procedure in sql server every hour?
                            
                                basic recursive query on sqlite3?
                            
                                Date Difference between consecutive rows
                            
                                Foreign key referencing a 2 columns primary key in SQL Server
                            
                                How can I assign inserted output value to a variable in sql server? [duplicate]
                            
                                How to pivot in SQLite or i.e. select in wide format a table stored in long format?
                            
                                Best way to store working hours and query it efficiently
                            
                                WHERE Clause vs ON when using JOIN
                            
                                SQLite and custom order by
                            
                                Query error with ambiguous column name in SQL
                            
                                INSERT INTO a temp table, and have an IDENTITY field created, without first declaring the temp table?
                            
                                SQL Join Differences
                            
                                Trying to get the average of a count resultset
                            
                                View SQL query in Slick
                            
                                SQL Server 2005 Setting a variable to the result of a select query
                            
                                SQL Not Like Statement not working
                            
                                If else on WHERE clause
                            
                                How to select the comparison of two columns as one column in Oracle
                            
                                rename database name in sql server management studio 2014

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Trim trailing spaces with PostgreSQL

Tags:

sql

trim

postgresql

whitespace

removing-whitespace