Does PostgreSQL varchar count using Unicode character length or ASCII character length?

I tried importing a database dump from a SQL file, and the insert failed on the string Mér in a field defined as character varying(3). I didn't capture the exact error, but it pointed to that specific value and to the character varying(3) constraint.

Given that I considered this unimportant to what I was doing at the time, I just changed the value to Mer, it worked, and I moved on.

Does the length limit on a character varying field count bytes rather than characters? What really boggles my mind is that this data was dumped from another PostgreSQL database, so I don't see how the constraint could have allowed the value to be written in the first place.

asked Nov 22 '10 20:11

bennylope

1 Answer

Both the length limit imposed by varchar(N) types and the value returned by the length function are measured in characters, not bytes. So 'abcdef'::char(3) is truncated to 'abc', but 'a€cdef'::char(3) is truncated to 'a€c', even in a database encoded as UTF-8, where 'a€c' is encoded using 5 bytes.
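The difference between counting characters and counting bytes can be sketched outside the database. The following Python snippet is only an illustrative model, not PostgreSQL code: the helper name truncate_chars is hypothetical, and it assumes that Python's code-point slicing is a fair stand-in for how a char(3) cast truncates in a UTF-8 database.

```python
# Illustrative model only: Python strings are sequences of code points,
# so slicing them mimics a character-based (not byte-based) limit.

def truncate_chars(value: str, limit: int) -> str:
    """Keep the first `limit` characters (code points), not bytes."""
    return value[:limit]

ascii_result = truncate_chars("abcdef", 3)
euro_result = truncate_chars("a\u20accdef", 3)  # "\u20ac" is the euro sign

# Character count versus UTF-8 byte count for each truncated value.
print(ascii_result, len(ascii_result), len(ascii_result.encode("utf-8")))
print(euro_result, len(euro_result), len(euro_result.encode("utf-8")))
```

Both results are three characters long, but the second one occupies five bytes in UTF-8, because the euro sign alone takes three bytes.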

If restoring a dump file complained that 'Mér' would not go into a varchar(3) column, that suggests you were restoring a UTF-8 encoded dump file into a SQL_ASCII database.

For example, I did this in a UTF-8 database:

    create schema so4249745;
    create table so4249745.t(key varchar(3) primary key);
    insert into so4249745.t values('Mér');

And then dumped this and tried to load it into a SQL_ASCII database:

    pg_dump -f dump.sql --schema=so4249745 --table=t
    createdb -E SQL_ASCII -T template0 enctest
    psql -f dump.sql enctest

And sure enough:

    psql:dump.sql:34: ERROR:  value too long for type character varying(3)
    CONTEXT:  COPY t, line 1, column key: "Mér"

By contrast, if I create the database enctest as encoding LATIN1 or UTF8, it loads fine.

This problem arises from the combination of dumping a database that uses a multi-byte character encoding and restoring it into a SQL_ASCII database. SQL_ASCII essentially disables the transcoding of client data to server data and assumes one byte per character, leaving clients responsible for using the right character map. The dump file stores the string as UTF-8, which takes four bytes, so a SQL_ASCII database sees four characters and rejects the value as violating the length constraint. The error message then prints the raw bytes, which my terminal reassembles as three characters.
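The byte arithmetic behind this can be checked with a short Python sketch. It is a rough model under stated assumptions, not a description of PostgreSQL internals: the function names fits_utf8 and fits_single_byte are hypothetical, and the single-byte check simply treats every byte as one character, which is how the SQL_ASCII behaviour is described above.

```python
LIMIT = 3
value = "M\u00e9r"  # "Mér"; "\u00e9" is e with an acute accent

# What a UTF-8 encoded dump file contains for this value.
dump_bytes = value.encode("utf-8")

def fits_utf8(raw: bytes, limit: int) -> bool:
    """Model of a UTF-8 database: decode first, then count characters."""
    return len(raw.decode("utf-8")) <= limit

def fits_single_byte(raw: bytes, limit: int) -> bool:
    """Model of SQL_ASCII as described above: every byte is one character."""
    return len(raw) <= limit

print(len(dump_bytes))                      # byte length of the UTF-8 form
print(fits_utf8(dump_bytes, LIMIT))         # fits when counted as characters
print(fits_single_byte(dump_bytes, LIMIT))  # does not fit when counted as bytes

# Transcoded to LATIN1, the same string needs only one byte per character.
print(len(value.encode("latin-1")))
```

In this model the value fits when its characters are counted and fails when its four UTF-8 bytes are counted, which matches the error seen in the SQL_ASCII database. The last line shows why a LATIN1 database accepts it: after transcoding, the string is three bytes long.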

answered Sep 28 '22 07:09

araqnid

Tags: postgresql, unicode
