
invalid byte sequence for encoding "UTF8"

People also ask

What is invalid byte sequence?

Explanation: This error occurs when you send text data, but either the source encoding doesn't match the one currently set on the database, or the text stream contains binary data such as NUL bytes, which are not allowed within a string.
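For illustration only, here is a minimal way to trigger the error from a shell, assuming a UTF8-encoded database and a made-up one-column text table called tmp; the byte 0xE9 is 'é' in latin1, but on its own it is not a complete UTF-8 sequence, so the copy fails with an error much like the one above:

$ printf 'caf\xe9\n' > bad.csv
$ psql -c "\copy tmp from 'bad.csv' csv"
ERROR:  invalid byte sequence for encoding "UTF8": 0xe9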

What is utf8?

UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. This is the meaning of “UTF”, or “Unicode Transformation Format.”
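As a small illustration of that transformation (assuming a UTF-8 terminal), you can look at the bytes a single character turns into:

$ printf 'é' | od -An -tx1
 c3 a9

0xC3 0xA9 is the two-byte UTF-8 encoding of é; in latin1 the same character is the single byte 0xE9, which is exactly the kind of mismatch that produces the COPY error.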

Can UTF-8 contain null?

NUL (U+0000) is a valid Unicode character, and its UTF-8 encoding is a single zero byte. If specific languages and their standard libraries choose to treat it as a string terminator (C, I'm looking at you), well, then fine. But it's still valid Unicode.
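Note that PostgreSQL itself never allows a NUL byte inside a text value, whatever the encoding; you can see that from psql (the exact wording may vary by version):

psql=# select convert_from('\x00'::bytea, 'UTF8');
ERROR:  invalid byte sequence for encoding "UTF8": 0x00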


If you need to store UTF8 data in your database, you need a database that accepts UTF8. You can check the encoding of your database in pgAdmin. Just right-click the database, and select "Properties".
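If you'd rather check from psql than pgAdmin, the same information is available there (\l also lists an Encoding column for every database):

psql=# show server_encoding;
 server_encoding
-----------------
 UTF8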

But that error seems to be telling you there's some invalid UTF8 data in your source file. That means that the copy utility has detected or guessed that you're feeding it a UTF8 file.

If you're running under some variant of Unix, you can check the encoding (more or less) with the file utility.

$ file yourfilename
yourfilename: UTF-8 Unicode English text

(I think that will work on Macs in the terminal, too.) Not sure how to do that under Windows.

If you use that same utility on a file that came from a Windows system (that is, a file that's not encoded in UTF8), it will probably show something like this:

$ file yourfilename
yourfilename: ASCII text, with CRLF line terminators

If things stay weird, you might try to convert your input data to a known encoding, to change your client's encoding, or both. (We're really stretching the limits of my knowledge about encodings.)

You can use the iconv utility to change encoding of the input data.

iconv -f original_charset -t utf-8 originalfile > newfile
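For example, if the file turned out to be Windows-1251, the call might look like this; iconv -l lists the charset names your system understands:

$ iconv -f WINDOWS-1251 -t UTF-8 yourfilename > yourfilename.utf8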

You can change the psql (client) encoding by following the instructions on the Character Set Support page. On that page, search for the phrase "To enable automatic character set conversion".
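In short, that comes down to telling the client which encoding your file or terminal actually uses; WIN1251 and yourdatabase below are just placeholders:

psql=# \encoding WIN1251

Or set it in the environment before starting psql:

$ PGCLIENTENCODING=WIN1251 psql yourdatabase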


psql=# copy tmp from '/path/to/file.csv' with delimiter ',' csv header encoding 'windows-1251';

Adding the encoding option worked in my case.
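If the file sits on your own machine rather than on the database server, the client-side \copy variant should accept the same options (just a sketch, reusing the table and path from the example above):

psql=# \copy tmp from '/path/to/file.csv' with delimiter ',' csv header encoding 'windows-1251'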


If you are OK with discarding non-convertible characters, you can use the -c flag:

iconv -c -t utf8 filename.csv > filename.utf8.csv

and then copy the converted file into your table.
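Put together, the whole round trip might look like this (tmp is a placeholder table name):

$ iconv -c -t utf8 filename.csv > filename.utf8.csv
$ psql -c "\copy tmp from 'filename.utf8.csv' with csv header"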


Apparently I can just set the encoding on the fly:

 set client_encoding to 'latin1'

And then re-run the query. Not sure what encoding I should be using, though.
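If you're unsure, the file utility mentioned earlier can usually guess the charset; -i is the GNU spelling (macOS uses a capital -I), and iso-8859-1 is what PostgreSQL calls latin1:

$ file -i yourfilename
yourfilename: text/plain; charset=iso-8859-1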


latin1 made the characters legible, but most of the accented characters were in upper-case where they shouldn't have been. I assumed this was due to a bad encoding, but I think it's actually the data that was just bad. I ended up keeping the latin1 encoding, but pre-processing the data to fix the casing issues.


This error means that the encoding of the records in the file differs from the connection encoding. In this case iconv may still return an error, sometimes even with the //IGNORE flag:

iconv -f ASCII -t utf-8//IGNORE < b.txt > a.txt

iconv: illegal input sequence at position (some number)

The trick is to find the incorrect characters and replace them. To do that on Linux, use the vim editor:

vim (your text file), press the "Esc" key, and type ":goto (number returned by iconv)"

To find non-ASCII characters you may use the following command:

grep --color='auto' -P "[\x80-\xFF]" yourfilename

If you remove the incorrect characters, check whether you still need to convert your file: the problem may already be solved.