Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mysterious error: invalid byte sequence for encoding "UTF8"

I'm stuck trying to find a cause for

invalid byte sequence for encoding "UTF8".

It is an C program using libpq. I'm using PQexecParams to execute the SQL query.

The offending byte sequence is completely random, sometimes the command even runs ok. I thought I must have a memory allocation issue somewhere, but even I specify all the parameters as static strings, I still receive the error with a random byte sequence. What's more, the same query with the same parameters runs ok when I create a small test program. It even runs ok from other places in the applications. So I'm completely stuck. I verified all the possible sources for the error like client_encoding etc, but could not find the source of the error. What is confusing me is that the offending byte sequence is random, even though the query parameters don't change. Moreover, when I check the postgres log, the query and its parameters appear to be correct.

I'm trying to update a record in the following table:

CREATE TABLE public.contacts
(
    contactid integer NOT NULL DEFAULT nextval('contacts_contactid_seq'::regclass),
    paperid integer,
    pos character varying(50) COLLATE pg_catalog."default",
    title character varying(10) COLLATE pg_catalog."default",
    firstname character varying(20) COLLATE pg_catalog."default",
    lastname character varying(25) COLLATE pg_catalog."default",
    func character varying(25) COLLATE pg_catalog."default",
    tel1 text COLLATE pg_catalog."default",
    tel2 text COLLATE pg_catalog."default",
    fax1 text COLLATE pg_catalog."default",
    fax2 text COLLATE pg_catalog."default",
    email1 character varying(50) COLLATE pg_catalog."default",
    email2 character varying(50) COLLATE pg_catalog."default",
    maincontact boolean DEFAULT false,
    publdatacontact boolean DEFAULT false,
    invcontact boolean DEFAULT false,
    queries_recipient boolean,
    contact_log text COLLATE pg_catalog."default",
    salesforceid character(18) COLLATE pg_catalog."default",
    fakelastname boolean NOT NULL DEFAULT false,
    CONSTRAINT contacts_pk PRIMARY KEY (contactid),
    CONSTRAINT contacts_paperid_fkey FOREIGN KEY (paperid)
        REFERENCES public.papers (paperid) MATCH SIMPLE
        ON UPDATE CASCADE
        ON DELETE CASCADE
);

Here is an actual code:

    const char* pparams[16] = {
NULL,
NULL,
"1702",
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
"14340"
    };



gchar *query="UPDATE contacts SET Pos=$1::varchar,Title=$2::varchar,PaperID=$3::int,FirstName=$4::varchar,LastName=$5::varchar,Func=$6::varchar,Tel1=$7::text,Fax1=$8::text,Email1=$9::varchar,Tel2=$10::text,Fax2=$11::text,Email2=$12::varchar,MainContact=$13::boolean,PublDataContact=$14::boolean,InvContact=$15::boolean WHERE ContactID=$16::int";

      result = PQexecParams(conn, query, 16, NULL, pparams, ssizes, bbinary, 0);

An excerpt from Postgres log:

Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-1] 2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate LOG:  execute <unnamed>: 

    UPDATE contacts SET Pos = $1::varchar, Title = $2::varchar, PaperID = $3::int, FirstName = $4::varchar, LastName = $5::varchar, Func = $6::varchar, Tel1 = $7::text, Fax1 = $8::text, Email1 = $9::varchar, Tel2 = $10::text, Fax2 = $11::text, Email2 = $12::varchar, MainContact = $13::boolean, PublDataContact = $14::boolean, InvContact = $15::boolean WHERE ContactID = $16::int
    Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-2] 2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate DETAIL:  parameters: $1 = NULL, $2 = NULL, $3 = '1702', $4 = NULL, $5 = NULL, $6 = NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = NULL, $12 = NULL, $13 = NULL, $14 =  NULL, $15 = NULL, $16 = '14340'
    Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [114-1] 2021-01-26 09:40:57.544 CET [11334] jira@project-syndicate ERROR:  invalid byte sequence for encoding "UTF8": 0x80

Any ideas as to what could be a cause of the error?

like image 694
jira Avatar asked Jan 27 '21 08:01

jira


1 Answers

first of all you are using postgres, and when you create tables in postgres with type character varying you dont have to specify the length. that is what varying stands for. it will consume as many bytes as it needs

very possible that the strings you put in your db are encoded in such way ex. double byte characters, so when you are trying to insert a 26 bytes string into a 25 length character column the last byte is not valid utf8

so i suggest you recreate your table ommiting all lengths on character varying columns and try again.

then check your system locale and the db's locale, I would suggest you create your db using template0 and adding a usable from your system local depending on your language.UTF-8

then check you code file encoding as well to be utf8 using file

if nothing works let me know

I have tested it with the create table you posted but without the foreign keys and with the following code

int main() {
    const char conninfo[] = "postgresql://postgres@localhost?port=5432&dbname=libpq_demo";
    PGconn *conn = PQconnectdb(conninfo);
    if (PQstatus(conn) != CONNECTION_OK) {
        printf("Connection to database failed: %s",  PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }
    else {
        printf("%s", "Connection to database succeed.\n");
    }

    const char* pparams[16] = {
        NULL,
        NULL,
        "1702",
        NULL,
        NULL,
        NULL,
        NULL,
        NULL,
        NULL,
        NULL,
        NULL,
        NULL,
        NULL,
        NULL,
        NULL,
        "14340"
    };

    int ssizes[16] = {
        sizeof(NULL),
        sizeof(NULL),
        4,
        sizeof(NULL),
        sizeof(NULL),
        sizeof(NULL),
        sizeof(NULL),
        sizeof(NULL),
        sizeof(NULL),
        sizeof(NULL),
        sizeof(NULL),
        sizeof(NULL),
        sizeof(NULL),
        sizeof(NULL),
        sizeof(NULL),
        5
    };

    int bbinary[16]= {
        1,
        1,
        0,
        1,
        1,
        1,
        1,
        1,
        1,
        1,
        1,
        1,
        1,
        1,
        1,
        0
    };

    gchar *query="UPDATE contacts SET Pos=$1::varchar,Title=$2::varchar,PaperID=$3::int,FirstName=$4::varchar,LastName=$5::varchar,Func=$6::varchar,Tel1=$7::text,Fax1=$8::text,Email1=$9::varchar,Tel2=$10::text,Fax2=$11::text,Email2=$12::varchar,MainContact=$13::boolean,PublDataContact=$14::boolean,InvContact=$15::boolean WHERE ContactID=$16::int";

    PQexecParams(conn, query, 16, NULL, pparams, ssizes, bbinary, 0);
}

and compiled it with

gcc foo.cc -o foo-demo -I/usr/include/postgresql -I/usr/include/glib-2.0 -lpq

apart from the warning about the gchar, which I am not sure why you use it but anyway, everything works perfect. I have tested it about 10K times

you should consider looking at the

CONSTRAINT contacts_paperid_fkey FOREIGN KEY (paperid)
REFERENCES public.papers (paperid) MATCH SIMPLE

maybe it has nothing to do the code, but with the fact that you are passing a value there which is conflicting

like image 173
Tch Avatar answered Nov 17 '22 09:11

Tch