PostgreSQL (AWS Redshift) error 1204: String length exceeds DDL length

I am trying to import a CSV file into AWS Redshift (PostgreSQL 8.x).

The data flow is: MySQL -> Parquet files on S3 -> CSV files on S3 -> Redshift.

Table structure

The MySQL table DDL:

create table orderitems
(
  id                         char(36) collate utf8_bin not null
    primary key,
  store_id                   char(36) collate utf8_bin not null,
  ref_type                   int                       not null,
  ref_id                     char(36) collate utf8_bin not null,
  store_product_id           char(36) collate utf8_bin not null,
  product_id                 char(36) collate utf8_bin not null,
  product_name               varchar(50)               null,
  main_image                 varchar(200)              null,
  price                      int                       not null,
  count                      int                       not null,
  logistics_type             int                       not null,
  time_create                bigint                    not null,
  time_update                bigint                    not null,
  ...
);

I used the same SQL to create the table in Redshift, but the CSV import failed with an error.
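
For reference, with the MySQL-specific collate clauses dropped (Redshift does not accept them), the Redshift table presumably ended up along these lines, with product_name still defined as varchar(50) (a trimmed sketch, not the exact DDL):

create table orderitems
(
  id             char(36)      not null,
  ...
  product_name   varchar(50),
  main_image     varchar(200),
  ...
);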

My code to import the CSV into Redshift (Python):

import pandas as pd
import psycopg2
import smart_open

# The Parquet files were dumped to S3 by Sqoop; `path` points at them.
p2 = 'xxx'  # destination S3 path for the CSV
df = pd.read_parquet(path)

# Python 3 writes text as UTF-8 by default, so the CSV is UTF-8 encoded.
with smart_open.smart_open(p2, 'w') as f:
    df.to_csv(f, index=False)

conn = psycopg2.connect(CONN_STRING)

# COPY the CSV from S3 into the target Redshift table.
sql = """COPY %s FROM '%s' credentials 'aws_iam_role=%s' region 'cn-north-1'
delimiter ',' FORMAT AS CSV IGNOREHEADER 1 ; commit ;""" % (to_table, p2, AWS_IAM_ROLE)
print(sql)
cur = conn.cursor()
cur.execute(sql)
conn.close()

Got this error

By checking STL_LOAD_ERRORS (queried as sketched below the error fields) I found the error is on the product_name column:

  • raw_field_value: .............................................215g/...
  • err_code: 1204
  • err_reason: String length exceeds DDL length
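
These fields come from Redshift's STL_LOAD_ERRORS system table; a query along these lines pulls them out for the most recent loads (a sketch, assuming default access to the system tables):

select starttime, filename, colname, type, col_length,
       raw_field_value, err_code, err_reason
from stl_load_errors
order by starttime desc
limit 10;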

The real value is 伊利畅轻蔓越莓奇亚籽风味发酵乳215g/瓶 (Chinese).

So it looks like some encoding problem. Since MySQL is UTF-8 and the CSV is UTF-8 too, I don't know what is wrong.

asked Dec 24 '22 01:12 by Mithril

1 Answer

Your column is a varchar data type with length 50. In Redshift, that's 50 bytes, not 50 characters. The example string you've given is 16 Chinese characters, which are 3 bytes each in UTF-8, plus five ASCII characters (one byte each), so about 53 bytes. That's longer than the 50-byte limit of the column, so the import fails.
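
This is easy to check in Redshift itself, since LEN counts characters while OCTET_LENGTH counts bytes; the ALTER at the end is a sketch of one possible fix (widening the column), an assumption rather than something stated above:

select len('伊利畅轻蔓越莓奇亚籽风味发酵乳215g/瓶')          as n_chars,  -- 21 characters
       octet_length('伊利畅轻蔓越莓奇亚籽风味发酵乳215g/瓶') as n_bytes;  -- 53 bytes > 50

-- One possible fix: size the column for bytes rather than characters.
-- Recent Redshift versions can widen a varchar column in place:
alter table orderitems alter column product_name type varchar(200);

Adding TRUNCATECOLUMNS to the COPY command is another option; it loads the rows but silently truncates values that are too long.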

answered Jan 14 '23 00:01 by Nathan Griffiths