Hi, the problem is as follows: the data I want to insert into a Hive table contains Latin words and is UTF-8 encoded, but Hive still does not display it properly.
Actual data: [screenshot not preserved]
Data inserted in Hive: [screenshot not preserved]
I also changed the encoding of the table to UTF-8, but the issue remains. Below are the Hive DDL and commands:
CREATE TABLE IF NOT EXISTS test6
(
CONTACT_RECORD_ID string,
ACCOUNT string,
CUST string,
NUMBER string,
NUMBER1 string,
NUMBER2 string,
NUMBER3 string,
NUMBER4 string,
NUMBER5 string,
NUMBER6 string,
NUMBER7 string,
LIST string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|';
ALTER TABLE test6 SET serdeproperties ('serialization.encoding'='UTF-8');
Does Hive support only the first 128 characters of UTF-8? Please advise.
Each UTF uses a different code unit size. For example, UTF-8 is based on 8-bit code units. Therefore, each character can be 8 bits (1 byte), 16 bits (2 bytes), 24 bits (3 bytes), or 32 bits (4 bytes).
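To make the byte widths concrete, here is a minimal Python sketch (Python is only used for illustration here, it is not part of the Hive setup). The sample characters and the helper name `utf8_width` are my own choices, not from the thread. It shows that only the first 128 code points (ASCII) fit in one byte, while accented Latin letters already need two.

```python
# Byte width of a few characters when encoded as UTF-8.
# Sample characters are illustrative: ASCII "A", Latin "e acute",
# the euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_width(ch: str) -> int:
    """Return the number of bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))


# Map each sample character to its UTF-8 byte count.
widths = {ch: utf8_width(ch) for ch in samples}

for ch, n in widths.items():
    print(f"U+{ord(ch):04X}: {n} byte(s)")
```

So a reader that only handles single-byte characters correctly would show plain ASCII fine and break on anything accented.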
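Data loaded into a Hive managed table is stored in HDFS under /user/hive/warehouse by default (the value of hive.metastore.warehouse.dir). If no location is specified for a table, its data files are stored under this path; the table metadata itself is kept in the metastore database, not in HDFS. In HDFS the data is split into blocks, 128 MB by default in Hadoop 2 and later (64 MB in Hadoop 1).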
This may not be the ideal solution, but it works. Hive somehow doesn't seem to treat the data as UTF-8. Please try creating the table with the following parameters:
CREATE TABLE testjoins.yt_sample_mapping_1(
`col1` string,
`col2` string,
`col3` string)
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
WITH SERDEPROPERTIES ( "separatorChar" = ",",
"quoteChar" = "\"",
"escapeChar" = "\\",
"serialization.encoding"='ISO-8859-1')
TBLPROPERTIES ( 'store.charset'='ISO-8859-1',
'retrieve.charset'='ISO-8859-1');
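One plausible reason this workaround helps is that the source file is actually ISO-8859-1 encoded rather than UTF-8. The thread does not confirm this, so treat it as an assumption. The Python sketch below uses a made-up sample value (not the poster's data) to show what happens when Latin-1 bytes are read as UTF-8 and when they are read with the matching charset.

```python
# Hypothetical sample value; the thread does not show the real data.
text = "Se\u00f1or Jos\u00e9"

# How a Latin-1 (ISO-8859-1) file would store the value on disk.
latin1_bytes = text.encode("iso-8859-1")

# Reading those bytes as UTF-8: the accented letters are invalid UTF-8
# sequences, so each one becomes U+FFFD (the replacement character).
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Reading the same bytes with the matching charset restores the text.
as_latin1 = latin1_bytes.decode("iso-8859-1")

print(as_utf8)
print(as_latin1)
```

If your output in Hive shows replacement characters or question marks in place of the accented letters, it is worth checking the real encoding of the input file before changing the table definition.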
For me, adding the following line worked:
TBLPROPERTIES('serialization.encoding'='windows-1252')
Example code:
CREATE EXTERNAL TABLE IF NOT EXISTS test.tbl
(
name string,
gender string,
age string,
address string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n' STORED AS TEXTFILE
LOCATION 'adl://<Data-Lake-Store>.azuredatalakestore.net/<Folder-Name>/'
TBLPROPERTIES('serialization.encoding'='windows-1252');
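On choosing between windows-1252 and ISO-8859-1: the two agree on accented Latin letters and differ only for bytes 0x80 to 0x9F, where windows-1252 has printable characters (curly quotes, the euro sign) and ISO-8859-1 has control characters. A small Python sketch with made-up bytes (an assumption, not data from this thread) shows the difference.

```python
# Hypothetical bytes as they might appear in a file exported from Windows:
# curly quotes (0x93, 0x94), "e acute" (0xE9) and the euro sign (0x80).
raw = b"\x93caf\xe9\x94 \x80"

# windows-1252 maps 0x80-0x9F to printable characters.
as_cp1252 = raw.decode("windows-1252")

# ISO-8859-1 maps the same bytes to invisible C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_cp1252))
print(repr(as_latin1))
```

Both decodings agree on the accented letter, so if the data only contains accented Latin letters either charset should give the same result.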