Is there any way to find out the field delimiter of an existing Hive table? I tried DESCRIBE EXTENDED, but it was no use. I have searched a lot and have not found an answer yet.
To handle delimiters that occur within the data, you can create the table with "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\';", which will handle data such as “1,some text\, with comma in it,123,more text”. @Sindhu: Thanks.
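A fuller version of that table definition might look like the sketch below. The table name escaped_demo and its columns are made up for illustration; only the ROW FORMAT clause comes from the answer above.

```sql
-- Hypothetical table: the name and column types are assumptions.
CREATE TABLE escaped_demo (
  id    INT,
  note  STRING,
  code  INT,
  extra STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','   -- field delimiter
  ESCAPED BY '\\'            -- a backslash before a comma keeps it inside the field
STORED AS TEXTFILE;
```

With the sample row quoted above, the second column should come back as "some text, with comma in it" rather than being split at the escaped comma.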
What are the default record and field delimiters used for Hive text files? The default record delimiter is \n, and the field delimiters are \001, \002 and \003. What do you mean by schema on read?
Introduced in HIVE-5871, MultiDelimitSerDe allows the user to specify a multiple-character string as the field delimiter when creating a table.
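As a rough sketch, a table using MultiDelimitSerDe could be declared as follows. The table name and the "[,]" delimiter are only examples, and the class name shown is the one from the hive-contrib module in Hive releases before 4.0; the class was later moved to another package, so check the documentation for your version (the hive-contrib jar may also need to be on the classpath).

```sql
-- Hypothetical table; assumes a Hive version where the SerDe lives in hive-contrib.
CREATE TABLE multi_delim_demo (
  id   INT,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim" = "[,]")  -- three-character field delimiter
STORED AS TEXTFILE;
```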
'\u0001' is a single character Ctrl-A. What do you have in your data file as the delimiter? A single ctrl-A or 6 characters '\u0001'? The delimiter in Hive must be a single character, and actually Ctrl-A is the default.
Other answers are correct in the sense that you would get the field delimiter if it is other than the default. However, I don't see it if the delimiter is the default one, which is the Control-A character (ASCII code 1, written "\001" in octal).
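One possible workaround, if you want the delimiter to be visible in the metadata even when it is Ctrl-A, is to declare it explicitly when creating the table. This is only a sketch under that assumption; the table name is made up.

```sql
-- Hypothetical table; declares the default Ctrl-A delimiter explicitly
-- so that a field.delim entry should be recorded in the SerDe parameters.
CREATE TABLE default_delim_demo (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;
```

Keep in mind that Ctrl-A is a non-printing character, so depending on the client it may show up as an odd symbol or as nothing at all in the output.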
Try running a "show create table" command and it will show you the delimiter.
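For instance, using difdelimiter (the example table name that appears further down in this thread), the commands would be along these lines. DESCRIBE FORMATTED is included as an alternative, since it prints the storage parameters in a more readable layout than DESCRIBE EXTENDED.

```sql
-- Prints the DDL Hive would use to recreate the table, including its ROW FORMAT or SerDe properties.
SHOW CREATE TABLE difdelimiter;

-- Lists the SerDe parameters (such as field.delim) under the storage information section.
DESCRIBE FORMATTED difdelimiter;
```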
I can see it by using the "describe extended <table name>" command.
Example:
hive> create table difdelimiter (id int, name string)
row format delimited
fields terminated by ',';
hive> describe extended difdelimiter;
OK
id int
name string
Detailed Table Information Table(tableName:difdelimiter, dbName:default, owner:cloudera, createTime:1439375349, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/difdelimiter, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1439375349}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.154 seconds, Fetched: 4 row(s)
Here is the part of the output that shows the delimiter:
parameters:{serialization.format=,, field.delim=,}
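Adding another example, as requested in the comments. Note that '/t' here is a literal slash followed by "t", not a tab; a tab would be written '\t'.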
hive> create table tb3 (id int, name string) row format delimited fields terminated by '/t';
OK
Time taken: 0.09 seconds
hive> describe extended tb3;
OK
id int
name string
Detailed Table Information Table(tableName:tb3, dbName:default, owner:cloudera, createTime:1439377591, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/tb3, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=/t, field.delim=/t}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1439377591}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.125 seconds, Fetched: 4 row(s)
Again, the delimiter shows up in the SerDe parameters:
parameters:{serialization.format=/t, field.delim=/t}