I am using Cloudera's version of Hive and trying to create an external table over a csv file that contains the column names in the first column. Here is the code that I am using to do that.
CREATE EXTERNAL TABLE Test ( RecordId int, FirstName string, LastName string ) ROW FORMAT serde 'com.bizo.hive.serde.csv.CSVSerde' WITH SerDeProperties ( "separatorChar" = "," ) STORED AS TEXTFILE LOCATION '/user/File.csv'
Sample Data
RecordId,FirstName,LastName 1,"John","Doe" 2,"Jane","Doe"
Can anyone help me with how to skip the first row or do I need to add an intermediate step?
Show activity on this post. create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' TBLPROPERTIES ("skip. header. line.
Below is the simple syntax to create Hive external tables: CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.] table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [ROW FORMAT row_format] [FIELDS TERMINATED BY char] [STORED AS file_format] [LOCATION hdfs_path];
As of Hive v0.13.0, you can use skip.header.line.count
table property:
create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' TBLPROPERTIES ("skip.header.line.count"="1");
Use ALTER TABLE
for an existing table:
ALTER TABLE tablename SET TBLPROPERTIES ("skip.header.line.count"="1");
Please note that while it works it comes with its own issues. When there is more than one output file generated i.e. reducers are greater than 1, it skips the first record for each and every file which might not necessarily be the desired behaviour.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With