How to skip CSV header in Hive External Table?

Tags:

hive

I am using Cloudera's version of Hive and trying to create an external table over a CSV file that contains the column names in the first row. Here is the code that I am using to do that.

CREATE EXTERNAL TABLE Test (
    RecordId int,
    FirstName string,
    LastName string
)
ROW FORMAT serde 'com.bizo.hive.serde.csv.CSVSerde'
WITH SerDeProperties (
    "separatorChar" = ","
)
STORED AS TEXTFILE
LOCATION '/user/File.csv'

Sample Data

RecordId,FirstName,LastName
1,"John","Doe"
2,"Jane","Doe"

Can anyone help me skip the first row, or do I need to add an intermediate step?

210

asked Apr 01 '13 21:04

1 Answer

As of Hive v0.13.0, you can use the skip.header.line.count table property:

create external table testtable (name string, message string)
row format delimited
  fields terminated by '\t'
  lines terminated by '\n'
location '/testtable'
TBLPROPERTIES ("skip.header.line.count"="1");

Use ALTER TABLE for an existing table:

ALTER TABLE tablename SET TBLPROPERTIES ("skip.header.line.count"="1");
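Applied to the table from the question, the same property might look like the sketch below. It assumes Hive 0.13.0 or later and that the CSVSerde jar from the question is already available to the session; the table name, columns, SerDe class, and location are copied from the question unchanged, and only the TBLPROPERTIES clause is new.

```sql
-- Sketch only: the question's table definition with the header-skipping
-- table property appended. Assumes Hive 0.13.0+ and that the CSVSerde
-- class named in the question is on the classpath.
CREATE EXTERNAL TABLE Test (
    RecordId int,
    FirstName string,
    LastName string
)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES (
    "separatorChar" = ","
)
STORED AS TEXTFILE
LOCATION '/user/File.csv'   -- path as given in the question
TBLPROPERTIES ("skip.header.line.count"="1");

-- If the table already exists, the equivalent change would be:
ALTER TABLE Test SET TBLPROPERTIES ("skip.header.line.count"="1");
```

A quick SELECT with a small LIMIT afterwards is an easy way to check that the RecordId,FirstName,LastName line no longer shows up as data.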

Please note that while this works, it comes with its own issues. When more than one output file is generated (i.e. there is more than one reducer), it skips the first record of each and every file, which might not be the desired behaviour.
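If that per-file skipping is a problem, one possible alternative (not part of the answer above, and only a sketch) is to leave the header row in the table and filter it out at query time. The query below assumes the table from the question without the skip.header.line.count property, and that the header's FirstName field contains the literal text FirstName, as in the sample data.

```sql
-- Hypothetical alternative: keep the header row in the table and
-- exclude it in the query instead of via a table property.
-- Assumes the header line's second field is the literal 'FirstName'.
SELECT RecordId, FirstName, LastName
FROM Test
WHERE FirstName <> 'FirstName';
```

The trade-off is that a genuine data row whose FirstName happens to be the text FirstName would also be dropped, so this only suits data where that cannot occur.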

144

answered Sep 22 '22 16:09

5 revs, 4 users 38%

Rick Gittins
