What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive?

I'm new to big data and currently learning Hive. I understand the concept of InputFormat and OutputFormat in Hive as part of SerDe. I also understand that 'Stored as' is used to store a file in a particular format, much like InputFormat. What I don't understand is the significant difference between using 'InputFormat, OutputFormat' and using 'Stored as'.

Any help is appreciated.

asked Feb 23 '17 12:02

1 Answer

Hive offers many ways to store data. You can use an external table, where Hive simply wraps data that already lives somewhere else, or you can create a managed table from scratch in the Hive warehouse. Input and output formats let you specify the original data structure for both kinds of table, that is, how the data is physically stored. On the client side you keep working with the table through SQL, but at the low level it may be a text file, a sequence file, an HBase table, or some other data structure.

InputFormat and OutputFormat - let you describe the original data structure so that Hive can properly map it to the table view

SerDe - the class that performs the actual translation of data between the table view and the low-level input/output format structures, in both directions

Generally the process looks like this: HDFS files --> InputFileFormat --> Deserializer --> Row object --> Serializer --> OutputFileFormat --> HDFS files
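As a minimal sketch of where each piece of that pipeline is declared, here is a table definition that names the SerDe, the InputFormat and the OutputFormat explicitly. The table name, columns and LOCATION path are hypothetical; the class names are the ones Hive ships for plain text files.

```sql
-- Hypothetical external table over tab-separated text files.
-- Table name, columns and LOCATION are illustrative assumptions.
CREATE EXTERNAL TABLE web_logs_raw (
  ip  STRING,
  ts  STRING,
  url STRING
)
-- SerDe: turns each record into a row (and back when writing)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = '\t')
-- InputFormat: how files are split into records on read
-- OutputFormat: how records are written back to files
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/data/raw/web_logs';

-- Read path: files -> InputFormat -> Deserializer -> rows
SELECT url, count(*) AS hits
FROM web_logs_raw
GROUP BY url;
```

An INSERT into the same table would go through the other half of the chain: rows, then the Serializer, then the OutputFormat, then files.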

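Stored as - specifies a storage format for your new Hive tables; the format name is a shorthand that already includes the matching input and output formats (and, for formats such as ORC or Parquet, the matching SerDe)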
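To make the relationship concrete, the two definitions below should produce tables with the same storage settings. This is a sketch with hypothetical table and column names; the first uses the 'Stored as' shorthand and the second spells out the classes the shorthand stands for.

```sql
-- Shorthand: the format name implies the SerDe, InputFormat and OutputFormat.
CREATE TABLE events_text (id BIGINT, payload STRING)
STORED AS TEXTFILE;

-- Explicit form of the same thing (hypothetical table name).
CREATE TABLE events_text_explicit (id BIGINT, payload STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- Compare the "SerDe Library", "InputFormat" and "OutputFormat" rows
-- in the output of both commands.
DESCRIBE FORMATTED events_text;
DESCRIBE FORMATTED events_text_explicit;
```

In practice the shorthand is enough for the built-in formats; the explicit INPUTFORMAT/OUTPUTFORMAT form is mainly needed when you plug in a custom or third-party format that has no 'Stored as' keyword.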

These attributes can significantly affect performance, overall data size and schema evolution support, and can enable features such as ACID. You can follow the steps described in this article to see how things work at the low level and to get general information about the most commonly used formats - https://oyermolenko.blog/2017/02/16/structuring-hadoop-data-through-hive-and-sql
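As a rough illustration of how the storage format choice plays out, the sketch below copies a text table into ORC and declares an ACID table. Table and column names are hypothetical, and the exact requirements for transactional tables depend on your Hive version and configuration (older releases also required the table to be bucketed), so treat it as a starting point, not a recipe.

```sql
-- Hypothetical source table stored as plain text.
CREATE TABLE page_views_text (user_id BIGINT, url STRING)
STORED AS TEXTFILE;

-- Same data rewritten as ORC, a columnar format that is usually
-- smaller on disk and faster to scan; compression codec set per table.
CREATE TABLE page_views_orc
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'SNAPPY')
AS
SELECT user_id, url FROM page_views_text;

-- ACID (transactional) tables are built on ORC; assumes the Hive
-- transaction manager is enabled on the cluster.
CREATE TABLE page_views_acid (user_id BIGINT, url STRING)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```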

answered Sep 29 '22 13:09

Alex

Tags: hadoop, hive, hiveql, hive-serde
