How do I output the results of a HiveQL query to CSV?

We would like to put the results of a Hive query into a CSV file. I thought the command should look like this:

insert overwrite directory '/home/output.csv' select books from table; 

When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way?

asked Aug 08 '13 by AAA


1 Answer

Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.

According to the manual, your query will store the data in a directory in HDFS. The format will not be CSV.

Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.

A slight modification (adding the LOCAL keyword) will store the data in a local directory.

INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table; 

When I run a similar query, here's what the output looks like:

[lvermeer@hadoop temp]$ ll
total 4
-rwxr-xr-x 1 lvermeer users 811 Aug  9 09:21 000000_0
[lvermeer@hadoop temp]$ head 000000_0
"row1""col1"1234"col3"1234FALSE
"row2""col1"5678"col3"5678TRUE
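Since the fields in that file are separated by the non-printing ^A character, one way to get a CSV out of it is a quick tr pass. This is only a sketch: the printf fakes a small Hive output file, and it assumes the column values themselves contain no commas or ^A bytes.

```shell
# Hive writes directory output with ^A (octal \001) between columns.
# Fake one such output file with printf, then swap ^A for commas;
# on a real cluster the input would be the 000000_0 file Hive wrote.
printf 'row1\001col1\0011234\n' > /tmp/000000_0
tr '\001' ',' < /tmp/000000_0 > /tmp/out.csv
cat /tmp/out.csv   # row1,col1,1234
```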

Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:

hive -e 'select books from table' > /home/lvermeer/temp.tsv 

That gives me a tab-separated file that I can use. Hope that is useful for you as well.
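If you want commas rather than tabs, you can convert them on the way out. Again just a sketch: the printf stands in for the real `hive -e 'select books from table'` call, and it assumes the column values contain no literal tabs or commas.

```shell
# Simulate the tab-separated rows `hive -e` would emit, then turn the
# tabs into commas; with a real cluster, replace the printf with:
#   hive -e 'select books from table'
printf 'book1\t12\nbook2\t34\n' | tr '\t' ',' > /tmp/temp.csv
cat /tmp/temp.csv
```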

Based on this patch (HIVE-3682), I suspect a better solution is available when using Hive 0.11, but I am unable to test this myself. The new syntax should allow the following.

INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
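One thing to watch with the directory approaches: Hive writes one file per reducer, so the output directory may hold several files (000000_0, 000001_0, and so on). A sketch of stitching them back into a single CSV, using small stand-in files instead of real Hive output:

```shell
# Hive can leave one file per reducer in the output directory;
# concatenate them in order to get a single CSV. The two printf
# calls stand in for files a real query would have written.
mkdir -p /tmp/hive_out
printf 'a,1\n' > /tmp/hive_out/000000_0
printf 'b,2\n' > /tmp/hive_out/000001_0
cat /tmp/hive_out/00000*_0 > /tmp/books.csv
cat /tmp/books.csv
```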

Hope that helps.

answered Oct 09 '22 by Lukas Vermeer