I used this Hive query to export a table into a CSV file.
INSERT OVERWRITE DIRECTORY '/user/data/output/test' select column1, column2 from table1;
The file it generated, '000000_0', does not have a comma separator.
Is this the right way to generate a CSV file? If not, please let me know how I can generate one.
To save the file directly in HDFS, use the command below:

hive> insert overwrite directory '/user/cloudera/Sample' row format delimited fields terminated by '\t' stored as textfile select * from table where id > 100;

This will put the contents in the folder /user/cloudera/Sample in HDFS.
Or use this:
hive -e 'select * from your_Table' | sed 's/[\t]/,/g' > /home/yourfile.csv
You can also set the property hive.cli.print.header=true before the SELECT to ensure that the header is created along with the data and copied to the file. For example:
hive -e 'set hive.cli.print.header=true; select * from your_Table' | sed 's/[\t]/,/g' > /home/yourfile.csv
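To see what the sed step does without a Hive cluster at hand, here is a minimal sketch that simulates the tab-separated output hive -e produces (the file paths and column values are hypothetical; GNU sed is assumed for the [\t] tab syntax):

```shell
# Simulate tab-separated output as produced by `hive -e` with
# hive.cli.print.header=true (hypothetical sample data).
printf 'id\tname\n1\talice\n2\tbob\n' > /tmp/hive_out.tsv

# Replace every tab with a comma, exactly as in the pipeline above.
sed 's/[\t]/,/g' /tmp/hive_out.tsv > /tmp/yourfile.csv

cat /tmp/yourfile.csv
# id,name
# 1,alice
# 2,bob
```

The same substitution can also be done with tr '\t' ',' if your sed does not understand \t inside a bracket expression.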
If you don't want to write to the local file system, pipe the output of the sed command back into HDFS using the hadoop fs -put command.
It may also be convenient to SFTP your files using something like Cyberduck, or you can use scp to connect via a terminal / command prompt.
If you're using Hive 11 or later, you can use the INSERT statement with the LOCAL keyword. Example:
insert overwrite local directory '/home/carter/staging' row format delimited fields terminated by ',' select * from hugetable;
Note that this may create multiple files, and you may want to concatenate them on the client side after the export completes.
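The concatenation step can be sketched as follows. Hive names the part files 000000_0, 000001_0, and so on; the directory and file contents below are hypothetical stand-ins for a real export directory:

```shell
# Simulate an export directory containing two Hive part files
# (hypothetical paths and data).
mkdir -p /tmp/staging_demo
printf '1,a\n' > /tmp/staging_demo/000000_0
printf '2,b\n' > /tmp/staging_demo/000001_0

# Concatenate all part files into a single CSV, in order.
cat /tmp/staging_demo/0*_0 > /tmp/export.csv

cat /tmp/export.csv
# 1,a
# 2,b
```

On a real cluster, hadoop fs -getmerge achieves the same result for files still sitting in HDFS.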
Using this approach means you don't need to worry about the format of the source tables, you can export based on an arbitrary SQL query, and you can choose your own delimiters and output formats.
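One caveat with both the sed pipeline and the fields terminated by ',' export: neither quotes fields that themselves contain commas or double quotes, so such values break the CSV. A minimal sketch of RFC 4180-style quoting with awk, applied to hypothetical tab-separated data:

```shell
# Convert tab-separated input to CSV, quoting any field that contains
# a comma or a double quote (doubling embedded quotes per RFC 4180).
# The sample row is hypothetical.
printf '1\tDoe, John\tsays "hi"\n' | awk -F'\t' '{
  out = ""
  for (i = 1; i <= NF; i++) {
    f = $i
    if (f ~ /[",]/) { gsub(/"/, "\"\"", f); f = "\"" f "\"" }
    out = out (i > 1 ? "," : "") f
  }
  print out
}'
# 1,"Doe, John","says ""hi"""
```

This is only a sketch; for data with embedded newlines you would need a real CSV writer rather than line-oriented tools.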