I have a partitioned table that I create and populate from Avro files or text files.
Once the table is populated, is there a way to convert it to Parquet?
I know we could have done this while creating the table itself, for example:

CREATE TABLE default.test (name_id STRING)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS PARQUET;
In my use case I'll have to use text files initially, because I want to avoid creating multiple small files inside the partition folders every time I insert or update. My table has a very high volume of inserts and updates, and this is causing a drop in performance.
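For reference, the initial text-backed table looks roughly like this (a sketch; the name test_text and the comma delimiter are just placeholders):

CREATE TABLE default.test_text (name_id STRING)
PARTITIONED BY (year INT, month INT, day INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;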
Is there a way to convert the table to Parquet after it has been created and the data has been inserted?
You can create a table over your data in HDFS stored as text, Avro, or whatever format.
Then you can create another table using:
CREATE TABLE x_parquet LIKE x_non_parquet STORED AS PARQUET;
You can then set compression to something like snappy or gzip:
SET PARQUET_COMPRESSION_CODEC=snappy;
Then you can pull the data from the non-Parquet table and insert it into the new Parquet-backed table:
INSERT INTO x_parquet SELECT * FROM x_non_parquet;
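One caveat, since the table in the question is partitioned: in Impala the partition columns have to be named in the INSERT, with a dynamic-partition insert listing them last in the SELECT. A rough sketch, reusing the column names from the question's example:

INSERT INTO x_parquet PARTITION (year, month, day)
SELECT name_id, year, month, day FROM x_non_parquet;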
Now, if you want to save space and avoid confusion, I'd automate this for any data ingestion and then delete the original non-Parquet data. This will help your queries run faster and make your data take up less space.
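As a sketch of that cleanup step (only once the Parquet copy has been verified; COMPUTE STATS is optional but helps the query planner):

COMPUTE STATS x_parquet;
DROP TABLE x_non_parquet;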