I have CSV data with each field surrounded by double quotes. When I created the Hive table, I used the serde 'com.bizo.hive.serde.csv.CSVSerde'. When the table is queried in Impala, I get a "SerDe not found" error.
I added the CSV Serde JAR file to the /usr/lib/impala/lib folder.
Later I read in the Impala documentation that Impala does not support custom SerDes. In that case, how can I work around this issue so that my quoted CSV data is handled correctly? I want to use the CSV Serde because it takes care of commas inside quoted values, where the comma is a legitimate part of the field value.
Thanks a lot
Can you use Hive?
If so, here is an approach that might work. Create your table as an EXTERNAL TABLE in Hive and put your SerDe in the right place in the CREATE statement (I think you need something like ROW FORMAT SERDE 'your_serde_here' at the end of the CREATE TABLE statement). Before this you might need to do:
ADD JAR 'hdfs:///path/to/your_serde.jar'
Note that the JAR should be somewhere in HDFS, and the triple /// is needed for it to work.
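Put together, the Hive side might look roughly like this (a sketch only; the jar path, table name, columns, and data location are placeholders, not values from the question):
ADD JAR 'hdfs:///path/to/your_serde.jar';
-- External table over the quoted CSV files, parsed by the CSV SerDe
CREATE EXTERNAL TABLE your_original_table (
  col1 STRING,
  col2 STRING
)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
STORED AS TEXTFILE
LOCATION 'hdfs:///path/to/your/csv/data';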
Then, still in Hive, duplicate the table into another table that is stored in a format with which Impala can easily work, such as PARQUET. Something like the following does this copying:
CREATE TABLE copy_of_table
STORED AS PARQUET AS
SELECT * FROM your_original_table
Now in Impala use INVALIDATE METADATA to mark the metadata as stale:
INVALIDATE METADATA copy_of_table
You should be all set to happily work with copy_of_table in Impala now.
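As a quick sanity check (assuming copy_of_table is in your current database), a simple query in impala-shell should now run entirely in Impala:
SELECT COUNT(*) FROM copy_of_table;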
Let me know whether this works, as I might have to do something like this in the near future.
Within Hive
CREATE TABLE mydb.my_serde_table_impala AS SELECT * FROM mydb.my_serde_table
Within Impala
INVALIDATE METADATA mydb.my_serde_table_impala
Add these steps, including dropping the _impala table first, to whatever process populates or ingests files into the serde table.
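A repeatable refresh might look like this (a sketch using the example names above; adjust the database and table names to your setup, and optionally add STORED AS PARQUET as in the first approach):
-- In Hive, after new files have landed in the serde table:
DROP TABLE IF EXISTS mydb.my_serde_table_impala;
CREATE TABLE mydb.my_serde_table_impala AS SELECT * FROM mydb.my_serde_table;
-- In Impala:
INVALIDATE METADATA mydb.my_serde_table_impala;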
Impala bypasses MapReduce, unlike Hive, so Impala can't (and doesn't) use a SerDe the way MapReduce does.