MapReduce error with Parquet format

I'm trying to run a MapReduce job. My input files are in Parquet format.

I'm getting the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/thrift/TException
at parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:268)
at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:271)
at parquet.hadoop.ParquetFileReader.readSummaryFile(ParquetFileReader.java:200)
at parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:99)
at parquet.hadoop.ParquetInputFormat.getFooters(ParquetInputFormat.java:354)
at parquet.hadoop.ParquetInputFormat.getFooters(ParquetInputFormat.java:339)
at parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:246)
...

I tried adding the jar that contains TException with --libjars my_path/libthrift-0.9.0.jar, but I still get the same error.

crazybob asked May 20 '26 04:05


1 Answer

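Try setting the HADOOP_CLASSPATH environment variable to point to a libthrift jar whose version matches the one you need. The exception is thrown in the "main" thread while ParquetInputFormat.getSplits runs, which means the class is missing from the classpath of the client JVM that submits the job.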

For example:

export HADOOP_CLASSPATH=/var/lib/hdfs/libthrift-0.9.jar
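
If HADOOP_CLASSPATH may already hold other entries, appending to it is safer than overwriting it. The following is a minimal sketch, assuming the jar lives at the example path above. The job jar name, driver class, and input/output paths in the commented-out command are hypothetical placeholders, not taken from the question.

```shell
# Assumed location of the Thrift jar (same example path as above); adjust as needed.
THRIFT_JAR=/var/lib/hdfs/libthrift-0.9.jar

# Append the jar to HADOOP_CLASSPATH, keeping existing entries and
# avoiding a stray leading ':' when the variable is empty or unset.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$THRIFT_JAR"

# Show the classpath the hadoop launcher script will pick up.
printf '%s\n' "$HADOOP_CLASSPATH"

# Hypothetical job submission (names are placeholders). The generic option is
# spelled -libjars with a single dash, goes before the application arguments,
# and is only honored if the driver parses generic options (for example by
# running through ToolRunner).
# hadoop jar my-job.jar com.example.MyDriver -libjars "$THRIFT_JAR" /input /output
```

HADOOP_CLASSPATH affects the client-side JVM started by the hadoop command, which is where this stack trace originates. Shipping the jar to the map and reduce tasks is a separate concern, typically handled by -libjars or by bundling the dependency into the job jar.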

Hope this helps!

xinec answered May 21 '26 18:05