
How can I inspect a Hadoop SequenceFile for which I lack full schema information?

Tags: apache, hadoop

I have a compressed Hadoop SequenceFile from a customer which I'd like to inspect. I do not have full schema information at this time (which I'm working on separately).

But in the interim (and in the hopes of a generic solution), what are my options for inspecting the file?

I found a tool called forqlift: http://www.exmachinatech.net/01/forqlift/

I tried 'forqlift list' on the file, but it complains that it can't load the classes for the custom Writable subclasses used in the file. So I will need to track down those implementations.

But is there any other option available in the meantime? I understand that I most likely can't extract the data, but is there a tool that can scan the file and report how many key/value pairs it contains and of what types?

Mike Repass asked Sep 26 '11 19:09



1 Answer

From shell:

$ hdfs dfs -text /user/hive/warehouse/table_seq/000000_0

or directly from the Hive CLI (which is much faster for small files, because it runs in an already-started JVM)

hive> dfs -text /user/hive/warehouse/table_seq/000000_0

Both commands work for sequence files.
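Note that -text still has to instantiate the key and value classes in order to print records, so it may fail in the same way as forqlift when the custom Writables are not on the classpath. Even then, the file header stores the key class name, the value class name, and the compression codec as plain strings, so the types can be read without any Hadoop classes. Below is a minimal Python sketch of that idea. It assumes the file has been copied to local disk (for example with hdfs dfs -get), that it uses the version 6 SequenceFile header layout, and the function names are my own, not part of any Hadoop API.

```python
import struct


def read_vint(stream):
    """Decode one Hadoop variable-length integer (WritableUtils encoding)."""
    first = struct.unpack("b", stream.read(1))[0]
    if first >= -112:
        return first  # small values are stored in a single byte
    negative = first < -120
    # Number of payload bytes that follow the marker byte.
    extra = (-119 - first if negative else -111 - first) - 1
    value = int.from_bytes(stream.read(extra), "big")
    return ~value if negative else value


def read_text(stream):
    """Read a length-prefixed UTF-8 string as written by Hadoop's Text."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


def read_seqfile_header(stream):
    """Parse a SequenceFile header (assumes version 6) into a dict."""
    magic = stream.read(4)
    if magic[:3] != b"SEQ":
        raise ValueError("not a SequenceFile (missing SEQ magic)")
    version = magic[3]
    if version != 6:
        # Older versions lay out the header differently; not handled here.
        raise ValueError("unsupported SequenceFile version %d" % version)

    header = {"version": version}
    header["key_class"] = read_text(stream)
    header["value_class"] = read_text(stream)
    header["compressed"] = stream.read(1) != b"\x00"
    header["block_compressed"] = stream.read(1) != b"\x00"
    # The codec class name is only present when compression is enabled.
    header["codec"] = read_text(stream) if header["compressed"] else None

    # Metadata: a 4-byte big-endian count followed by Text key/value pairs.
    (count,) = struct.unpack(">i", stream.read(4))
    metadata = {}
    for _ in range(count):
        key = read_text(stream)
        metadata[key] = read_text(stream)
    header["metadata"] = metadata

    # 16-byte sync marker that later separates groups of records.
    header["sync"] = stream.read(16).hex()
    return header
```

Opening the local copy in binary mode and passing the file object to read_seqfile_header returns the key and value class names, which tells you which Writable implementations to ask the customer for. The header does not store a record count, so counting key/value pairs still requires scanning the body of the file.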

slavo answered Sep 20 '22 02:09
