Given an HDFS path, how can you figure out what format a file is (text, sequence, or Parquet)?
Your answer: You can use the Hadoop filesystem commands to inspect any file. hadoop fs -cat prints a file's content, and hadoop fs -ls lists the files under a path.

Usage: hadoop fs -ls [-d] [-h] [-R] [-t] [-S] [-r] [-u] <args>
Options:
-d: Directories are listed as plain files.
-h: Format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).
-R: Recursively list subdirectories encountered.
-t: Sort output by modification time (most recent first).
Standard Hadoop Storage File Formats: common file formats include text files (CSV, XML) and binary files (e.g. images). Text data comes in the form of CSV files or unstructured data such as tweets. CSV files are commonly used for exchanging data between Hadoop and external systems.
I think it's not easy to do this reliably, unless all your files inside HDFS follow a naming convention, e.g. .txt for text, .seq for sequence, and .parquet for Parquet files.

However, you can check a file manually using cat.

HDFS cat: hdfs dfs -cat /path/to/file | head
to check whether it's a text file.

Parquet head: parquet-tools head [option...] /path/to/file

Or write a program that tries to read the file in each format.
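As a sketch of the naming-convention approach, assuming your files actually carry their format as a suffix, you could tally the extensions under a path. count_extensions is a hypothetical helper, and the HDFS path in the usage comment is illustrative:

```shell
#!/bin/sh
# Hypothetical helper: tally file extensions from an `hdfs dfs -ls -R`
# listing read on stdin. Only meaningful if files follow naming conventions.
count_extensions() {
  awk '{print $NF}' |                       # last column is the file path
    sed -n 's/.*\.\([A-Za-z0-9]*\)$/\1/p' | # keep the extension, if any
    sort | uniq -c | sort -rn               # count occurrences, most common first
}

# Usage (illustrative path):
#   hdfs dfs -ls -R /user/data | count_extensions
```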
Use "hdfs dfs -cat /path/to/file | head":
1) For an ORC file, the command prints the "ORC" magic bytes on the first line.
2) For a Parquet file, the command prints the "PAR1" magic bytes on the first line.
3) For a text file, the command prints readable content.
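The checks above can be sketched as a small helper that classifies a stream by its leading magic bytes. detect_format is a hypothetical name; SequenceFile headers begin with "SEQ", Parquet files with "PAR1", and ORC files with "ORC":

```shell
#!/bin/sh
# Hypothetical sketch: classify a byte stream by its leading magic bytes.
# SequenceFiles start with "SEQ", Parquet files with "PAR1", ORC files
# with "ORC"; anything else may be plain text.
detect_format() {
  magic=$(head -c 4 | tr -d '\0')  # read the first four bytes
  case "$magic" in
    SEQ*) echo "sequence" ;;
    PAR1) echo "parquet"  ;;
    ORC*) echo "orc"      ;;
    *)    echo "unknown (possibly text)" ;;
  esac
}

# Usage (illustrative path):
#   hdfs dfs -cat /path/to/file | detect_format
```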
If your files follow naming conventions, you can also check the extension in code:
String extension = FilenameUtils.getExtension("hdfs://path-to-file");
Working with Hadoop 2.5.2.