How to inspect the format of a file on HDFS?

Tags: hadoop, hdfs

Given an HDFS path, how can I figure out what format the file is in (text, SequenceFile, or Parquet)?

asked May 18 '15 at 03:05 by tomsheep



3 Answers

I don't think this is easy to do unless all the files in HDFS follow some naming convention, e.g. .txt for text, .seq for SequenceFile, and .parquet for Parquet.

However, you can check a file manually:

  • HDFS cat: hdfs dfs -cat /path/to/file | head (the older hadoop dfs form is deprecated) to check whether it's a text file.

  • Parquet head: parquet-tools head [option...] /path/to/file

  • Or, write a program to read the file.
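If your files do follow a naming convention like the one above, the check reduces to a lookup on the file name's extension. Below is a minimal Java sketch, assuming the hypothetical convention .txt/.seq/.parquet; the class ExtensionFormat and its methods are illustrative names, not part of Hadoop or any library. It needs Java 9+ for Map.of.

```java
import java.util.Locale;
import java.util.Map;

public class ExtensionFormat {

    // Hypothetical naming convention: extension -> format name.
    private static final Map<String, String> FORMAT_BY_EXTENSION = Map.of(
            "txt", "text",
            "seq", "sequence",
            "parquet", "parquet");

    // Returns the extension of the last path component, or "" if it has none.
    static String extension(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    // Maps a path to a format using the convention; "unknown" if nothing matches.
    static String formatByConvention(String path) {
        String ext = extension(path).toLowerCase(Locale.ROOT);
        return FORMAT_BY_EXTENSION.getOrDefault(ext, "unknown");
    }

    public static void main(String[] args) {
        for (String path : args) {
            System.out.println(path + " -> " + formatByConvention(path));
        }
    }
}
```

This says nothing about the bytes actually stored in the file, so a misnamed file or a MapReduce output such as part-00000 comes back as "unknown".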

answered Sep 20 '22 at 19:09 by yjshen


Use hdfs dfs -cat /path/to/file | head:

1) For an ORC file, the output starts with the "ORC" magic bytes on the first line.

2) For a Parquet file, the output starts with the "PAR1" magic bytes on the first line.

3) For a text file, the command prints the first lines of the file's content as readable text.
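The same magic-byte check can be done in code. Below is a minimal Java sketch, assuming the start of the file is available as an InputStream; the class FormatSniffer and its methods are hypothetical names. Besides "ORC" and "PAR1" it also checks for "SEQ", the 3-byte prefix that a Hadoop SequenceFile header starts with. Inside a Hadoop program you could pass the stream returned by FileSystem.open(Path) instead of System.in. It is only a heuristic: a text file that happens to begin with "ORC" or "SEQ" would be misclassified.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FormatSniffer {

    // Reads up to n bytes from the start of the stream (fewer if it ends early).
    static byte[] readHeader(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int r = in.read(buf, total, n - total);
            if (r < 0) {
                break; // end of stream
            }
            total += r;
        }
        return Arrays.copyOf(buf, total);
    }

    // True if the header begins with the given ASCII magic string.
    static boolean startsWith(byte[] header, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        if (header.length < m.length) {
            return false;
        }
        for (int i = 0; i < m.length; i++) {
            if (header[i] != m[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies a file by its leading magic bytes.
    static String detect(byte[] header) {
        if (startsWith(header, "PAR1")) {
            return "parquet";
        }
        if (startsWith(header, "ORC")) {
            return "orc";
        }
        if (startsWith(header, "SEQ")) {
            return "sequence";
        }
        return "text or unknown";
    }

    public static void main(String[] args) throws IOException {
        // Read the header from stdin, e.g. piped from hdfs dfs -cat.
        System.out.println(detect(readHeader(System.in, 4)));
    }
}
```

After compiling it with javac FormatSniffer.java, you could run hdfs dfs -cat /path/to/file | java FormatSniffer; only the first four bytes are read, much like piping into head.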

answered Sep 23 '22 at 19:09 by minisheep


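String extension = FilenameUtils.getExtension("hdfs://path-to-file");

This works with Hadoop 2.5.2. FilenameUtils comes from Apache Commons IO (org.apache.commons.io.FilenameUtils) and only returns the text after the last dot in the file name, so it relies on files following an extension convention like the one in the first answer.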

answered Sep 20 '22 at 19:09 by Iris Veriris