I'm new to Hadoop and wondering how many types of InputFormat there are in Hadoop, such as TextInputFormat. Is there a certain InputFormat that I can use to read files via HTTP requests to remote data servers?
Thanks :)
There are many classes implementing InputFormat:
CombineFileInputFormat, CombineSequenceFileInputFormat,
CombineTextInputFormat, CompositeInputFormat, DBInputFormat,
FileInputFormat, FixedLengthInputFormat, KeyValueTextInputFormat,
MultiFileInputFormat, NLineInputFormat, Parser.Node,
SequenceFileAsBinaryInputFormat, SequenceFileAsTextInputFormat,
SequenceFileInputFilter, SequenceFileInputFormat, TextInputFormat
Have a look at this article on when to use which type of InputFormat.
Out of these, the most frequently used formats are:
FileInputFormat: Base class for all file-based InputFormats.
KeyValueTextInputFormat: An InputFormat for plain text files. Files are broken into lines. Either a line feed or a carriage return signals the end of a line. Each line is divided into key and value parts by a separator byte. If no such byte exists, the key is the entire line and the value is empty.
TextInputFormat: An InputFormat for plain text files. Files are broken into lines. Either a line feed or a carriage return signals the end of a line. Keys are the positions of the lines in the file, and values are the lines of text.
NLineInputFormat: Splits the input so that every N lines form one split. In many "pleasantly" parallel applications, each process/mapper processes the same input file(s), but the computations are controlled by different parameters.
SequenceFileInputFormat: An InputFormat for SequenceFiles.
Regarding the second query: fetch the files from the remote servers first, then use the appropriate InputFormat depending on the contents of the files. Hadoop works best with data locality.
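As a rough illustration of the "fetch first" step, here is a minimal sketch in plain Java (no Hadoop dependency) that copies the bytes behind a URL to a local file. The class name RemoteFileStager and the method stage are hypothetical names chosen for this example, not part of Hadoop. It assumes the remote server answers a plain GET without authentication.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not a Hadoop class: stages a remote file locally
// so that a regular file-based InputFormat can read it afterwards.
final class RemoteFileStager {

    // Copies the content behind `source` (for example an http:// URL)
    // to `target`, overwriting it if present. Returns the bytes copied.
    static long stage(URI source, Path target) throws IOException {
        try (InputStream in = source.toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RemoteFileStager <url> <local-file>");
            return;
        }
        long copied = stage(URI.create(args[0]), Path.of(args[1]));
        System.out.println("Staged " + copied + " bytes to " + args[1]);
    }
}
```

Once the file is staged, it could be copied into HDFS (for instance with hdfs dfs -put) and read with whichever InputFormat matches its contents.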
Your first question - how many types of InputFormat are there in Hadoop such as TextInputFormat?
TextInputFormat - each line is treated as a value
KeyValueTextInputFormat - the text before the first delimiter is the key and the rest is the value
FixedLengthInputFormat - each fixed-length record is treated as a value
NLineInputFormat - every N lines form one input split, so each mapper receives N lines
SequenceFileInputFormat - for binary SequenceFiles
Also there is DBInputFormat to read from databases
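To make the record shapes in this list more concrete, here is a small sketch in plain Java that mimics how three of the formats carve up text. It is only an illustration of the semantics described above, not Hadoop's actual implementation. The class RecordShapes and its methods are hypothetical names, and the sketch assumes "\n" line endings and UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of Hadoop.
final class RecordShapes {

    // TextInputFormat-like: key = byte offset of the line, value = the line.
    // Note: String.split drops trailing empty lines, which is fine for a sketch.
    static List<Map.Entry<Long, String>> textRecords(String content) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        if (content.isEmpty()) {
            return records;
        }
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(Map.entry(offset, line));
            // Advance past the line's bytes plus the one-byte newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat-like: split at the FIRST separator;
    // without a separator the whole line is the key and the value is empty.
    static Map.Entry<String, String> keyValueRecord(String line, char separator) {
        int idx = line.indexOf(separator);
        if (idx < 0) {
            return Map.entry(line, "");
        }
        return Map.entry(line.substring(0, idx), line.substring(idx + 1));
    }

    // NLineInputFormat-like: group lines so each split holds up to n lines.
    static List<List<String>> nLineGroups(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            int end = Math.min(i + n, lines.size());
            groups.add(new ArrayList<>(lines.subList(i, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, String> r : textRecords("alpha\tone\nbeta\ttwo\n")) {
            Map.Entry<String, String> kv = keyValueRecord(r.getValue(), '\t');
            System.out.println(r.getKey() + " | " + kv.getKey() + " => " + kv.getValue());
        }
        System.out.println(nLineGroups(List.of("l1", "l2", "l3"), 2));
    }
}
```

In a real job the format is selected on the Job object rather than reimplemented; the sketch is only meant to show what keys and values a mapper would see.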
Your second question - there is no InputFormat for reading files via HTTP requests.