I'm new to Hadoop and wondering how many types of InputFormat there are in Hadoop, such as TextInputFormat. Is there an InputFormat that I can use to read files via HTTP requests to remote data servers?
Thanks :)
There are many classes implementing InputFormat:
CombineFileInputFormat, CombineSequenceFileInputFormat,
CombineTextInputFormat, CompositeInputFormat, DBInputFormat,
FileInputFormat, FixedLengthInputFormat, KeyValueTextInputFormat,
MultiFileInputFormat, NLineInputFormat, Parser.Node,
SequenceFileAsBinaryInputFormat, SequenceFileAsTextInputFormat,
SequenceFileInputFilter, SequenceFileInputFormat, TextInputFormat
Have a look at this article on when to use which type of InputFormat.
Out of these, the most frequently used formats are:
FileInputFormat : Base class for all file-based InputFormats.
KeyValueTextInputFormat : An InputFormat for plain text files. Files are broken into lines. Either line feed or carriage return is used to signal end of line. Each line is divided into key and value parts by a separator byte. If no such byte exists, the key will be the entire line and the value will be empty.
TextInputFormat : An InputFormat for plain text files. Files are broken into lines. Either line feed or carriage return is used to signal end of line. Keys are the position in the file, and values are the line of text.
NLineInputFormat : An InputFormat which splits N lines of input as one split. In many "pleasantly" parallel applications, each process/mapper processes the same input file(s), but the computations are controlled by different parameters.
SequenceFileInputFormat : An InputFormat for SequenceFiles.
Regarding the second query: get the files from the remote servers first, then use the appropriate InputFormat depending on the contents of the files. Hadoop works best with data locality.
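The "get the files from remote servers first" step could look roughly like the sketch below. It uses only the JDK, not Hadoop, and the class name RemoteFileStager and its stage method are made up for illustration. It assumes the remote server exposes the file at a plain URL with no authentication.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of Hadoop): stage a remote file on local disk
// so that it can be copied into HDFS before the job runs.
class RemoteFileStager {

    // Copies the bytes behind sourceUrl to target, overwriting any existing file,
    // and returns the number of bytes written.
    static long stage(URL sourceUrl, Path target) throws IOException {
        try (InputStream in = sourceUrl.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RemoteFileStager <url> <local-path>
        URL source = URI.create(args[0]).toURL();
        Path target = Paths.get(args[1]);
        System.out.println("Staged " + stage(source, target) + " bytes to " + target);
    }
}
```

Once the file is on local disk it can be copied into HDFS, for example with hdfs dfs -put, and then read with whichever InputFormat matches its contents.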
Your first question - how many types of InputFormat are there in Hadoop such as TextInputFormat?
TextInputFormat - Each line is treated as a value
KeyValueTextInputFormat - The text before the first delimiter is the key and the rest is the value
FixedLengthInputFormat - Each fixed-length chunk is treated as a value
NLineInputFormat - N lines make up one split
SequenceFileInputFormat - For binary SequenceFiles
There is also DBInputFormat to read from databases.
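To make the difference between the first two formats concrete, here is a small plain-Java sketch of the records a mapper would see. It does not use Hadoop or reproduce its record readers. The class RecordShapes and its methods are invented for illustration, and it assumes newline-terminated UTF-8 text with tab as the separator (the default for KeyValueTextInputFormat).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the record shapes described above, not Hadoop's code.
class RecordShapes {

    // TextInputFormat style: key = byte offset where the line starts, value = the line.
    static List<String[]> textRecords(String content) {
        List<String[]> records = new ArrayList<>();
        if (content.isEmpty()) {
            return records;
        }
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new String[] {Long.toString(offset), line});
            // Advance past the line's bytes plus the '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat style: split at the first separator.
    // With no separator, the whole line is the key and the value is empty.
    static String[] keyValueRecord(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        String content = "apple\t1\nbanana\t2\n";
        for (String[] record : textRecords(content)) {
            System.out.println("text key=" + record[0] + " value=" + record[1]);
            String[] kv = keyValueRecord(record[1], '\t');
            System.out.println("kv   key=" + kv[0] + " value=" + kv[1]);
        }
    }
}
```

In a real job the choice is made on the Job object rather than in hand-written code like this, so the sketch is only meant to show what keys and values to expect in the mapper.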
Your second question - there is no input format to read files via HTTP requests.