
Different ways to import files into HDFS

I want to know the different ways I can bring data into HDFS.

I am a newbie to Hadoop and have been a Java web developer until now. If I have a web application that creates log files, how can I import those log files into HDFS?

Gaurav asked Sep 26 '15




1 Answer

There are lots of ways to ingest data into HDFS; let me try to illustrate them here:

  1. hdfs dfs -put - the simplest way to copy files from the local file system into HDFS (see the sketch after this list)
  2. HDFS Java API - programmatic writes from a JVM application (sketched below)
  3. Sqoop - for bringing data to/from relational databases (example below)
  4. Flume - for streaming files and logs
  5. Kafka - a distributed queue, mostly for near-real-time stream processing
  6. NiFi - an incubating project at Apache for moving data into HDFS without making lots of changes to your systems
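A minimal sketch of option 1, assuming a hypothetical local log at /var/log/myapp/app.log:

    # create a target directory in HDFS, then copy the local file into it
    hdfs dfs -mkdir -p /logs/myapp
    hdfs dfs -put /var/log/myapp/app.log /logs/myapp/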

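Option 2 from a Java application might look like the following sketch (the namenode address, class name, and paths are assumptions for illustration, not part of the question):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LogUploader {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // hypothetical namenode address; in practice this comes from core-site.xml
            conf.set("fs.defaultFS", "hdfs://namenode:8020");
            FileSystem fs = FileSystem.get(conf);
            // copy a local log file into HDFS
            fs.copyFromLocalFile(new Path("/var/log/myapp/app.log"),
                                 new Path("/logs/myapp/app.log"));
            fs.close();
        }
    }

And for option 3, a typical Sqoop import from a relational database into HDFS (the connection string, user, and table are placeholders):

    sqoop import \
      --connect jdbc:mysql://dbhost/mydb \
      --username dbuser -P \
      --table orders \
      --target-dir /data/orders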
The best solution for bringing web application logs into HDFS is Flume.
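Here is a sketch of a Flume agent that tails a web application log and writes it to HDFS (the agent name, log path, and HDFS path are made up for illustration):

    # flume.conf: one tail source, one memory channel, one HDFS sink
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/myapp/access.log
    a1.sources.r1.channels = c1

    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = hdfs://namenode:8020/logs/myapp/%Y-%m-%d
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp = true

You would then start the agent with something like: flume-ng agent --conf conf --conf-file flume.conf --name a1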

Ashrith answered Oct 19 '22