Download file weekly from FTP to HDFS

I want to automate the weekly download of a file from an FTP server into a CDH5 Hadoop cluster. What would be the best way to do this?

I was thinking about an Oozie coordinator job but I can't think of a good method to download the file.

asked Oct 02 '22 by JochenDB

2 Answers

Since you're using CDH5, it's worth noting that the NFSv3 interface to HDFS is included in that Hadoop distribution. You should check for "Configuring an NFSv3 Gateway" in the CDH5 Installation Guide documentation.

Once that's done, you could use wget, curl, Python, etc. to put the file onto the NFS mount. You probably want to do this through Oozie: go into the job Designer and create a copy of the "Shell" command, put in the command you've selected to do the data transfer (Python script, curl, ftp, etc.), and parameterize the job using ${myVar}.

It's not perfect, but I think it's fairly elegant.
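A minimal sketch of what that shell action's script might look like. Everything here is an assumption for illustration: the NFSv3 gateway mount point (`/hdfs_nfs`), the parameter names, and the `report-<date>.csv` naming convention are all hypothetical.

```shell
#!/bin/sh
# Sketch of a shell-action script for the Oozie job Designer.
# Assumed: the NFSv3 gateway is mounted locally (e.g. at /hdfs_nfs),
# and the FTP URL and destination directory arrive as Oozie
# parameters (e.g. ${ftpUrl} and ${destDir}) passed as arguments.

weekly_dest() {
  # Build a date-stamped destination path so weekly runs don't collide.
  dir="$1"
  stamp=$(date +%Y-%m-%d)
  echo "$dir/report-$stamp.csv"
}

fetch_to_nfs() {
  # -O writes straight onto the NFS mount; HDFS sees the file directly.
  wget -q -O "$(weekly_dest "$2")" "$1"
}

# The Oozie shell action would invoke: script.sh ${ftpUrl} ${destDir}
[ $# -eq 2 ] && fetch_to_nfs "$1" "$2"
```

Because the script only takes `${myVar}`-style parameters, the same job definition can be reused for other files or destinations without editing the script itself.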

answered Oct 18 '22 by JamCon

I assume you want to pull the file from the FTP server.

One simple solution is to use a coordinator that runs a workflow.

The workflow should have a shell action:

http://oozie.apache.org/docs/3.3.0/DG_ShellActionExtension.html

The script it runs can be as simple as:

wget ftp://myftp.com/file.name

You can do whatever else you need in that script.
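For example, a fuller version of that script could fetch the file and then push it into HDFS with the `hdfs dfs -put` CLI. This is only a sketch: the FTP path, the `/data/weekly` HDFS layout, and the week-stamped directory naming are assumptions, and the `hdfs` command must be on the PATH of the shell action's host.

```shell
#!/bin/sh
# Sketch of the shell-action script. Assumed: the file lives at
# ftp://myftp.com/, and weekly copies should land under a
# hypothetical /data/weekly tree in HDFS.

hdfs_target() {
  # Stamp the HDFS path with the ISO year-week so each run keeps its own copy.
  echo "/data/weekly/$(date +%G-W%V)/$1"
}

pull_and_load() {
  file="$1"
  # wget speaks FTP as well as HTTP; fetch to a local work dir first.
  wget -q "ftp://myftp.com/$file" -O "/tmp/$file"
  # Push the copy into the cluster, then clean up the local file.
  hdfs dfs -put -f "/tmp/$file" "$(hdfs_target "$file")"
  rm -f "/tmp/$file"
}

# The Oozie shell action would invoke: script.sh file.name
[ $# -eq 1 ] && pull_and_load "$1"
```

The coordinator then only has to trigger this workflow once a week; all of the transfer logic stays in the script.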

answered Oct 18 '22 by user2230605