I have files on a mainframe, and I want to push this data to Hadoop (HDFS)/Hive.
I can use Sqoop to import the mainframe DB2 database into Hive, but what about files (COBOL, VSAM, etc.)?
Is there a custom Flume source that I can write, or some alternative tool to use here?
Sqoop: Data Ingestion for Relational Databases
While Flume works on unstructured or semi-structured data, Sqoop transfers data between relational databases and Hadoop in both directions. As most enterprise data is stored in relational databases, Sqoop is used to import that data into Hadoop for analysts to examine.
There are two primary methods that can be used to move data into Hadoop: writing external data at the HDFS level (a data push), or reading external data at the MapReduce level (more like a pull). Reading data in MapReduce has advantages in the ease with which the operation can be parallelized and made fault tolerant.
Hadoop is not a type of database, but rather a software ecosystem that allows for massively parallel computing. It is an enabler of certain types of NoSQL distributed databases (such as HBase), which can allow for data to be spread across thousands of servers with little reduction in performance.
COBOL is a programming language, not a file format. If what you need is to export files produced by COBOL programs, you can use the same technique as if those files were produced by C, C++, Java, Perl, PL/I, Rexx, etc.
In general, you will have three different data sources: flat files, VSAM files, and a DBMS such as DB2 or IMS.
DBMSs have export utilities to copy the data into flat files. Keep in mind that data in DB2 will likely be normalized, so you will likely need the contents of related tables in order to make sense of the data.
VSAM files can be exported to flat files via the IDCAMS utility.
I would strongly suggest you get the files into a text format before transferring them to another box with a different code page. Trying to deal with mixed text (which must have its code page translated) and binary (which must not have its code page translated but which likely must be converted from big endian to little endian) is harder than doing the conversion up front.
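To show why mixed records are awkward once they leave the mainframe, here is a minimal Python sketch that decodes one record by hand. The record layout (a 10-byte EBCDIC name, a 4-byte big-endian binary integer, and a 3-byte packed-decimal amount with two decimal places) is purely hypothetical, and code page cp037 is an assumption; your files will have their own layout and code page.

```python
import struct
from decimal import Decimal

def unpack_packed_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field: two digits per byte,
    with the low nibble of the last byte holding the sign."""
    digits = []
    for b in raw[:-1]:
        digits.append(b >> 4)
        digits.append(b & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    value = 0
    for d in digits:
        value = value * 10 + d
    if sign_nibble == 0x0D:  # 0xD marks a negative value
        value = -value
    return Decimal(value).scaleb(-scale)

def decode_record(record: bytes) -> str:
    """Turn one 17-byte record (hypothetical layout) into a text line."""
    name = record[0:10].decode("cp037").rstrip()   # text: translate the code page
    (count,) = struct.unpack(">i", record[10:14])  # binary: big endian, no translation
    amount = unpack_packed_decimal(record[14:17], scale=2)
    return f"{name}|{count}|{amount}"

# Build a sample record the way it might look on the mainframe.
sample = "SMITH     ".encode("cp037") + struct.pack(">i", 1042) + bytes.fromhex("12345C")
print(decode_record(sample))
```

Every field needs its own rule, and a transfer that translates the whole record as text would corrupt the binary and packed fields. That is the work you avoid by converting everything to text on the mainframe first.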
The conversion can likely be done via the SORT utility on the mainframe. Mainframe SORT utilities tend to have extensive data manipulation functions. There are other mechanisms you could use (other utilities, custom code written in the language of your choice, purchased packages) but this is what we tend to do in these circumstances.
Once you have your flat files converted such that all data is text, you can transfer them to your Hadoop boxes via FTP, SFTP, or FTPS.
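As a rough sketch of that last hop, the Python below pulls a text file over plain FTP in ASCII mode and then copies it into HDFS with the `hdfs dfs -put` command. The function names, host, credentials, and paths are all placeholders I made up for illustration, and it assumes the `hdfs` client is installed on the box where it runs. For FTPS you could swap in `ftplib.FTP_TLS`; SFTP needs a different client altogether.

```python
import subprocess
from ftplib import FTP

def fetch_text_file(host, user, password, remote_name, local_path):
    """Download a text file in FTP ASCII mode, one line at a time."""
    with FTP(host) as ftp, open(local_path, "w", encoding="utf-8") as out:
        ftp.login(user, password)
        # retrlines strips the line terminator, so add a newline back.
        ftp.retrlines(f"RETR {remote_name}", lambda line: out.write(line + "\n"))

def hdfs_put_command(local_path, hdfs_dir):
    """Build the command that copies a local file into an HDFS directory."""
    # -f overwrites an existing file of the same name in HDFS.
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]

def load_into_hdfs(local_path, hdfs_dir):
    """Run the copy; raises CalledProcessError if the hdfs client fails."""
    subprocess.run(hdfs_put_command(local_path, hdfs_dir), check=True)

# Hypothetical usage (placeholder host, credentials, and paths):
# fetch_text_file("mainframe.example.com", "user", "secret", "EXPORT.TXT", "/tmp/export.txt")
# load_into_hdfs("/tmp/export.txt", "/data/mainframe/")
```

Once the file is in HDFS, a Hive external table can be defined over that directory.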
This isn't an exhaustive coverage of the topic, but it will get you started.