Reading from one Hadoop cluster and writing to another Hadoop cluster

I am running a Spark job and need to read an HDFS table that lives in, let's say, HadoopCluster-1. I then want to write the aggregated dataframe into a table that is present in another cluster, HadoopCluster-2. What would be the best way to do this?

  1. I am thinking of the following approach: before writing the data to the target table, read the target cluster's hdfs-site.xml and core-site.xml using addResource, copy all the config values into a Map<String, String>, and then set those conf values on my dataset's sparkSession.sparkContext.hadoopConfiguration(), roughly as sketched below.
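
A rough sketch of what I mean (the config file paths below are just placeholders):

    // spark-scala sketch of the idea above; file paths are placeholders
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    
    // load the target cluster's client configs
    val targetConf = new Configuration(false)
    targetConf.addResource(new Path("/path/to/cluster2/core-site.xml"))
    targetConf.addResource(new Path("/path/to/cluster2/hdfs-site.xml"))
    
    // copy every key/value into the active Hadoop configuration
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    val it = targetConf.iterator()
    while (it.hasNext) {
      val entry = it.next()
      hadoopConf.set(entry.getKey, entry.getValue)
    }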

Is this a good way to achieve my goal?

1 Answer

If you want to read a Hive table from cluster1 as a dataframe and write it as a Hive table in cluster2 after transforming the dataframe, you can try the approach below.

  1. Make sure HiveServer2 and the Hive metastore are running on both clusters. The commands to start them are:

    hive --service hiveserver2

    hive --service metastore

  2. Make sure Hive is properly configured with a username/password. You can leave both the username and the password empty, but you will then get an error; you can resolve that by referring to this link.

  3. Now read the Hive table from cluster1 as a Spark dataframe and write it to the Hive table in cluster2 after the transformation:

    // spark-scala code
    
    val sourceJdbcMap = Map(
     "url"->"jdbc:hive2://<source_host>:<port>", //default port is 10000
     "driver"->"org.apache.hive.jdbc.HiveDriver",
     "user"->"<username>",
     "password"->"<password>",
     "dbtable"->"<source_table>")
    
    val targetJdbcMap = Map(
     "url"->"jdbc:hive2://<target_host>:<port>", //default port is 10000
     "driver"->"org.apache.hive.jdbc.HiveDriver",
     "user"->"<username>",
     "password"->"<password>",
     "dbtable"->"<target_table>")
    
    val sourceDF = spark.read.format("jdbc").options(sourceJdbcMap).load()
    
    val transformedDF = sourceDF // transformation goes here...
    
    transformedDF.write.options(targetJdbcMap).format("jdbc").save()
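
For illustration, the placeholder transformation above could be any dataframe operation; a hypothetical aggregation (the column name is an assumption) might look like this:

    // hypothetical transformation; "some_key" is a placeholder column name
    val transformedDF = sourceDF
      .groupBy("some_key")
      .count() // adds a "count" column per group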
    

