
Automating a Hive Activity using AWS Data Pipeline

I would like to run my Hive script automatically every day, and AWS Data Pipeline looks like the option for that. The problem is that I export data from DynamoDB to S3 and then manipulate that data with a Hive script. The script itself specifies its input and output, and that is where the trouble starts: a HiveActivity has to have an input and an output, but I need to define them in the script file.

I am looking for a way to automate this Hive script. Any ideas?

Cheers,

Ducaz035 asked Jun 16 '26 07:06



1 Answer

You can disable staging on the HiveActivity so that it runs any arbitrary Hive script, including one that defines its own input and output:

stage = false

For example:

{
  "name": "DefaultActivity1",
  "id": "ActivityId_1",
  "type": "HiveActivity",
  "stage": "false",
  "scriptUri": "s3://bucket/query.hql",
  "scriptVariable": [
    "param1=value1",
    "param2=value2"
  ],
  "schedule": {
    "ref": "ScheduleId_l"
  },
  "runsOn": {
    "ref": "EmrClusterId_1"
  }
}
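
With staging off, the script referenced by scriptUri has to declare its own tables. A rough sketch of what such a script might look like is below. It assumes that param1 holds the S3 prefix where the DynamoDB export landed and param2 holds the S3 prefix for the results, and that the exported files are tab-delimited; the table names, column names, and file format are all hypothetical and need to be adapted to your actual export.

```sql
-- Hypothetical query.hql, intended to run with "stage": "false".
-- Assumption: param1 = S3 input prefix, param2 = S3 output prefix,
-- both passed in through scriptVariable.

-- Input: the data previously exported from DynamoDB to S3.
CREATE EXTERNAL TABLE IF NOT EXISTS exported_items (
  item_id STRING,
  payload STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '${param1}';

-- Output: where the manipulated data is written.
CREATE EXTERNAL TABLE IF NOT EXISTS processed_items (
  item_id        STRING,
  payload_length INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '${param2}';

-- Placeholder transformation; replace with your own query.
INSERT OVERWRITE TABLE processed_items
SELECT item_id, length(payload)
FROM exported_items;
```

In the pipeline definition you would then set the two variables to real S3 paths instead of value1 and value2.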
panther answered Jun 19 '26 02:06
