 

AWS Data Pipeline datetime variable

I am using AWS Data Pipeline to save a text file from RDS to my S3 bucket. I would like the file name to include the date and the hour, like:

myfile-YYYYMMDD-HH.txt
myfile-20140813-12.txt

I have specified my S3DataNode FilePath as:

s3://mybucketname/out/myfile-#{format(myDateTime,'YYYY-MM-dd-HH')}.txt

When I try to save my pipeline I get the following error:

ERROR: Unable to resolve myDateTime for object:DataNodeId_xOQxz

According to the AWS Data Pipeline documentation for date and time functions this is the proper syntax for using the format function.

When I save the pipeline using a hard-coded date and time, I don't get this error, and my file appears in my S3 bucket and folder as expected.

My thinking is that I need to define "myDateTime" somewhere or use something like NOW().

Can somebody tell me how to set "myDateTime" to the current time (e.g. NOW) or give a workaround so I can format the current time to be used in my FilePath?

asked Aug 13 '14 17:08 by davedi


2 Answers

I am not aware of an exact equivalent of NOW() in Data Pipeline. I tried calling makeDate with no arguments (just for fun) to see whether that worked; it did not.

The closest equivalents are the runtime variables scheduledStartTime, actualStartTime, and reportProgressTime.

http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-s3datanode.html

For example, the following should work:

s3://mybucketname/out/myfile-#{format(@scheduledStartTime,'YYYY-MM-dd-HH')}.txt
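To see what a format expression like this produces, here is a small Python sketch that imitates `format()` for a handful of Joda-style tokens (`YYYY`, `MM`, `dd`, `HH`, `mm`, `ss`), treating `YYYY` as the calendar year. It is a local approximation, not Data Pipeline's own evaluator; `format_like_pipeline` is a hypothetical helper, and the 2014-08-13 12:00 timestamp is an assumed stand-in for `@scheduledStartTime`. It also shows that `'YYYY-MM-dd-HH'` gives `myfile-2014-08-13-12.txt`; to get the `myfile-20140813-12.txt` form from the question, the pattern would need to be `'YYYYMMdd-HH'`.

```python
from datetime import datetime

# Small, hypothetical subset of Joda-style tokens mapped to strftime codes.
# Four-letter tokens are listed first so "YYYY" is consumed as one unit.
_TOKENS = [
    ("YYYY", "%Y"),
    ("yyyy", "%Y"),
    ("MM", "%m"),
    ("dd", "%d"),
    ("HH", "%H"),
    ("mm", "%M"),
    ("ss", "%S"),
]


def format_like_pipeline(dt: datetime, pattern: str) -> str:
    """Approximate #{format(dt, pattern)} for the tokens above only."""
    out = []
    i = 0
    while i < len(pattern):
        for token, code in _TOKENS:
            if pattern.startswith(token, i):
                out.append(dt.strftime(code))
                i += len(token)
                break
        else:
            # Anything that is not a known token is copied through literally.
            out.append(pattern[i])
            i += 1
    return "".join(out)


# Stand-in for @scheduledStartTime (assumed value for illustration).
scheduled = datetime(2014, 8, 13, 12, 0)

print(f"myfile-{format_like_pipeline(scheduled, 'YYYY-MM-dd-HH')}.txt")
# prints myfile-2014-08-13-12.txt
print(f"myfile-{format_like_pipeline(scheduled, 'YYYYMMdd-HH')}.txt")
# prints myfile-20140813-12.txt
```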

answered Sep 28 '22 10:09 by user1452132


Just for fun, here is some more info on Parameters.

At the end of your pipeline JSON (click List Pipelines, open one, click Edit Pipeline, then click Export), you need to add a "parameters" and/or "values" object.

I use a myStartDate parameter for backfill processes, which you can manipulate once it is passed in for ad hoc runs. You can give it a static default, but you can't set it to a dynamic value, so it is of limited use for regularly scheduled tasks. For real-time or scheduled dates, use @scheduledStartTime and the like, as suggested. Here is a sample of setting up some parameters and/or values; both show up under Parameters in the UI. These values can be used throughout your pipeline activities (shell, Hive, etc.) with the #{myVariableToUse} notation.

"parameters": [
{
  "helpText": "Put help text here",
  "watermark": "This shows if no default or value set",
  "description": "Label/Desc",
  "id": "myVariableToUse",
  "type": "string"
}
]

And for Values:

"values": {
  "myS3OutLocation": "s3://some-bucket/path",
  "myThreshold": "30000"
}

You cannot add these directly in the UI (yet) but once they are there you can change and save the values.
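One rough way to sanity-check these sections before importing the definition is to preview the substitution locally. The Python sketch below uses a hypothetical helper, `preview`, that is not part of AWS Data Pipeline: it parses a definition fragment built from the example ids above, replaces plain `#{name}` references using `values` first and a parameter's `default` second, and leaves everything else (such as `#{format(...)}` expressions) untouched. It assumes the parameter object accepts a `default` field; the `"example-default"` value is made up for illustration.

```python
import json
import re

# Definition fragment modeled on the example above. The "default" on
# myVariableToUse is an assumed example value, not from the original answer.
DEFINITION = """
{
  "parameters": [
    {
      "id": "myVariableToUse",
      "type": "string",
      "description": "Label/Desc",
      "default": "example-default"
    }
  ],
  "values": {
    "myS3OutLocation": "s3://some-bucket/path",
    "myThreshold": "30000"
  }
}
"""

# Matches only plain references such as #{myThreshold}; expressions such as
# #{format(@scheduledStartTime,'YYYY')} do not match and are left alone.
_PLACEHOLDER = re.compile(r"#\{(\w+)\}")


def preview(template: str, definition: dict) -> str:
    """Hypothetical local preview of #{name} substitution (not AWS code)."""
    values = definition.get("values", {})
    defaults = {
        p["id"]: p["default"]
        for p in definition.get("parameters", [])
        if "default" in p
    }

    def substitute(match):
        name = match.group(1)
        if name in values:      # an explicit value wins
            return values[name]
        if name in defaults:    # otherwise fall back to the default
            return defaults[name]
        return match.group(0)   # unknown name: leave it as written

    return _PLACEHOLDER.sub(substitute, template)


definition = json.loads(DEFINITION)
print(preview("#{myS3OutLocation}/myfile-#{format(@scheduledStartTime,'YYYYMMdd-HH')}.txt", definition))
# prints s3://some-bucket/path/myfile-#{format(@scheduledStartTime,'YYYYMMdd-HH')}.txt
```

Leaving unknown names and function expressions as written keeps the preview conservative: it only shows what the static values would contribute, while the real service still evaluates the time-based parts at run time.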

answered Sep 28 '22 12:09 by williambq