 

How to retrieve well formatted JSON from AWS Lambda using Python

I have a function in AWS Lambda that connects to the Twitter API and returns the tweets matching a search query that I pass in via the event. A simplified version of the function is below. It uses a few helper functions: get_secret to manage API keys, and process_tweet, which limits what data gets sent back and does things like convert the created_at date to a string. The net result is that I should get back a list of dictionaries.

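import json

import tweepy


# get_secret() and process_tweet() are helper functions defined elsewhere.
def lambda_handler(event, context):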
    twitter_secret = get_secret("twitter")

    auth = tweepy.OAuthHandler(twitter_secret['api-key'],
                               twitter_secret['api-secret'])
    auth.set_access_token(twitter_secret['access-key'],
                          twitter_secret['access-secret'])
    api = tweepy.API(auth)

    cursor = tweepy.Cursor(api.search,
                           q=event['search'],
                           include_entities=True,
                           tweet_mode='extended',
                           lang='en')

    tweets = list(cursor.items())
    tweets = [process_tweet(t) for t in tweets if not t.retweeted]

    return json.dumps({"tweets": tweets})
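The escaping the question describes is consistent with double encoding: Lambda's Python runtime already serializes the handler's return value to JSON, so a handler that returns `json.dumps(...)` hands the runtime a string, which is then encoded a second time as a JSON string literal. A minimal, AWS-free sketch of that effect, assuming the runtime behaves like a plain `json.dumps` call on the return value (`simulate_runtime` is a hypothetical name, not an AWS API):

```python
import json


def simulate_runtime(return_value):
    # Hypothetical stand-in for what the Lambda runtime does with the
    # handler's return value: serialize it to JSON bytes.
    return json.dumps(return_value).encode("utf-8")


data = {"tweets": [{"text": "blog.\U0001F447\n\nhttps://url/XYMxPhVhdH"}]}

# Handler returns the dict: encoded once.
once = simulate_runtime(data)
# Handler returns json.dumps(dict): the JSON text is encoded again as a string,
# so every quote becomes \" and every backslash becomes \\.
twice = simulate_runtime(json.dumps(data))

print(once.decode("utf-8"))   # a JSON object
print(twice.decode("utf-8"))  # a quoted JSON string full of escaped escapes
```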

Then, from my desktop, I invoke the Lambda function with this code:

import json

import boto3

aws_lambda = boto3.client('lambda', region_name="us-east-1")
payload = {"search": "paint%20protection%20film filter:safe"}
lambda_response = aws_lambda.invoke(FunctionName="twitter-searcher",
                                    InvocationType="RequestResponse",
                                    Payload=json.dumps(payload))
results = lambda_response['Payload'].read()
tweets = results.decode('utf-8')
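On the desktop side, the bytes read from the `Payload` are JSON text, so they need to be parsed with `json.loads` rather than only decoded to a `str`. A minimal sketch, assuming the handler may still be returning an already `json.dumps`-ed string; the helper name `parse_lambda_payload` is hypothetical, and its second `json.loads` only exists to undo that double encoding (the cleaner fix is to stop double-encoding in the handler):

```python
import json


def parse_lambda_payload(raw):
    """Parse the bytes read from an invoke() response Payload.

    Hypothetical helper: the first json.loads undoes the runtime's
    serialization; if the result is still a str, the handler returned
    json.dumps(...) output, so that string is parsed once more.
    """
    value = json.loads(raw)  # json.loads accepts bytes on Python 3.6+
    if isinstance(value, str):
        value = json.loads(value)
    return value


# Usage with the client code above (not executed here):
# results = lambda_response['Payload'].read()
# tweets = parse_lambda_payload(results)["tweets"]
```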

The problem is that somewhere between calling json.dumps on the output in Lambda and reading the payload in Python, the data gets mangled. For example, a line break that should be \n becomes \\\\n, all of the double quotes come through as \\", and Unicode escapes are all prefixed with \\. In other words, everything that was escaped in Lambda arrives in Python on my desktop with the escape character itself escaped. Consider this element of the list that was returned (with manual formatting).

'{\\"userid\\": 190764134,
  \\"username\\": \\"CapitalGMC\\",
  \\"created\\": \\"2018-09-02 15:00:00\\",
  \\"tweetid\\": 1036267504673337344,
  \\"text\\": \\"Protect your vehicle\'s paint! Find out how on this week\'s blog.
              \\\\ud83d\\\\udc47\\\\n\\\\nhttps://url/XYMxPhVhdH https://url/mFL2Zv8nWW\\"}'

I can use regex to fix some of the problems (\\" and \\\\n), but the Unicode is tricky: even if I match it, how do I replace it with the actual character? When I do this in R using the aws.lambda package, everything is fine, with no weird escaped escapes.
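Regex shouldn't be needed for the Unicode escapes: `json.loads` turns JSON `\uXXXX` escapes, including surrogate pairs such as `\ud83d\udc47`, back into the real characters. Also note that when a string is shown via its `repr` (as at the interactive prompt), every real backslash is displayed doubled, which can make the escaping look one level worse than it is. A small sketch using a shortened version of the tweet text from the question:

```python
import json

# JSON-escaped text as it appears inside the payload. A raw string keeps the
# backslashes literal so Python itself does not interpret \u or \n.
escaped = r'"blog.\ud83d\udc47\n\nhttps://url/XYMxPhVhdH"'

# json.loads combines the surrogate pair into one emoji and turns \n into
# real line breaks.
text = json.loads(escaped)

print(text)           # the emoji followed by two real line breaks
print(repr(escaped))  # repr shows each real backslash doubled
```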

What am I doing wrong on my desktop with the response from AWS Lambda that's garbling the data?

Update

The process_tweet function is below. It just pulls out the bits I want to keep, formats the datetime object as a string and returns a dictionary.

def process_tweet(tweet):
    bundle = {
        "userid": tweet.user.id,
        "username": tweet.user.screen_name,
        "created": str(tweet.created_at),
        "tweetid": tweet.id,
        "text": tweet.full_text
    }
    return bundle
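Because process_tweet only keeps ints and strings, its output is JSON-serializable as-is, so a handler can return it directly and leave serialization to the Lambda runtime. A quick local check, assuming a `types.SimpleNamespace` stand-in for a tweepy status object (only the attributes process_tweet reads are faked, with values taken from the sample tweet in the question):

```python
import json
from datetime import datetime
from types import SimpleNamespace


def process_tweet(tweet):
    # Same logic as the process_tweet function above.
    bundle = {
        "userid": tweet.user.id,
        "username": tweet.user.screen_name,
        "created": str(tweet.created_at),
        "tweetid": tweet.id,
        "text": tweet.full_text,
    }
    return bundle


# Hypothetical stand-in for a tweepy status object.
fake_tweet = SimpleNamespace(
    user=SimpleNamespace(id=190764134, screen_name="CapitalGMC"),
    created_at=datetime(2018, 9, 2, 15, 0, 0),
    id=1036267504673337344,
    full_text="Protect your vehicle's paint!",
)

bundle = process_tweet(fake_tweet)
round_tripped = json.loads(json.dumps(bundle))
print(round_tripped == bundle)  # True
```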

Just for reference, in R the code looks like this.

payload = list(search="paint%20protection%20film filter:safe")
results = aws.lambda::invoke_function("twitter-searcher"
                                      ,payload = jsonlite::toJSON(payload
                                                              ,auto_unbox=TRUE)
                                      ,type = "RequestResponse"
                                      ,key = creds$key
                                      ,secret = creds$secret
                                      ,session_token = creds$session_token
                                      ,region = creds$region)
tweets = jsonlite::fromJSON(results)
str(tweets)

#> 'data.frame':    133 obs. of  5 variables:
#>  $ userid  : num  2231994854 407106716 33553091 7778772 782310 ...
#>  $ username: chr  "adaniel_080213" "Prestige_AdamL" "exclusivedetail" "tedhu" ...
#>  $ created : chr  "2018-09-12 14:07:09" "2018-09-12 11:31:56" "2018-09-12 10:46:55" "2018-09-12 07:27:49" ...
#>  $ tweetid : num  1039878080968323072 1039839019989983232 1039827690151444480 1039777586975526912 1039699310382931968 ...
#>  $ text    : chr  "I liked a @YouTube video https://url/97sRShN4pM Tesla Model 3 - Front End Package - Suntek Ultra Paint Protection Film" "Another #Corvette #ZO6 full body clearbra wrap completed using @xpeltech ultimate plus PPF ... Paint protection"| __truncated__ "We recently protected this Tesla Model 3 with Paint Protection Film and Ceramic Coating.#teslamodel3 #charlotte"| __truncated__ "Tesla Model 3 - Front End Package - Suntek Ultra Paint Protection Film https://url/AD1cl5dNX3" ...

tweets[131,]
#>        userid   username             created             tweetid
#> 131 190764134 CapitalGMC 2018-09-02 15:00:00 1036267504673337344
#>          text
#> 131 Protect your vehicle's paint! Find out how on this week's blog.👇\n\nhttps://url/XYMxPhVhdH https://url/mFL2Zv8nWW
Mark asked Sep 12 '18 14:09




1 Answer

Don't use json.dumps()

I had a similar issue: when I returned "body": content instead of "body": json.dumps(content), I could easily access and manipulate my data. Before that, I got that weird format that looks like JSON but isn't.
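Applied to the question's setup (a direct boto3 invoke), this means the handler returns the dict itself and lets the Lambda runtime serialize it once; the client then needs a single json.loads. A minimal sketch: `search_tweets` is a hypothetical placeholder for the tweepy cursor and process_tweet code from the question, and calling `json.dumps` on the handler's result stands in, as an assumption, for the runtime's own serialization.

```python
import json


def search_tweets(query):
    # Hypothetical placeholder for the tweepy Cursor/process_tweet code.
    return [{"userid": 190764134, "username": "CapitalGMC",
             "text": f"result for {query}"}]


def lambda_handler(event, context):
    tweets = search_tweets(event["search"])
    # Return the plain dict; no json.dumps here.
    return {"tweets": tweets}


# Simulated round trip: the runtime encodes the return value once...
payload_bytes = json.dumps(
    lambda_handler({"search": "paint protection film"}, None)
).encode("utf-8")
# ...and the client decodes it once, e.g.
# json.loads(lambda_response['Payload'].read())
result = json.loads(payload_bytes)
print(result["tweets"][0]["username"])  # CapitalGMC
```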

Psycho Buddha answered Oct 20 '22 16:10
