I have a function in AWS Lambda that connects to the Twitter API and returns the tweets which match a specific search query I provided via the event. A simplified version of the function is below. There's a few helper functions I use like get_secret
to manage API keys and process_tweet
which limits what data gets sent back and does things like convert the created at date to a string. The net result is that I should get back a list of dictionaries.
def lambda_handler(event, context):
twitter_secret = get_secret("twitter")
auth = tweepy.OAuthHandler(twitter_secret['api-key'],
twitter_secret['api-secret'])
auth.set_access_token(twitter_secret['access-key'],
twitter_secret['access-secret'])
api = tweepy.API(auth)
cursor = tweepy.Cursor(api.search,
q=event['search'],
include_entities=True,
tweet_mode='extended',
lang='en')
tweets = list(cursor.items())
tweets = [process_tweet(t) for t in tweets if not t.retweeted]
return json.dumps({"tweets": tweets})
From my desktop then, I have code which invokes the lambda function.
aws_lambda = boto3.client('lambda', region_name="us-east-1")
payload = {"search": "paint%20protection%20film filter:safe"}
lambda_response = aws_lambda.invoke(FunctionName="twitter-searcher",
InvocationType="RequestResponse",
Payload=json.dumps(payload))
results = lambda_response['Payload'].read()
tweets = results.decode('utf-8')
The problem is that somewhere between json.dumps
ing the output in lambda and reading the payload in Python, the data has gotten screwy. For example, a line break which should be \n
becomes \\\\n
, all of the double quotes are stored as \\"
and Unicode characters are all prefixed by \\
. So, everything that was escaped, when it was received by Python on my desktop with the escaping character being escaped. Consider this element of the list that was returned (with manual formatting).
'{\\"userid\\": 190764134,
\\"username\\": \\"CapitalGMC\\",
\\"created\\": \\"2018-09-02 15:00:00\\",
\\"tweetid\\": 1036267504673337344,
\\"text\\": \\"Protect your vehicle\'s paint! Find out how on this week\'s blog.
\\\\ud83d\\\\udc47\\\\n\\\\nhttps://url/XYMxPhVhdH https://url/mFL2Zv8nWW\\"}'
I can use regex to fix some problems (\\"
and \\\\n
) but the Unicode is tricky because even if I match it, how do I replace it with a properly escaped character? When I do this in R, using the aws.lambda
package, everything is fine, no weird escaped escapes.
What am I doing wrong on my desktop with the response from AWS Lambda that's garbling the data?
The process tweet function is below. It literally just pulls out the bits I care to keep, formats the datetime object to be a string and returns a dictionary.
def process_tweet(tweet):
bundle = {
"userid": tweet.user.id,
"username": tweet.user.screen_name,
"created": str(tweet.created_at),
"tweetid": tweet.id,
"text": tweet.full_text
}
return bundle
Just for reference, in R the code looks like this.
payload = list(search="paint%20protection%20film filter:safe")
results = aws.lambda::invoke_function("twitter-searcher"
,payload = jsonlite::toJSON(payload
,auto_unbox=TRUE)
,type = "RequestResponse"
,key = creds$key
,secret = creds$secret
,session_token = creds$session_token
,region = creds$region)
tweets = jsonlite::fromJSON(results)
str(tweets)
#> 'data.frame': 133 obs. of 5 variables:
#> $ userid : num 2231994854 407106716 33553091 7778772 782310 ...
#> $ username: chr "adaniel_080213" "Prestige_AdamL" "exclusivedetail" "tedhu" ...
#> $ created : chr "2018-09-12 14:07:09" "2018-09-12 11:31:56" "2018-09-12 10:46:55" "2018-09-12 07:27:49" ...
#> $ tweetid : num 1039878080968323072 1039839019989983232 1039827690151444480 1039777586975526912 1039699310382931968 ...
#> $ text : chr "I liked a @YouTube video https://url/97sRShN4pM Tesla Model 3 - Front End Package - Suntek Ultra Paint Protection Film" "Another #Corvette #ZO6 full body clearbra wrap completed using @xpeltech ultimate plus PPF ... Paint protection"| __truncated__ "We recently protected this Tesla Model 3 with Paint Protection Film and Ceramic Coating.#teslamodel3 #charlotte"| __truncated__ "Tesla Model 3 - Front End Package - Suntek Ultra Paint Protection Film https://url/AD1cl5dNX3" ...
tweets[131,]
#> userid username created tweetid
#> 131 190764134 CapitalGMC 2018-09-02 15:00:00 1036267504673337344
#> text
#> 131 Protect your vehicle's paint! Find out how on this week's blog.👇\n\nhttps://url/XYMxPhVhdH https://url/mFL2Zv8nWW
json.loads(): If you have a JSON string, you can parse it by using the json.loads() method.json.loads() does not take the file path, but the file contents as a string, using fileobject.read() with json.loads() we can return the content of the file.
We can use the Python json module to pretty-print the JSON data. The json module is recommended to work with JSON files. We can use the dumps() method to get the pretty formatted JSON string.
Don't use json.dumps()
I had a similar issue, and when I just returned "body": content
instead of "body": json.dumps(content)
I could easily access and manipulate my data. Before that, I got that weird form that looks like JSON, but it's not.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With