How to retrieve well formatted JSON from AWS Lambda using Python

Tags: python, json, aws-lambda

I have a function in AWS Lambda that connects to the Twitter API and returns the tweets which match a specific search query I provide via the event. A simplified version of the function is below. There are a few helper functions I use, like get_secret to manage API keys and process_tweet, which limits what data gets sent back and does things like convert the created-at date to a string. The net result is that I should get back a list of dictionaries.

import json

import tweepy


def lambda_handler(event, context):
    twitter_secret = get_secret("twitter")

    auth = tweepy.OAuthHandler(twitter_secret['api-key'],
                               twitter_secret['api-secret'])
    auth.set_access_token(twitter_secret['access-key'],
                          twitter_secret['access-secret'])
    api = tweepy.API(auth)

    cursor = tweepy.Cursor(api.search,
                           q=event['search'],
                           include_entities=True,
                           tweet_mode='extended',
                           lang='en')

    tweets = list(cursor.items())
    tweets = [process_tweet(t) for t in tweets if not t.retweeted]

    return json.dumps({"tweets": tweets})

From my desktop, I then have code which invokes the Lambda function.

import json

import boto3

aws_lambda = boto3.client('lambda', region_name="us-east-1")
payload = {"search": "paint%20protection%20film filter:safe"}
lambda_response = aws_lambda.invoke(FunctionName="twitter-searcher",
                                    InvocationType="RequestResponse",
                                    Payload=json.dumps(payload))
results = lambda_response['Payload'].read()
tweets = results.decode('utf-8')

The problem is that somewhere between calling json.dumps on the output in Lambda and reading the payload in Python, the data gets mangled. For example, a line break which should be \n becomes \\\\n, all of the double quotes are stored as \\" and Unicode escapes are all prefixed by \\. In other words, by the time Python on my desktop receives the data, every escape character has itself been escaped. Consider this element of the list that was returned (with manual formatting).

'{\\"userid\\": 190764134,
  \\"username\\": \\"CapitalGMC\\",
  \\"created\\": \\"2018-09-02 15:00:00\\",
  \\"tweetid\\": 1036267504673337344,
  \\"text\\": \\"Protect your vehicle\'s paint! Find out how on this week\'s blog.
              \\\\ud83d\\\\udc47\\\\n\\\\nhttps://url/XYMxPhVhdH https://url/mFL2Zv8nWW\\"}'

I can use regex to fix some problems (\\" and \\\\n) but the Unicode is tricky because even if I match it, how do I replace it with a properly escaped character? When I do this in R, using the aws.lambda package, everything is fine, no weird escaped escapes.

What am I doing wrong on my desktop with the response from AWS Lambda that's garbling the data?
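
For reference, the escaping pattern shown above can be reproduced without AWS. This is only a minimal sketch: it assumes the string returned by the handler gets JSON-encoded a second time before it reaches the client, and the `tweet` dictionary is a hypothetical stand-in built from the example values.

```python
import json

# Hypothetical stand-in for one processed tweet (values borrowed from the example above).
tweet = {
    "username": "CapitalGMC",
    "text": "Find out how on this week's blog. \U0001F447\n\nhttps://url/XYMxPhVhdH",
}

# Step 1: what the handler returns, a JSON document held in a Python str.
handler_return = json.dumps({"tweets": [tweet]})

# Step 2 (assumption): the return value is JSON-encoded once more on its way out,
# which turns the str from step 1 into a JSON string literal.
wire_payload = json.dumps(handler_return).encode("utf-8")

# What the desktop code holds after .read() and .decode('utf-8').
received = wire_payload.decode("utf-8")

# repr() shows the same doubled backslashes as the output pasted above.
print(repr(received))
```

If the assumption holds, the backslashes are not corruption: they are a second, valid layer of JSON escaping wrapped around the first.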

Update

The process tweet function is below. It literally just pulls out the bits I care to keep, formats the datetime object to be a string and returns a dictionary.

def process_tweet(tweet):
    bundle = {
        "userid": tweet.user.id,
        "username": tweet.user.screen_name,
        "created": str(tweet.created_at),
        "tweetid": tweet.id,
        "text": tweet.full_text
    }
    return bundle

Just for reference, in R the code looks like this.

payload = list(search="paint%20protection%20film filter:safe")
results = aws.lambda::invoke_function("twitter-searcher"
                                      ,payload = jsonlite::toJSON(payload
                                                              ,auto_unbox=TRUE)
                                      ,type = "RequestResponse"
                                      ,key = creds$key
                                      ,secret = creds$secret
                                      ,session_token = creds$session_token
                                      ,region = creds$region)
tweets = jsonlite::fromJSON(results)
str(tweets)

#> 'data.frame':    133 obs. of  5 variables:
#>  $ userid  : num  2231994854 407106716 33553091 7778772 782310 ...
#>  $ username: chr  "adaniel_080213" "Prestige_AdamL" "exclusivedetail" "tedhu" ...
#>  $ created : chr  "2018-09-12 14:07:09" "2018-09-12 11:31:56" "2018-09-12 10:46:55" "2018-09-12 07:27:49" ...
#>  $ tweetid : num  1039878080968323072 1039839019989983232 1039827690151444480 1039777586975526912 1039699310382931968 ...
#>  $ text    : chr  "I liked a @YouTube video https://url/97sRShN4pM Tesla Model 3 - Front End Package - Suntek Ultra Paint Protection Film" "Another #Corvette #ZO6 full body clearbra wrap completed using @xpeltech ultimate plus PPF ... Paint protection"| __truncated__ "We recently protected this Tesla Model 3 with Paint Protection Film and Ceramic Coating.#teslamodel3 #charlotte"| __truncated__ "Tesla Model 3 - Front End Package - Suntek Ultra Paint Protection Film https://url/AD1cl5dNX3" ...

tweets[131,]
#>        userid   username             created             tweetid
#> 131 190764134 CapitalGMC 2018-09-02 15:00:00 1036267504673337344
#>          text
#> 131 Protect your vehicle's paint! Find out how on this week's blog.👇\n\nhttps://url/XYMxPhVhdH https://url/mFL2Zv8nWW


asked Sep 12 '18 14:09

Mark

1 Answer

Don't use json.dumps()

I had a similar issue. When I returned "body": content instead of "body": json.dumps(content), I could easily access and manipulate my data. Before that, I got output that looks like JSON but isn't.
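
Applied to the handler in the question, this would mean returning the dictionary itself rather than a JSON string. The sketch below compares the two cases locally. `simulate_invoke`, `handler_with_dumps`, `handler_plain` and `SAMPLE_TWEETS` are hypothetical names used for illustration, and the simulation assumes the Lambda Python runtime JSON-encodes the handler's return value once before handing it to the caller.

```python
import json

# Hypothetical sample data standing in for the processed tweets.
SAMPLE_TWEETS = [
    {"username": "CapitalGMC", "text": "This week's blog \U0001F447\n\nhttps://url/XYMxPhVhdH"}
]


def simulate_invoke(handler, event):
    """Hypothetical stand-in for a RequestResponse invoke.

    Assumption: the runtime JSON-encodes the handler's return value once
    before the caller reads the payload bytes.
    """
    return json.dumps(handler(event, None)).encode("utf-8")


def handler_with_dumps(event, context):
    # Returns a str, so the runtime's own encoding wraps it a second time.
    return json.dumps({"tweets": SAMPLE_TWEETS})


def handler_plain(event, context):
    # Returns a dict, so the payload is encoded exactly once.
    return {"tweets": SAMPLE_TWEETS}


payload_plain = simulate_invoke(handler_plain, {})
payload_dumps = simulate_invoke(handler_with_dumps, {})

# With the plain handler, a single json.loads gives back the dictionary.
tweets_plain = json.loads(payload_plain)

# If the handler cannot be changed, decoding twice undoes the double encoding.
tweets_dumps = json.loads(json.loads(payload_dumps))

print(tweets_plain == tweets_dumps)
```

On the desktop side, the single-decode case would correspond to calling json.loads(lambda_response['Payload'].read()) instead of stopping at .decode('utf-8'). No regex cleanup should be needed in either case, since json.loads also turns the \ud83d\udc47 escapes back into the original character.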


answered Oct 20 '22 16:10

Psycho Buddha
