I am trying to truncate a postgres destination table prior to insert, and in general, trying to fire external functions utilizing the connections already created in GLUE. Has anyone been able to do so?

I've tried the <code>DROP/ TRUNCATE</code> scenario, but have not been able to do it with connections already created in Glue, but with a pure Python PostgreSQL driver, pg8000. <ol> <li>Download the tar of pg8000 from pypi </li> <li>Create an empty <code>__init__.py</code> in the root folder</li> <li>Zip up the contents & upload to S3</li> <li>Reference the zip file in the <code>Python lib path</code> of the job </li> <li>Set the DB connection details as job params (make sure to prepend all key names with <code>--</code>). Tick the "Server-side encryption" box.</li> </ol> Then you can simply create a connection and execute SQL. <pre class="prettyprint"><code>import sys from awsglue.utils import getResolvedOptions from pyspark.context import SparkContext from awsglue.context import GlueContext from awsglue.dynamicframe import DynamicFrame from awsglue.job import Job import pg8000 args = getResolvedOptions(sys.argv, [ 'JOB_NAME', 'PW', 'HOST', 'USER', 'DB' ]) # ... # Create Spark & Glue context job = Job(glueContext) job.init(args['JOB_NAME'], args) # ... config_port = 5432 conn = pg8000.connect( database=args['DB'], user=args['USER'], password=args['PW'], host=args['HOST'], port=config_port ) query = "TRUNCATE TABLE {0};".format(".".join([schema, table])) cur = conn.cursor() cur.execute(query) conn.commit() cur.close() conn.close() </code></pre>

AWS Glue - Truncate destination postgres table prior to insert

1 Answers

I've tried the DROP/ TRUNCATE scenario, but have not been able to do it with connections already created in Glue, but with a pure Python PostgreSQL driver, pg8000.

Download the tar of pg8000 from pypi
Create an empty __init__.py in the root folder
Zip up the contents & upload to S3
Reference the zip file in the Python lib path of the job
Set the DB connection details as job params (make sure to prepend all key names with --). Tick the "Server-side encryption" box.

Then you can simply create a connection and execute SQL.

import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job

import pg8000

args = getResolvedOptions(sys.argv, [
    'JOB_NAME',
    'PW',
    'HOST',
    'USER',
    'DB'
])
# ...
# Create Spark & Glue context

job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# ...
config_port = 5432
conn = pg8000.connect(
    database=args['DB'], 
    user=args['USER'], 
    password=args['PW'],
    host=args['HOST'],
    port=config_port
)
query = "TRUNCATE TABLE {0};".format(".".join([schema, table]))
cur = conn.cursor()
cur.execute(query)
conn.commit()
cur.close()
conn.close()

answered Sep 19 '22 15:09

thenaturalist

Related questions
                            
                                Implementing 3D vectors in Python: numpy vs x,y,z fields
                            
                                Flatten hierarchically indexed pandas.DataFrame from groupby and multiple aggregation
                            
                                python inserts pictures to powerpoint, how to set the width and height of the picture?
                            
                                Plotting heatmap for 3 columns in python with seaborn
                            
                                How to locate and read Data Matrix code with python
                            
                                python astype(str) gives SettingWithCopyWarning and requests I use loc
                            
                                sqlalchemy.exc.UnboundExecutionError: Table object 'responsibles' is not bound to an Engine or Connection
                            
                                Dynamically generating elements of list within list
                            
                                Python Pandas Match Vlookup columns based on header values
                            
                                NumPy sum one array based on values in another array for each matching element in 3rd array
                            
                                Get *all* current jobs from python-rq
                            
                                Pythonic way to use the second condition in list comprehensions
                            
                                PyCharm: Unresolved reference with Scapy
                            
                                'DataFrame' object has no attribute 'melt'
                            
                                OpenCV image subtraction vs Numpy subtraction
                            
                                Append data to HDF5 file with Pandas, Python
                            
                                No module named 'Queue'
                            
                                How to remove every word with non alphabetic characters
                            
                                Django Migrations: Same migrations being created with makemigrations
                            
                                Difference between foo.bar() and bar(foo)?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

AWS Glue - Truncate destination postgres table prior to insert

Tags:

python

postgresql

pyspark

aws-glue

Josh Hamann

People also ask

1 Answers

thenaturalist

Recent Activity

Donate For Us