I am adding a single column to a Postgres table with 100+ columns via Django ( a new migration). How can I update a column in a PostgreSQL table with the data from a pandas
data_frame? The pseudo-code for Postgres SQL UPDATE would be:
UPDATE wide_table wt
SET wt.z = df.z
WHERE date = 'todays_date'
The reason for doing it this way is that I am computing a column in the data_frame
using a CSV that is in S3
(this is df.z
). The docs for Postgres update are straightforward to use, but I am unsure how to do this via Django, sqlalchemy, pyodbc, or the like.
I apologize if this is a bit convoluted. A small and incomplete example would be:
z
)identifier | x | y | z | date
foo | 2 | 1 | 0.0 | ...
bar | 2 | 8 | 0.0 | ...
baz | 3 | 7 | 0.0 | ...
foo | 2 | 8 | 0.0 | ...
foo | 1 | 5 | 0.0 | ...
baz | 2 | 8 | 0.0 | ...
bar | 9 | 3 | 0.0 | ...
baz | 2 | 3 | 0.0 | ...
def apply_function(identifier):
# Maps baz-> 15.0, bar-> 19.6, foo -> 10.0 for single date
df = pd.read_csv("s3_file_path/date_file_name.csv")
# Compute 'z' based on identifier and S3 csv
return z
postgres_query = "Select identifier from wide_table"
df = pd.read_sql(sql=postgres_query, con=engine)
df['z'] = df.identifier.apply(apply_function)
# Python / SQL Update Logic here to update Postgres Column
???
z
)identifier | x | y | z | date
foo | 2 | 1 | 10.0 | ...
bar | 2 | 8 | 19.6 | ...
baz | 3 | 7 | 15.0 | ...
foo | 2 | 8 | 10.0 | ...
foo | 1 | 5 | 10.0 | ...
baz | 2 | 8 | 15.0 | ...
bar | 9 | 3 | 19.6 | ...
baz | 2 | 3 | 15.0 | ...
NOTE: The values in z will change daily so simply creating another table to hold these z
values is not a great solution. Also, I'd really prefer to avoid deleting all of the data and adding it back.
Ran into a similar problem and the current accepted solution was too slow for me. My table had 500k+ rows and i needed to update 100k+ rows. After lengthy research and trial and error i arrived at an efficient and correct solution.
The idea is to use psycopg as your writer and to use a temp table. df
is your pandas dataframe that contains values you want to set.
import psycopg2
conn = psycopg2.connect("dbname='db' user='user' host='localhost' password='test'")
cur = conn.cursor()
rows = zip(df.id, df.z)
cur.execute("""CREATE TEMP TABLE codelist(id INTEGER, z INTEGER) ON COMMIT DROP""")
cur.executemany("""INSERT INTO codelist (id, z) VALUES(%s, %s)""", rows)
cur.execute("""
UPDATE table_name
SET z = codelist.z
FROM codelist
WHERE codelist.id = vehicle.id;
""")
cur.rowcount
conn.commit()
cur.close()
conn.close()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With