I need to join all of the PostgreSQL tables and convert the result into a Python dictionary. There are 72 tables in the database, with more than 1600 columns in total.
I wrote a simple Python script that joins several tables, but it fails to join all of them because of a memory error: all available memory gets used up while the script runs, even though I run it on a new virtual server with 128 GB of RAM and 8 CPUs. The failure happens during the execution of the lambda function.
How could the following code be improved so that it joins all of the tables?
from functools import reduce

from sqlalchemy import create_engine
import pandas as pd

auth = 'user:pass'
engine = create_engine('postgresql://' + auth + '@host.com:5432/db')
sql_tables = ['table0', 'table1', 'table3', ..., 'table72']

# Read every table into its own DataFrame
df_arr = [pd.read_sql_query('select * from "' + table + '"', con=engine)
          for table in sql_tables]

# Outer-join all the DataFrames on USER_ID (this is where it runs out of memory)
df_join = reduce(lambda left, right: pd.merge(left, right, how='outer', on=['USER_ID']), df_arr)

# Replace NaN with a placeholder and convert the result to a dictionary
raw_dict = df_join.where(pd.notnull(df_join), 'no_data').to_dict()

print(df_join)
print(raw_dict)
print(len(df_arr))
Is it OK to use pandas for this purpose, or are there better solutions?
The ultimate goal is to denormalize the DB data so that it can be indexed into Elasticsearch as documents, one document per user.
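Roughly, the document I want per user would look something like this (the field names are only placeholders for illustration):

# Illustrative only: one flat document per user, with made-up column names
# and missing values filled with a 'no_data' placeholder
example_doc = {
    'USER_ID': 42,
    'table0_some_column': 'value from table0',
    'table1_other_column': 'no_data',
}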
Why don't you create a Postgres function instead of a script?
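For example, here is a rough sketch of that idea: let Postgres build one JSON object per user and have the script only stream the rows, so the full join never has to fit into the Python process. The table and column names below are placeholders, and the query joins just two tables for brevity; in practice it would cover all 72 tables or be wrapped in a Postgres function or view.

from sqlalchemy import create_engine, text

engine = create_engine('postgresql://user:pass@host.com:5432/db')

# to_jsonb(row) || to_jsonb(row) merges the joined rows into one JSON object;
# coalesce() keeps the document intact when the LEFT JOIN finds no match
query = text('''
    SELECT t0."USER_ID",
           to_jsonb(t0) || coalesce(to_jsonb(t1), '{}'::jsonb) AS doc
    FROM "table0" t0
    LEFT JOIN "table1" t1 ON t1."USER_ID" = t0."USER_ID"
''')

with engine.connect() as conn:
    result = conn.execution_options(stream_results=True).execute(query)
    for user_id, doc in result:
        # psycopg2 decodes jsonb into a Python dict, so 'doc' can be sent
        # to Elasticsearch as-is
        print(user_id, doc)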
Here is some advice that could help you avoid the memory error:
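For example, one change that already helps is to merge one table at a time and release each source DataFrame as soon as it has been merged, instead of keeping all 72 DataFrames alive in df_arr while reduce() runs. A sketch, reusing the names from the question:

from sqlalchemy import create_engine
import gc
import pandas as pd

engine = create_engine('postgresql://user:pass@host.com:5432/db')
sql_tables = ['table0', 'table1', 'table3']   # ...all 72 tables, as in the question

# Start from the first table and fold the remaining tables in one by one
df_join = pd.read_sql_query('select * from "' + sql_tables[0] + '"', con=engine)
for table in sql_tables[1:]:
    right = pd.read_sql_query('select * from "' + table + '"', con=engine)
    df_join = pd.merge(df_join, right, how='outer', on=['USER_ID'])
    del right        # release the source frame before reading the next table
    gc.collect()

If the final 1600-column frame is still too large to hold at once, the database-side approach above avoids materializing it in Python at all.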