I want to use data from multiple tables in a pandas DataFrame. I have two ideas for downloading the data from the server: one is to join the tables with SQL and retrieve the result in a single query, and the other is to download each table separately and merge them with pandas.merge.

The SQL-join version:
query = '''SELECT table1.c1, table2.c2
FROM table1
INNER JOIN table2 ON table1.ID = table2.ID
WHERE condition;'''
df = pd.read_sql(query, engine)
The separate-download version (note that the join key `ID` has to be selected too, or the merge has nothing to match on):

df1 = pd.read_sql('select ID, c1 from table1 where condition;', engine)
df2 = pd.read_sql('select ID, c2 from table2 where condition;', engine)
df = pd.merge(df1, df2, on='ID', how='inner')
Which one is faster? Assume I want to do this for more than two tables and two columns.
Is there a better approach?
If it matters, I am using PostgreSQL.
The former is faster. It makes a single call to the database, which returns the result already joined and filtered, so only the matching rows and columns travel over the network. The latter makes two calls to the database, pulls the full result sets into memory, and then merges them in the application, duplicating work the database's query planner is optimized to do.
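To make the contrast concrete, here is a small self-contained sketch of both approaches extended to three tables. It uses an in-memory SQLite database as a stand-in for the PostgreSQL server, and the table and column names (`table1`/`c1` etc.) are invented for illustration; with PostgreSQL you would pass a SQLAlchemy engine to `pd.read_sql` instead of the raw connection.

```python
import sqlite3
from functools import reduce

import pandas as pd

# In-memory SQLite stand-in for the real server (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID INTEGER, c1 TEXT);
CREATE TABLE table2 (ID INTEGER, c2 TEXT);
CREATE TABLE table3 (ID INTEGER, c3 TEXT);
INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
INSERT INTO table2 VALUES (1, 'x'), (2, 'y');
INSERT INTO table3 VALUES (1, 'p');
""")

# Approach 1: one round trip; the database performs both joins and
# returns only the rows that survive them.
query = """
SELECT t1.c1, t2.c2, t3.c3
FROM table1 t1
INNER JOIN table2 t2 ON t1.ID = t2.ID
INNER JOIN table3 t3 ON t1.ID = t3.ID;
"""
df_sql = pd.read_sql(query, conn)

# Approach 2: three round trips; every table is downloaded in full,
# then chained pd.merge calls join them client-side.
frames = [
    pd.read_sql("SELECT ID, c1 FROM table1;", conn),
    pd.read_sql("SELECT ID, c2 FROM table2;", conn),
    pd.read_sql("SELECT ID, c3 FROM table3;", conn),
]
df_merged = reduce(lambda l, r: pd.merge(l, r, on="ID", how="inner"), frames)

print(df_sql)     # only ID=1 appears in all three tables
print(df_merged)
```

Both end up with the same single matching row, but the merge version transferred every row of every table first; with large tables and selective filters, the single-query version avoids that transfer entirely.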