
Which is more efficient: joining in SQL, or merging with pandas?

I want to use data from multiple tables in a pandas DataFrame. I have two ideas for getting the data from the server: one is to use a SQL join and retrieve the joined result, and the other is to download the tables separately as DataFrames and merge them with pandas.merge.

SQL Join

Join on the server and load the result into pandas:

# "condition" is a placeholder for the real filter
query = '''SELECT table1.c1, table2.c2
    FROM table1
    INNER JOIN table2 ON table1.ID = table2.ID
    WHERE condition;'''
df = pd.read_sql(query, engine)

Pandas Merge

# ID must be selected in both queries, otherwise the merge has no key column
df1 = pd.read_sql('SELECT ID, c1 FROM table1 WHERE condition;', engine)
df2 = pd.read_sql('SELECT ID, c2 FROM table2 WHERE condition;', engine)
df = pd.merge(df1, df2, on='ID', how='inner')

Which one is faster? Assume that I want to do this for more than two tables and two columns. Is there a better approach? In case it matters, I use PostgreSQL.

Mehdi asked Apr 25 '18 11:04


1 Answer

The former is generally faster than the latter. The SQL join makes a single call to the database and returns a result that is already joined and filtered, so only the matching rows travel over the connection. The pandas approach makes two calls to the database, transfers both result sets in full, and then merges them in the application.
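To make the difference concrete, here is a minimal sketch that runs both approaches against the same data and counts the rows each one pulls from the database. It uses an in-memory SQLite database rather than PostgreSQL purely so that it is self-contained, and the sample rows are made up for illustration; the table and column names follow the question.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER PRIMARY KEY, c1 TEXT);
    CREATE TABLE table2 (ID INTEGER PRIMARY KEY, c2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

# Approach 1: one query, the database performs the join.
joined = pd.read_sql(
    "SELECT table1.ID, table1.c1, table2.c2 "
    "FROM table1 INNER JOIN table2 ON table1.ID = table2.ID "
    "ORDER BY table1.ID",
    conn,
)

# Approach 2: two queries, pandas performs the join in the client.
df1 = pd.read_sql("SELECT ID, c1 FROM table1", conn)
df2 = pd.read_sql("SELECT ID, c2 FROM table2", conn)
merged = (
    pd.merge(df1, df2, on="ID", how="inner")
    .sort_values("ID")
    .reset_index(drop=True)
)

# Rows transferred from the database by each approach.
rows_join = len(joined)
rows_fetched_for_merge = len(df1) + len(df2)

# Both approaches produce the same rows.
same_result = joined.values.tolist() == merged.values.tolist()

print(rows_join, rows_fetched_for_merge, same_result)
conn.close()
```

The result is identical either way, but the merge version has to fetch every row of both tables before it can discard the non-matching ones. With small tables the gap is negligible; it grows with table size and with the number of tables being joined.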

alfonsohdez08 answered Nov 14 '22 23:11