Which is more efficient: join queries using SQL, or merge queries using pandas?

Question

I want to use data from multiple tables in a pandas DataFrame. I have two ideas for downloading the data from the server: one is to run a SQL join and retrieve the joined result, and the other is to download the tables into separate DataFrames and merge them using pandas.merge.

SQL Join

Join on the server and download the result into pandas:

import pandas as pd

# engine is an existing database connection (not shown in the question)
query = '''SELECT table1.c1, table2.c2
    FROM table1
    INNER JOIN table2 ON table1.ID = table2.ID WHERE condition;'''
df = pd.read_sql(query, engine)

Pandas Merge

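# ID must be selected in both queries so that pandas can merge on it.
# PostgreSQL folds unquoted identifiers to lowercase, so the column may come back as 'id'.
df1 = pd.read_sql('SELECT ID, c1 FROM table1 WHERE condition;', engine)
df2 = pd.read_sql('SELECT ID, c2 FROM table2 WHERE condition;', engine)
df = pd.merge(df1, df2, on='ID', how='inner')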

Which one is faster? Assume that I want to do this for more than two tables and two columns. Is there a better approach? In case it matters, I use PostgreSQL.

alfonsohdez08 · Accepted Answer

The former is faster than the latter. The former makes a single call to the database and returns a result that is already joined and filtered. The latter makes two calls to the database, transfers both result sets to the client, and only then merges them in the application.
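To make the difference concrete, here is a minimal sketch of both approaches run against the same data. It is not a benchmark, and it rests on several assumptions: an in-memory SQLite database stands in for the PostgreSQL server so the snippet runs anywhere, the sample rows are made up, and the WHERE condition from the question is left out. It only shows that both routes give the same rows while the merge route pulls more rows to the client.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in data: an in-memory SQLite database replaces the
# PostgreSQL server from the question so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (ID INTEGER PRIMARY KEY, c1 TEXT);
    CREATE TABLE table2 (ID INTEGER PRIMARY KEY, c2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2, 'x'), (3, 'y'), (4, 'z');
    """
)

# Approach 1: one round trip; the database performs the join and
# returns only the matching rows.
joined = pd.read_sql(
    "SELECT table1.ID, table1.c1, table2.c2 "
    "FROM table1 INNER JOIN table2 ON table1.ID = table2.ID "
    "ORDER BY table1.ID",
    conn,
)

# Approach 2: two round trips; all selected rows of both tables are
# transferred, then pandas joins them in the client process.
df1 = pd.read_sql("SELECT ID, c1 FROM table1", conn)
df2 = pd.read_sql("SELECT ID, c2 FROM table2", conn)
merged = (
    pd.merge(df1, df2, on="ID", how="inner")
    .sort_values("ID")
    .reset_index(drop=True)
)

rows_sent_join = len(joined)           # rows transferred by approach 1
rows_sent_merge = len(df1) + len(df2)  # rows transferred by approach 2
same_result = joined.equals(merged)
print(rows_sent_join, rows_sent_merge, same_result)
```

With more tables the same pattern applies: each extra table adds another JOIN clause to the single query in approach 1, versus another download plus another pd.merge call in approach 2. Actual timings depend on table sizes, indexes, and network latency, so it is worth measuring on your own data.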

Tags: python, sql, pandas, postgresql

Mehdi
