Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PandaSQL very slow

I'm currently switching from R to Python (anconda/Spyder Python 3) for data analysis purposes. In R I used to use a lot R sqldf. Since I'm good at sql queries, I didn't want to re-learn data.table syntax. Using R sqldf, I never had performance issues.

Now, in Python I tried using pandasql a simple df = "SELECT * From table LIMIT 1" will last forever on 193k rows, 19 columns.

I tried pysqldf but I get an error saying that the table doesn't exist, but it does exist.

# -*- coding: utf-8 -*-

import pandas as pd
import pandasql 
import pysqldf

#Data loading    
orders = pd.read_csv('data/orders.csv',sep = ';')

###### PANDASQL ######
test = pandasql.sqldf("SELECT  orders_id from orders LIMIT 1;",globals())
# Will last several minutes and use a lot of RAM

test = pandasql.sqldf("SELECT  orders_id from orders LIMIT 1;",locals())
# Will last several minutes and use a lot of RAM


###### PYSQLDF ######
sqldf = pysqldf.SQLDF(globals())
test = sqldf.execute("SELECT  * from orders LIMIT 1;")
#error
#Error for pysqldf

Traceback (most recent call last):

  File "<ipython-input-12-30b645117dc4>", line 1, in <module>
    test = sqldf.execute("SELECT  * from orders LIMIT 1;")

  File "C:\Users\p.stepniewski\AppData\Local\Continuum\anaconda3\lib\site-packages\pysqldf\sqldf.py", line 76, in execute
    self._del_table(tables)

  File "C:\Users\p.stepniewski\AppData\Local\Continuum\anaconda3\lib\site-packages\pysqldf\sqldf.py", line 117, in _del_table
    self.conn.execute("drop table " + tablename)

OperationalError: no such table: orders

Am I missing something? Would prefer a pandasql/pysqldf answer before a "learn panda querying syntax".

Sqldf in R worked on complex queries on tables up to 10millions rows, on an i7/12G ram laptop.

Thanks !

like image 360
Philippe Stepniewski Avatar asked Jul 30 '18 09:07

Philippe Stepniewski


1 Answers

Ok just found the solution.

  • Entirely dropped the Anaconda install.
  • Cleaned related folders.
  • Installed from scratch Python 3.6 with PIP.
  • Then pip installed pandas, pandasql.
  • Launched my script. Script executed in less than a second (pandasql)
like image 147
Philippe Stepniewski Avatar answered Oct 27 '22 13:10

Philippe Stepniewski