
How to overcome "OperationalError: too many SQL variables"

I am trying to insert a data frame of size (8760, 1574) into an SQLite table. My code is below:

import sqlite3


class DatabaseWorker(object):
    def __init__(self, db_name):
        self.db = db_name

    def create_table(self, table_name, column_names):
        conn = sqlite3.connect(self.db)
        cur = conn.cursor()
        q1 = 'DROP TABLE IF EXISTS %s' % (table_name,)
        q2 = 'CREATE TABLE %s (%s)' % (table_name, ', '.join(str(x) for x in column_names))
        cur.execute(q1)
        cur.execute(q2)
        conn.commit()
        conn.close()

    def insert_table(self, table_name, data):
        conn = sqlite3.connect(self.db)
        data.to_sql(table_name, conn, if_exists='append', index=False)
        conn.commit()
        conn.close()

cnx = DatabaseWorker("users")
cnx.create_table("user_activity", df_final.columns.values.tolist())
cnx.create_table("user_similarity_matrix", df_transformed.columns.values.tolist())
cnx.insert_table("user_activity", df_final)
cnx.insert_table("user_similarity_matrix", df_transformed)

Size of df_final is (249238, 7) and size of df_transformed is (8760, 1574). There is no error when inserting df_final, but I get the error below while inserting df_transformed:

----> 5 cnx.insert_table("user_similarity_matrix", df_transformed)
---> 30 data.to_sql(table_name, conn, if_exists='append', index=False)
        pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index,
                          index_label=index_label, schema=schema,
                          chunksize=chunksize, dtype=dtype)
OperationalError: too many SQL variables

One of the answers on "Too many SQL variables" error in django with sqlite3 says there is a limit of 999 variables. Is there any way I can sidestep that? I would appreciate your suggestions. Thanks a lot in advance.

Krishnang K Dalal asked Mar 09 '18 04:03


1 Answer

The default SQLITE_MAX_VARIABLE_NUMBER was increased in SQLite 3.32.0 from 999 to 32766.

  1. Maximum Number Of Host Parameters In A Single SQL Statement

    [...]

    SQLite allocates space to hold all host parameters between 1 and the largest host parameter number used. Hence, an SQL statement that contains a host parameter like ?1000000000 would require gigabytes of storage. This could easily overwhelm the resources of the host machine. To prevent excessive memory allocations, the maximum value of a host parameter number is SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0.

This means that your df_transformed, with its 1574 columns, should insert fine on recent SQLite builds. Note, however, that SQLITE_MAX_COLUMN defaults to 2000, so there is not much room left for growing the number of columns in your data frame.
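Since the limit depends on the SQLite build that Python links against, it is worth checking the runtime version before attempting a wide insert. The helper below is a hypothetical sketch: `supports_wide_insert` is not part of any library, and it assumes the library was compiled with the default SQLITE_MAX_VARIABLE_NUMBER (a custom build could raise or lower the limit):

```python
import sqlite3


def supports_wide_insert(n_columns, version=sqlite3.sqlite_version):
    """Rough check: does a single-row INSERT with n_columns parameters
    fit under the default SQLITE_MAX_VARIABLE_NUMBER for this version?"""
    major, minor = (int(p) for p in version.split(".")[:2])
    # Default limit: 999 before 3.32.0, 32766 from 3.32.0 onwards
    limit = 32766 if (major, minor) >= (3, 32) else 999
    return n_columns <= limit


# Check the SQLite build the sqlite3 module is actually using
print(sqlite3.sqlite_version, supports_wide_insert(1574))
```

With 1574 columns, every single row carries 1574 bound parameters, so on a pre-3.32.0 build no `chunksize` setting in `to_sql` can bring a row-wise INSERT under the 999-variable limit; upgrading SQLite or restructuring the data are the options.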

The default setting for SQLITE_MAX_COLUMN is 2000. You can change it at compile time to values as large as 32767. On the other hand, many experienced database designers will argue that a well-normalized database will never need more than 100 columns in a table.

In most applications, the number of columns is small - a few dozen. There are places in the SQLite code generator that use algorithms that are O(N²) where N is the number of columns.

Alternatives include a custom document serialisation, e.g. pandas.DataFrame.to_json / pandas.read_json stored in a single SQLite TEXT column, a custom sqlite3 adapter for pandas.DataFrame, and so on. Because SQLITE_MAX_LENGTH defaults to 1 GB per string or BLOB, serialisation accommodates much wider data frames.
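A minimal sketch of the JSON route, assuming the whole frame fits in one TEXT value (the `frames` table, its columns, and the small sample frame are made up for illustration):

```python
import io
import sqlite3

import pandas as pd

# Hypothetical storage table: one row per serialised data frame,
# sidestepping the per-statement bound-variable limit entirely.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (name TEXT PRIMARY KEY, payload TEXT)")

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
conn.execute(
    "INSERT INTO frames VALUES (?, ?)",
    ("user_similarity_matrix", df.to_json(orient="split")),
)
conn.commit()

# Read the document back and reconstruct the frame
payload = conn.execute(
    "SELECT payload FROM frames WHERE name = ?", ("user_similarity_matrix",)
).fetchone()[0]
restored = pd.read_json(io.StringIO(payload), orient="split")
```

The trade-off is that the frame is no longer queryable column-by-column with SQL; it is stored and retrieved as an opaque document.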

saaj answered Nov 08 '22 00:11