I am trying to insert a data frame of size (8760, 1574) into a table in SQLite. My code is below:
import sqlite3

class DatabaseWorker(object):
    def __init__(self, db_name):
        self.db = db_name

    def create_table(self, table_name, column_names):
        conn = sqlite3.connect(self.db)
        cur = conn.cursor()
        q1 = 'DROP TABLE IF EXISTS %s' % table_name
        q2 = 'CREATE TABLE ' + table_name + ' (' + ', '.join(str(x) for x in column_names) + ')'
        cur.execute(q1)
        cur.execute(q2)
        conn.commit()
        conn.close()

    def insert_table(self, table_name, data):
        conn = sqlite3.connect(self.db)
        data.to_sql(table_name, conn, if_exists='append', index=False)
        conn.commit()
        conn.close()

# df_final and df_transformed are pandas DataFrames built elsewhere
cnx = DatabaseWorker("users")
cnx.create_table("user_activity", df_final.columns.values.tolist())
cnx.create_table("user_similarity_matrix", df_transformed.columns.values.tolist())
cnx.insert_table("user_activity", df_final)
cnx.insert_table("user_similarity_matrix", df_transformed)
The size of df_final is (249238, 7) and the size of df_transformed is (8760, 1574). There is no error when inserting df_final, but I get an error while inserting df_transformed. The error is as below:
----> 5 cnx.insert_table("user_similarity_matrix", df_transformed)
---> 30 data.to_sql(table_name, conn, if_exists='append', index=False)
pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index, index_label=index_label, schema=schema, chunksize=chunksize, dtype=dtype)
OperationalError: too many SQL variables
One of the answers on "Too many SQL variables" error in django with sqlite3 says there is a limit of 999 variables. Is there any way I can sidestep that? I will appreciate your suggestions. Thanks a lot in advance.
The SQLITE_MAX_VARIABLE_NUMBER limit was increased in SQLite 3.32.0 (2020-05-22) from 999 to 32766:
Maximum Number Of Host Parameters In A Single SQL Statement
[...]
SQLite allocates space to hold all host parameters between 1 and the largest host parameter number used. Hence, an SQL statement that contains a host parameter like ?1000000000 would require gigabytes of storage. This could easily overwhelm the resources of the host machine. To prevent excessive memory allocations, the maximum value of a host parameter number is SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0.
This means that your df_transformed with its 1574 columns should work in recent SQLite builds. However, note that SQLITE_MAX_COLUMN defaults to 2000, so you don't have much room for growing the number of columns in your data frame:
The default setting for SQLITE_MAX_COLUMN is 2000. You can change it at compile time to values as large as 32767. On the other hand, many experienced database designers will argue that a well-normalized database will never need more than 100 columns in a table.
In most applications, the number of columns is small - a few dozen. There are places in the SQLite code generator that use algorithms that are O(N²) where N is the number of columns.
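Before relying on the raised limit, it is worth checking which SQLite library your Python's sqlite3 module actually links against. A minimal check (the getlimit call requires Python 3.11+):

    import sqlite3

    # Version of the SQLite library compiled into the sqlite3 module,
    # not the version of the Python wrapper itself.
    print(sqlite3.sqlite_version)  # needs >= 3.32.0 for the 32766 limit

    # On Python 3.11+ you can also query the effective per-connection limit directly.
    conn = sqlite3.connect(":memory:")
    print(conn.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER))
    conn.close()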
Alternatives could be a custom document serialisation using pandas.DataFrame.to_json / pandas.read_json stored in a SQLite TEXT column, a pandas.DataFrame sqlite3 adapter, and so on. Because SQLITE_MAX_LENGTH defaults to 1 GB per string or BLOB (and hence per row), serialisation affords much wider data frames.