I'm trying to read a few hundred tables from ASCII files and then write them to MySQL. It seems easy to do with pandas, but I hit an error that doesn't make sense to me:
I have a DataFrame with 8 columns. Here is the column list/index:
metricDF.columns
Index([u'FID', u'TYPE', u'CO', u'CITY', u'LINENO', u'SUBLINE', u'VALUE_010', u'VALUE2_015'], dtype=object)
I then use to_sql to append the data to MySQL:

metricDF.to_sql(con=con, name=seqFile, if_exists='append', flavor='mysql')
I get a strange error about a column being "nan":
OperationalError: (1054, "Unknown column 'nan' in 'field list'")
As you can see, all my columns have names. I realize MySQL/SQL write support is still in development, so perhaps that's the reason? If so, is there a workaround? Any suggestions would be greatly appreciated.
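For context, a minimal sketch of the kind of loop described above; the file pattern, connection credentials, and table naming are hypothetical placeholders:

import glob
import MySQLdb
import pandas as pd

con = MySQLdb.connect(host='localhost', user='user', passwd='pw', db='metrics')  # placeholder credentials

for path in glob.glob('tables/*.txt'):  # hypothetical ASCII files
    metricDF = pd.read_csv(path, delim_whitespace=True)  # missing fields become NaN
    seqFile = path.rsplit('/', 1)[-1].rsplit('.', 1)[0]  # one table per file
    metricDF.to_sql(con=con, name=seqFile, if_exists='append', flavor='mysql')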
This is what the pandas documentation gives for na_values: scalar, str, list-like, or dict, optional. Additional strings to recognize as NA/NaN. If a dict is passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.…
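As a hedged illustration of that parameter (the file name and sentinel strings here are hypothetical), na_values can be used when reading the ASCII tables to control which strings become NaN:

import pandas as pd

# Treat '-999' in VALUE_010 and 'missing' in VALUE2_015 as NaN,
# in addition to the default NA strings.
metricDF = pd.read_csv('metrics_001.txt', delim_whitespace=True,
                       na_values={'VALUE_010': ['-999'], 'VALUE2_015': ['missing']})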
There is no way to represent NaN , PositiveInfinity , or NegativeInfinity in Transact-SQL.
Update: starting with pandas 0.15, to_sql supports writing NaN values (they will be written as NULL in the database), so the workaround described below should no longer be needed (see https://github.com/pydata/pandas/pull/8208). Pandas 0.15 is scheduled for release in October, and the feature is already merged in the development version.
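As a hedged sketch of the 0.15+ behaviour (the SQLAlchemy engine URL and table name are placeholders):

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqldb://user:pw@localhost/metrics')  # placeholder URL

df = pd.DataFrame({'FID': [1, 2], 'VALUE_010': [3.5, np.nan]})
df.to_sql('table_name', engine, if_exists='append', index=False)  # NaN is written as NULL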
This is probably due to NaN values in your table; at the moment it is a known shortcoming that the pandas SQL functions don't handle NaNs well (https://github.com/pydata/pandas/issues/2754, https://github.com/pydata/pandas/issues/4199).
As a workaround at this moment (for pandas versions 0.14.1 and lower), you can manually convert the NaN values to None with:
df2 = df.astype(object).where(pd.notnull(df), None)
and then write the DataFrame to SQL. This, however, converts all columns to object dtype. Because of this, you have to create the database table based on the original DataFrame. E.g. if your first row does not contain NaNs:
df[:1].to_sql('table_name', con)
df2[1:].to_sql('table_name', con, if_exists='append')
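Put together, the whole workaround might look like the following sketch, assuming legacy pandas (0.14.1 or lower) and a MySQLdb connection with placeholder credentials:

import MySQLdb
import pandas as pd

con = MySQLdb.connect(host='localhost', user='user', passwd='pw', db='metrics')  # placeholder

# Replace NaN with None (object dtype) so MySQL receives NULL instead of the string 'nan'.
df2 = df.astype(object).where(pd.notnull(df), None)

# Create the table from the original frame so the columns keep their numeric types,
# then append the remaining rows from the converted frame.
df[:1].to_sql('table_name', con, flavor='mysql')
df2[1:].to_sql('table_name', con, flavor='mysql', if_exists='append')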
Using the previous solution will change the column dtype from float64 to object.
I have found a better solution; just add the following _write_mysql function:
import numpy as np
from pandas.io import sql

def _write_mysql(frame, table, names, cur):
    # Backtick-quote column names so reserved words don't break the INSERT.
    bracketed_names = ['`' + column + '`' for column in names]
    col_names = ','.join(bracketed_names)
    wildcards = ','.join([r'%s'] * len(names))
    insert_query = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, col_names, wildcards)
    # Convert NaN floats to None so they are stored as NULL instead of 'nan'.
    data = [[None if isinstance(y, float) and np.isnan(y) else y for y in x]
            for x in frame.values]
    cur.executemany(insert_query, data)
And then override its implementation in pandas as below:
sql._write_mysql = _write_mysql
With this code, NaN values will be saved correctly in the database without altering the column type.
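A minimal usage sketch, assuming a legacy pandas version in which pandas.io.sql still exposes _write_mysql and write_frame, and a MySQLdb connection with placeholder credentials:

import MySQLdb
from pandas.io import sql

sql._write_mysql = _write_mysql  # apply the override defined above

con = MySQLdb.connect(host='localhost', user='user', passwd='pw', db='metrics')  # placeholder
sql.write_frame(metricDF, name='table_name', con=con, flavor='mysql', if_exists='append')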