Python Pandas write to sql with NaN values

Tags: python, sql, pandas, mysql

I'm trying to read a few hundred tables from ASCII files and then write them to MySQL. This seems easy to do with pandas, but I hit an error that doesn't make sense to me.

I have a data frame of 8 columns. Here is the column list/index:

metricDF.columns

Index([u'FID', u'TYPE', u'CO', u'CITY', u'LINENO', u'SUBLINE', u'VALUE_010', u'VALUE2_015'], dtype=object)

I then use to_sql to append the data to a MySQL table:

metricDF.to_sql(con=con, name=seqFile, if_exists='append', flavor='mysql')

I get a strange error about a column being "nan":

OperationalError: (1054, "Unknown column 'nan' in 'field list'")

As you can see, all my columns have names. I realize that MySQL/SQL write support in pandas appears to be under development, so perhaps that's the reason? If so, is there a workaround? Any suggestions would be greatly appreciated.


asked Apr 29 '14 00:04

user3221876

2 Answers

Update: starting with pandas 0.15, to_sql supports writing NaN values (they are written as NULL in the database), so the workaround described below should no longer be needed (see https://github.com/pydata/pandas/pull/8208).
Pandas 0.15 is due for release this coming October; the feature is already merged in the development version.
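
As a rough illustration of the behaviour described in the update, here is a minimal sketch that writes a frame containing NaN and reads the rows back. It assumes pandas 0.15 or later and uses an in-memory SQLite database from the standard library in place of MySQL, purely so it runs without a server; the table name "metrics" and the column names are made up for the example.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical two-row frame; the second value is NaN.
df = pd.DataFrame({"id": [1, 2], "value": [1.5, np.nan]})

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
df.to_sql("metrics", conn, index=False)

# Read the rows back with the plain DB-API cursor.
rows = conn.execute("SELECT id, value FROM metrics ORDER BY id").fetchall()
print(rows)
conn.close()
```

The NaN should come back as None, which is how the Python DB-API represents SQL NULL.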

This is probably due to NaN values in your table. It is a known shortcoming at the moment that the pandas sql functions don't handle NaNs well (https://github.com/pydata/pandas/issues/2754, https://github.com/pydata/pandas/issues/4199).
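
To confirm that NaNs are the culprit, you can count the missing values per column before writing. A minimal sketch, assuming a small made-up frame that reuses a few of the column names from the question (the data itself is invented):

```python
import numpy as np
import pandas as pd

# Invented sample data; only the column names come from the question.
metric_df = pd.DataFrame({
    "FID": [1, 2, 3],
    "CITY": ["A", "B", "C"],
    "VALUE_010": [0.5, np.nan, 2.0],
})

# Count missing values per column and keep the columns that have any.
nan_counts = metric_df.isnull().sum()
columns_with_nan = nan_counts[nan_counts > 0].index.tolist()
print(columns_with_nan)
```

Any column listed here contains at least one NaN and is a candidate for triggering the error.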

As a workaround for now (for pandas versions 0.14.1 and lower), you can manually convert the NaN values to None with:

df2 = df.astype(object).where(pd.notnull(df), None)

and then write the dataframe to sql. However, this converts all columns to object dtype. Because of this, you have to create the database table based on the original dataframe. For example, if your first row does not contain NaNs:

df[:1].to_sql('table_name', con)
df2[1:].to_sql('table_name', con, if_exists='append')
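
The effect of the conversion, including the dtype change mentioned above, can be seen on a small frame. This is only a sketch with invented data; the column names "a" and "b" are placeholders.

```python
import numpy as np
import pandas as pd

# Invented frame: one integer column, one float column with a NaN.
df = pd.DataFrame({"a": [1, 2], "b": [0.5, np.nan]})

# Cast to object first so that None can be stored, then replace NaN.
df2 = df.astype(object).where(pd.notnull(df), None)

print(df.dtypes)         # original dtypes (int64, float64)
print(df2.dtypes)        # every column is now object
print(df2["b"].tolist()) # the NaN has become None
```

This is why the table is created from the original df: its dtypes still describe numeric columns, while df2 would make every column look like generic objects.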


answered Oct 17 '22 13:10

joris

Using the previous solution will change the column dtype from float64 to object.

I have found a better solution: just add the following _write_mysql function:

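import numpy as np
from pandas.io import sql

def _write_mysql(frame, table, names, cur):
    # Quote column names with backticks so reserved words are safe.
    bracketed_names = ['`' + column + '`' for column in names]
    col_names = ','.join(bracketed_names)
    # One %s placeholder per column; the driver fills in the values.
    wildcards = ','.join(['%s'] * len(names))
    insert_query = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, col_names, wildcards)

    # Replace float NaN with None so the driver sends NULL.
    # isinstance also matches numpy.float64, which subclasses float.
    data = [[None if isinstance(y, float) and np.isnan(y) else y for y in x]
            for x in frame.values]

    cur.executemany(insert_query, data)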

And then override its implementation in pandas as below:

sql._write_mysql = _write_mysql

With this code, NaN values are saved as NULL in the database without altering the column dtype.
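
The row-level conversion at the heart of that function can be tried on its own, without a database or the pandas override. A minimal sketch; the helper name rows_with_none and the sample frames are invented for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

def rows_with_none(frame):
    """Return the frame's rows as lists, with float NaN replaced by None."""
    return [
        [None if isinstance(y, float) and np.isnan(y) else y for y in row]
        for row in frame.values
    ]

# Mixed dtypes: frame.values is an object array here.
mixed = pd.DataFrame({"name": ["a", "b"], "value": [0.5, np.nan]})
mixed_rows = rows_with_none(mixed)

# Floats only: frame.values yields numpy.float64 elements here.
floats = pd.DataFrame({"x": [np.nan, 2.0]})
float_rows = rows_with_none(floats)

print(mixed_rows)
print(float_rows)
print(mixed.dtypes["value"])  # the source frame keeps its float64 column
```

Because the replacement happens on a copy of the row data just before the insert, the dataframe itself is left untouched.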

answered Oct 17 '22 14:10

Amine Kerkeni
