This sequence:
from airflow.hooks.mysql_hook import MySqlHook
conn = MySqlHook(mysql_conn_id='conn_id')
engine = conn.get_sqlalchemy_engine()
df.to_sql('test_table', engine, if_exists='append', index=False)
produces the following:
UnicodeEncodeError: 'latin-1' codec can't encode character '\ufffd' in position 57: ordinal not in range(256)
This sequence works great:
from sqlalchemy import create_engine
engine = create_engine("mysql://{0}:{1}@{2}/capone?charset=utf8".format(user, pwd, host))
df.to_sql('test_table', engine, if_exists='append', index=False)
The key is in explicitly declaring the charset
. I have attempted to do this in airflow as follows with {"charset": "utf8"}
:
But this has not fixed the error. I've restarted my dev environment since making the changes and the admin panel lets me know that the edit was successful. How can I work with Airflow connections to my charsets as utf8?
Here are the steps to set default charset to UTF-8 encoding in Apache using htaccess. 1. Open htaccess file Before proceeding, please enable mod_rewrite (.htaccess) in your Apache web server. Open .htaccess file, typically located at /var/www/html/.htaccess
The most commonly used encodings are UTF-8 and UTF-16: A character in UTF8 can be from 1 to 4 bytes long. UTF-8 can represent any character in the Unicode standard. UTF-8 is backwards compatible with ASCII.
Open .htaccess file, typically located at /var/www/html/.htaccess 2. Set Default Encoding to UTF-8 Add the following to .htaccess file. This will be applicable for all files served by Apache web server. If you want to apply UTF-8 encoding only for specific file types, use the filesMatch directive.
UTF-8 is the preferred encoding for e-mail and web pages 16-bit Unicode Transformation Format is a variable-length character encoding for Unicode, capable of encoding the entire Unicode repertoire. UTF-16 is used in major operating systems and environments, like Microsoft Windows, Java and .NET.
I realised that this is a bug in Airflow and I have reported it here: https://issues.apache.org/jira/browse/AIRFLOW-4824
For now I have a workaround with the following code:
def get_uri(hook):
conn = hook.get_connection(getattr(hook, hook.conn_name_attr))
login = ''
if conn.login:
login = '{conn.login}:{conn.password}@'.format(conn=conn)
host = conn.host
if conn.port is not None:
host += ':{port}'.format(port=conn.port)
charset = ''
if conn.extra_dejson.get('charset', False):
chrs = conn.extra_dejson["charset"]
if chrs.lower() == 'utf8' or chrs.lower() == 'utf-8':
charset = '?charset=utf8'
return '{conn.conn_type}://{login}{host}/{conn.schema}{charset}'.format(
conn=conn, login=login, host=host, charset=charset)
And then use it as follows:
url = get_uri(sql_hook)
from sqlalchemy import create_engine
engine = create_engine(url)
The real solution will be to send a pull request to the project overriding get_uri in mysql_hook.py.
from sqlalchemy import create_engine
from airflow.hooks.mysql_hook import MySqlHook
conn = MySqlHook(mysql_conn_id='conn_id')
uri = conn.get_uri()
engine = create_engine(uri+'?charset=utf8')
df.to_sql('test_table', engine, if_exists='append', index=False)
I fixed the problem by the code above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With