Loading a DataFrame with foreign characters (åäö) into Spark using spark.read.csv with encoding='utf-8', then trying to do a simple show():
>>> df.show()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 287, in show
print(self._jdf.showString(n, truncate))
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 579: ordinal not in range(128)
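For reference, a minimal reproduction of the setup described above (the file path and the header option are assumptions, not from the original question):

# Minimal reproduction sketch; "people.csv" is an assumed placeholder file.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("people.csv", header=True, encoding="utf-8")

# show() renders the rows on the JVM side and prints the result from Python
# (see the showString call in the traceback), so the failure happens at
# print time, not while reading the CSV.
df.show()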
I figure this is probably related to Python itself, but I cannot understand how any of the tricks mentioned here, for example, can be applied in the context of PySpark and the show() function.
Encoding strings: in order to get rid of the error, you should explicitly specify the desired encoding, which can be done with the encode() method. In most cases, utf-8 encoding will do the trick.
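As a rough illustration of that general advice (plain Python 2-style strings, not the PySpark show() call itself; the sample string is made up):

# Hypothetical example: explicitly encode a unicode string before writing it
# to a byte-oriented stream (Python 2 semantics assumed).
s = u'åäö'
print(s.encode('utf-8'))

Note that this does not directly help with show(), since show() performs the print internally; the suggestions below change the interpreter's output encoding instead.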
https://issues.apache.org/jira/browse/SPARK-11772 talks about this issue and gives a solution: run
export PYTHONIOENCODING=utf8
before starting pyspark. I wonder why this works, because sys.getdefaultencoding() returned utf-8 for me even without it.
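For what it's worth, the two settings are not the same thing: sys.getdefaultencoding() governs implicit string conversions, while show() ultimately writes through sys.stdout, and sys.stdout.encoding is what PYTHONIOENCODING overrides. A quick way to inspect both:

import sys
# Codec used for implicit string conversions (always 'utf-8' on Python 3).
print(sys.getdefaultencoding())
# Codec used when printing to the console; this is the one PYTHONIOENCODING
# controls, and it can fall back to 'ascii' when no locale is set or when
# output is piped.
print(sys.stdout.encoding)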
How to set sys.stdout encoding in Python 3? also talks about this and gives the following solution for Python 3:
import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
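On Python 3.7 and later, the same effect can be achieved without replacing the stream object (this uses the standard-library reconfigure() method and is not part of the linked answer):

import sys
# reconfigure() changes the encoding of the existing sys.stdout text wrapper
# in place (available since Python 3.7).
sys.stdout.reconfigure(encoding='utf-8')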
On Python 2, another workaround is to reset the default encoding at the top of the script:
import sys
# reload() restores sys.setdefaultencoding(), which site.py removes at startup
# (Python 2 only; sys.setdefaultencoding() does not exist on Python 3).
reload(sys)
sys.setdefaultencoding('utf-8')
This works for me: I set the encoding up front and it stays in effect throughout the script.