 

PySpark — UnicodeEncodeError: 'ascii' codec can't encode character

I am loading a DataFrame containing non-ASCII characters (åäö) into Spark with spark.read.csv, using encoding='utf-8', and trying to do a simple show():

>>> df.show()

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 287, in show
print(self._jdf.showString(n, truncate))
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 579: ordinal not in range(128)

I figure this is probably related to Python itself, but I cannot see how any of the tricks mentioned here, for example, can be applied in the context of PySpark and the show() function.
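One way to confirm the failure is on the Python side rather than in Spark is to reproduce it without Spark. In the traceback, print() has to encode the string returned by showString() into bytes; if the output stream's encoding is ASCII, any non-ASCII character (here U+FFFD, the Unicode replacement character) raises this error. The traceback comes from Python 2 (note the u'' prefix); the minimal sketch below assumes Python 3, and the helper name encode_for_stream is hypothetical.

```python
import sys


def encode_for_stream(text, encoding):
    """Encode text with the default 'strict' error handler, as a text stream would.

    Returns the encoded bytes, or the UnicodeEncodeError message on failure."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


if __name__ == "__main__":
    # The encoding print() would use in this interpreter.
    print("stdout encoding here:", getattr(sys.stdout, "encoding", None))
    for enc in ("ascii", "utf-8"):
        # ASCII fails with the same "ordinal not in range(128)" message;
        # UTF-8 produces three bytes for U+FFFD.
        print(enc, "->", encode_for_stream("\ufffd", enc))
```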

asked Sep 23 '16 at 13:09 by salient




2 Answers

SPARK-11772 (https://issues.apache.org/jira/browse/SPARK-11772) discusses this issue and gives a solution: run

export PYTHONIOENCODING=utf8

before starting pyspark. I wonder why the above works, because sys.getdefaultencoding() returned utf-8 for me even without it.
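A likely explanation: print() encodes text with sys.stdout.encoding, and PYTHONIOENCODING overrides the encoding of stdin, stdout and stderr. sys.getdefaultencoding() is a separate setting (in Python 3 it always returns 'utf-8'), so it can report utf-8 while stdout is still ASCII. The sketch below assumes Python 3.7+ and does not involve Spark; it runs a hypothetical one-line child program under two PYTHONIOENCODING values. CHILD_SOURCE and run_with_io_encoding are illustrative names.

```python
import os
import subprocess
import sys

# Hypothetical child program: print U+FFFD, the character from the traceback.
# The raw string keeps the \ufffd escape in the child's source code, so only
# ASCII is passed on the command line.
CHILD_SOURCE = r"print('\ufffd')"


def run_with_io_encoding(encoding):
    """Run CHILD_SOURCE in a fresh interpreter with PYTHONIOENCODING set.

    Returns (exit code, raw stdout bytes, stderr decoded leniently)."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        env=env,
        capture_output=True,  # Python 3.7+
    )
    return proc.returncode, proc.stdout, proc.stderr.decode("utf-8", "replace")


if __name__ == "__main__":
    for enc in ("ascii", "utf8"):
        code, out, err = run_with_io_encoding(enc)
        lines = err.strip().splitlines()
        last_err_line = lines[-1] if lines else ""
        # With ascii the child exits non-zero with a UnicodeEncodeError;
        # with utf8 it prints the three UTF-8 bytes of U+FFFD.
        print(f"PYTHONIOENCODING={enc}: exit={code} stdout={out!r} {last_err_line}")
```

Because the variable is read when the interpreter starts, it has to be exported in the shell that launches pyspark, as in the command above.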

The question "How to set sys.stdout encoding in Python 3?" also discusses this and gives the following solution for Python 3:

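import sys
# Python 3 only: reopen the stdout file descriptor as a UTF-8 text stream with
# line buffering (buffering=1) and make print() use it from now on.
# closefd=False keeps fd 1 open if this new file object is later discarded.
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1, closefd=False)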
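On Python 3.7 and later, text streams such as sys.stdout also have a reconfigure() method, which changes the encoding of the existing stream instead of opening a second file object on the same descriptor. The minimal sketch below shows the effect on a BytesIO-backed io.TextIOWrapper standing in for sys.stdout, so the written bytes can be inspected; the function name demo_reconfigure is hypothetical.

```python
import io
import sys


def demo_reconfigure(text):
    """Write text to an ASCII text stream, then switch it to UTF-8 and retry.

    A BytesIO-backed TextIOWrapper stands in for sys.stdout.
    Returns (whether the first write failed, all bytes written)."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii")
    try:
        stream.write(text)  # same failure mode as show() in the question
        failed = False
    except UnicodeEncodeError:
        failed = True
    stream.reconfigure(encoding="utf-8")  # Python 3.7+; flushes, then switches
    stream.write(text)
    stream.flush()
    return failed, raw.getvalue()


if __name__ == "__main__":
    # The equivalent call on the real interpreter's stdout.
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")
    # "\u00e5\u00e4\u00f6" is the string "åäö" from the question.
    print(demo_reconfigure("\u00e5\u00e4\u00f6 \ufffd"))
```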

answered Sep 26 '22 at 17:09 by Jussi Kujala


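# Python 2 only: sys.setdefaultencoding does not exist in Python 3.
import sys
reload(sys)  # site.py deletes sys.setdefaultencoding at startup; reload(sys) restores it
sys.setdefaultencoding('utf-8')  # process-wide codec for implicit unicode <-> str conversions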

This works for me: I set the encoding up front, and it applies throughout the script.

answered Sep 23 '22 at 17:09 by swapnil shashank