I have the following line of Python code:
trans = data.map(lambda line: line.strip().split())
That produces Unicode strings, for example:
u'Hello',u'word'
I'd like to get plain UTF-8 or ASCII strings instead:
'Hello','word'
I tried to convert the strings to UTF-8 like this:
trans = data.map(lambda line: line.strip().split().encode("utf-8"))
or
trans = data.map(lambda line: line.strip().split().encode('ascii','ignore'))
But that gives an error:
AttributeError: 'list' object has no attribute 'encode'
Can anybody tell me how I can do this?
UPDATE:
data is loaded from a CSV file, and trans is an RDD.
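The AttributeError happens because split() returns a list, and a list has no encode() method; only the individual strings inside it do. A minimal plain-Python sketch, assuming a single made-up sample line in place of an RDD element, that encodes each token separately:

```python
# Made-up sample line standing in for one element of the RDD.
line = u"  Hello word  "

# strip() and split() give a list of text tokens.
tokens = line.strip().split()

# tokens.encode("utf-8") would fail: lists have no encode() method.
# Encode each token on its own instead.
encoded = [token.encode("utf-8") for token in tokens]
print(encoded)
```

In Spark the same per-token encoding would go inside the map, e.g. data.map(lambda line: [w.encode("utf-8") for w in line.strip().split()]) (an untested sketch). Under Python 2 the encoded tokens are str; under Python 3 they are bytes.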
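Why not simply encode first and then split? split() returns a list, which has no encode() method, so the encoding has to be applied to the whole line before splitting it: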
data = sc.textFile("README.md")
trans = data.map(lambda x: x.encode("ascii", "ignore").split())
trans.first()
## ['#', 'Apache', 'Spark']
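To check the encode-then-split logic without a SparkContext, here is a small local sketch; the sample lines are made up for illustration and stand in for the contents of the RDD:

```python
# Made-up sample lines; the second contains non-ASCII characters.
lines = [u"# Apache Spark", u"na\u00efve caf\u00e9"]

def encode_and_split(line):
    # Encode the whole line to ASCII, silently dropping characters
    # that cannot be represented ("ignore"), then split on whitespace.
    return line.encode("ascii", "ignore").split()

tokens = [encode_and_split(line) for line in lines]
print(tokens)
```

With "ignore", characters such as "ï" and "é" are simply dropped, so "naïve café" becomes the tokens "nave" and "caf". Under Python 3, encode() returns bytes, so the tokens print as b'...'; under Python 2 (which the u'' output in the question suggests) they are plain str, as in the answer above.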