Python Writing Weird Unicode to CSV

Question

I'm attempting to extract article information using the Python newspaper3k package and then write it to a CSV file. While the info is downloaded correctly, I'm having issues with the output to CSV. I don't think I fully understand Unicode, despite my efforts to read about it.

from newspaper import Article, Source
import csv

first_article = Article(url="http://www.bloomberg.com/news/articles/2016-09-07/asian-stock-futures-deviate-as-s-p-500-ends-flat-crude-tops-46")

first_article.download()
if first_article.is_downloaded:
    first_article.parse()
    first_article.nlp()  # nlp is a method; it must be called to fill in keywords and summary

article_array = []
collate = {}

collate['title'] = first_article.title
collate['content'] = first_article.text
collate['keywords'] = first_article.keywords
collate['url'] = first_article.url
collate['summary'] = first_article.summary
print(collate['content'])
article_array.append(collate)

keys = article_array[0].keys()
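# newline='' is what the csv module documentation recommends for files passed to a writer
with open('bloombergtest.csv', 'w', newline='') as output_file:
    csv_writer = csv.DictWriter(output_file, keys)
    csv_writer.writeheader()
    csv_writer.writerows(article_array)
# the with block closes the file, so no explicit close() is needed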

When I print collate['content'], which is first_article.text, the console outputs the article's content just fine. Everything shows up correctly, apostrophes and all. When I write to the CSV, the content cell text has odd characters in it. For example:

â€œAt the end of the day, Europeâ€™s economy isnâ€™t in great shape, inflation doesnâ€™t look exciting and there are a bunch of political risks to reckon with.

So far I have tried:

with open('bloombergtest.csv', 'w', encoding='utf-8') as output_file:

to no avail. I also tried utf-16 instead of utf-8, but that just resulted in the cells being written in an odd order. It didn't create the cells correctly in the CSV, although the text itself looked correct. I've also tried .encode('utf-8') on various variables, but nothing has worked.

What's going on? Why would the console print the text correctly, while the CSV file has odd characters? How can I fix this?

Mark Tolonen · Accepted Answer

Add encoding='utf-8-sig' to open(). Excel requires the UTF-8-encoded BOM code point (Byte Order Mark, U+FEFF) signature to interpret a file as UTF-8; otherwise, it assumes the default localized encoding.
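As a minimal sketch of that fix, the snippet below writes a small CSV with encoding='utf-8-sig' and checks that the file starts with the BOM bytes. It also shows how the odd characters in the question can arise, assuming the spreadsheet program decoded the UTF-8 bytes as Windows-1252 (a common default on Western-locale Windows, but an assumption here). The sample row and the output path are made up for illustration and are not from the original post.

```python
import csv
import os
import tempfile

# Illustrative row containing "smart" punctuation, similar to the article text.
rows = [{"title": "Example",
         "content": "\u201cEurope\u2019s economy isn\u2019t in great shape\u201d"}]

# Hypothetical output path, used only for this sketch.
path = os.path.join(tempfile.gettempdir(), "bloombergtest_sig.csv")

# 'utf-8-sig' makes Python write the BOM (bytes EF BB BF) before the data.
# newline='' is the setting the csv module documentation recommends.
with open(path, "w", encoding="utf-8-sig", newline="") as output_file:
    writer = csv.DictWriter(output_file, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Confirm the signature is present at the start of the file.
with open(path, "rb") as f:
    raw = f.read()
has_bom = raw.startswith(b"\xef\xbb\xbf")
print(has_bom)  # True

# How the garbling arises: the UTF-8 bytes of a right single quote (U+2019),
# decoded as Windows-1252 instead of UTF-8, become three unrelated characters.
mojibake = "\u2019".encode("utf-8").decode("cp1252")
print(ascii(mojibake))
```

When reading such a file back in Python, opening it with encoding='utf-8-sig' strips the BOM again, so the first header name does not pick up a stray U+FEFF character.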

Tags:

python

csv

unicode

sirryankennedy

