I am using Python to extract Arabic tweets from Twitter and save them as a CSV file, but when I open the saved file in Excel, the Arabic text displays as symbols. In Python, Notepad, and Word it looks fine. Where is the problem?
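You can also change the font of Arabic text in Excel. To do this, select the cells you want to format and pick a font that supports Arabic from the "Font" drop-down menu. Note that this only changes how correctly decoded text is drawn; it does not repair text that Excel read with the wrong encoding.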
This is a problem I face frequently with Microsoft Excel when opening CSV files that contain Arabic characters. Try the following workaround, which I tested on the latest versions of Microsoft Excel on both Windows and macOS:
Open Excel with a blank workbook
On the Data tab, click the From Text button (if it is not enabled, make sure an empty cell is selected)
Browse to the CSV file and select it
In the Text Import Wizard, change "File origin" to "Unicode (UTF-8)"
Click Next and, under Delimiters, select the delimiter used in your file, e.g. comma
Click Finish and choose where to import the data
The Arabic characters should show correctly.
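If you would rather not run the import wizard each time, a CSV that already exists can be re-saved with a UTF-8 byte order mark (BOM), which Excel uses to recognize the encoding when the file is opened directly. The following is a minimal sketch, assuming the source file really is UTF-8; the function name resave_with_bom is made up for illustration.

```python
def resave_with_bom(src, dst):
    """Copy a UTF-8 text file to dst, making sure dst starts with a UTF-8 BOM.

    Hypothetical helper for illustration; assumes src is valid UTF-8.
    """
    # Reading with 'utf-8-sig' silently drops a BOM if one is already present,
    # so the output never ends up with two of them.
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin:
        text = fin.read()

    # Writing with 'utf-8-sig' puts the BOM bytes EF BB BF at the start of the file.
    with open(dst, "w", encoding="utf-8-sig", newline="") as fout:
        fout.write(text)
```

For example, resave_with_bom('tweets.csv', 'tweets_excel.csv') (both file names are placeholders) leaves the original untouched and writes a copy that Excel should open with the Arabic text intact.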
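Just use encoding='utf-8-sig' instead of encoding='utf-8'. The utf-8-sig codec writes a byte order mark (BOM) at the start of the file, which Excel uses to recognize the file as UTF-8:
import csv

data = "اردو"
# newline='' is what the csv module expects, so it can control line endings itself
with open('example.csv', 'w', encoding='utf-8-sig', newline='') as fh:
    writer = csv.writer(fh)
    writer.writerow([data])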
It worked on my machine.
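To see what utf-8-sig actually changes, here is a small sketch that writes the same row with both codecs and compares the raw bytes on disk. The helper name csv_bytes and the sample word are only for illustration.

```python
import codecs
import csv
import os
import tempfile


def csv_bytes(encoding):
    """Write one Arabic row to a temporary CSV and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding, newline="") as fh:
            csv.writer(fh).writerow(["مرحبا"])
        with open(path, "rb") as fh:
            return fh.read()
    finally:
        os.remove(path)


plain = csv_bytes("utf-8")
with_bom = csv_bytes("utf-8-sig")

# codecs.BOM_UTF8 is the three bytes EF BB BF.
print(with_bom.startswith(codecs.BOM_UTF8))   # does the utf-8-sig file start with a BOM?
print(plain.startswith(codecs.BOM_UTF8))      # does the plain utf-8 file?
print(with_bom == codecs.BOM_UTF8 + plain)    # is the BOM the only difference?
```

The text itself is encoded identically in both files; the only difference is the three-byte marker at the start, which is why Notepad and Python read either file fine while Excel needs the marker to pick UTF-8.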
The only solution I've found for saving Arabic text from Python in a form that Excel opens correctly is to use pandas and write an .xlsx file instead of a CSV. The .xlsx format has worked far better for me. Here's the code I put together (it needs the xlsxwriter package installed):
import pandas as pd

def turn_into_csv(data, csver):
    # Despite the name, this writes an .xlsx workbook.
    # data: iterable of tweet dicts; csver: output path without the extension.
    ids = []
    texts = []
    for each in data:
        texts.append(each["full_text"])
        ids.append(str(each["id"]))
    df = pd.DataFrame({'ID': ids, 'FULL_TEXT': texts})
    # The with block saves and closes the workbook on exit. Newer pandas releases
    # no longer have writer.save() and no longer accept encoding in to_excel.
    with pd.ExcelWriter(csver + '.xlsx', engine='xlsxwriter') as writer:
        df.to_excel(writer, sheet_name='Sheet1')