 

Bengali-language text not displayed in Unicode CSV file

I have an Excel file in the Bengali language. To display the Bengali text properly I need Bengali fonts installed on the PC.

I converted the Excel file to CSV using Office 2010, but the result shows only '?' marks instead of the Bengali characters. Then I used Google Docs for the conversion and hit the same problem, except with unreadable characters rather than '?'s. I pasted extracts from that file into an HTML file and tried to view it in my browser, without success.

What should I do to get a CSV file from an .xlsx file in Bengali so that I can import that into a MySQL database?

Edit: The answer accepted in this SO question made me go to Google Docs.

Istiaque Ahmed asked Jun 20 '12 09:06



1 Answer

According to the answers to the question “Excel to CSV with UTF8 encoding”, Google Docs should save CSV properly, unlike Excel, which destroys all characters that cannot be represented in the “ANSI” encoding being used. But maybe this has changed, or something went wrong, or the analysis of the situation is incorrect.

For properly encoded Bangla (Bengali) processed in MS Office programs, there should be no need for any “Bangla fonts”, since the Arial Unicode MS font (shipped with Office) contains the Bangla characters. So is the data actually in some nonstandard encoding that relies on a specially encoded font? In that case, it should first be converted to Unicode, though possibly it can be somehow managed using programs that consistently use that specific font.
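One rough way to check whether the data is really Unicode Bangla is to look at the code points: properly encoded Bangla text lives in the Unicode Bengali block (U+0980 to U+09FF). The sketch below is only a heuristic, and the function name `bengali_share` is made up for illustration. Text that depends on a specially encoded font will usually score near zero, because its characters are stored as other code points (often Latin letters).

```python
def bengali_share(text):
    """Return the fraction of letter-like characters in the Bengali block."""
    def in_bengali_block(ch):
        # Unicode Bengali block: U+0980 .. U+09FF
        return "\u0980" <= ch <= "\u09ff"

    # Count alphabetic characters plus anything in the Bengali block
    # (vowel signs are combining marks, so isalpha() alone would miss them).
    relevant = [ch for ch in text if ch.isalpha() or in_bengali_block(ch)]
    if not relevant:
        return 0.0
    hits = sum(1 for ch in relevant if in_bengali_block(ch))
    return hits / len(relevant)
```

Pasting a cell value from the spreadsheet into `bengali_share(...)` and getting a value close to 1.0 suggests the data is already Unicode; a value close to 0.0 for text that displays as Bangla suggests a font-based encoding that needs converting first.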

In Excel, when using Save As, you can select “Unicode text (*.txt)”. It saves the data as TSV (tab-separated values) in UTF-16 encoding. You may then need to convert it to use comma as separator instead of tab, and/or from UTF-16 to UTF-8. But this only works if the original data is properly encoded.
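The conversion step could be sketched in Python roughly as follows, assuming the file was saved from Excel as “Unicode text” (tab-separated, UTF-16 with a byte order mark). The function name and the file names in the usage note are placeholders, not anything Excel or MySQL prescribes.

```python
import csv

def tsv_utf16_to_csv_utf8(src_path, dst_path):
    """Convert a tab-separated UTF-16 file to a comma-separated UTF-8 file."""
    # The "utf-16" codec reads the byte order mark and picks the byte order.
    # newline="" lets the csv module handle line endings itself.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # comma separator; quotes fields when needed
        for row in reader:
            writer.writerow(row)
```

For example, `tsv_utf16_to_csv_utf8("export.txt", "export.csv")` would produce a UTF-8 CSV. When importing it into MySQL, the target table and the connection also need a Unicode character set (such as `utf8mb4`), otherwise the characters can still end up as '?'.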

Jukka K. Korpela answered Sep 19 '22 07:09
