
Excel xlsx file saving as CSV file - Korean and Japanese cracking badly

Tags: csv, excel

I am trying to make a CSV file from an Excel file. It has English, Korean and Japanese inputs. Right now it's saved as file.xlsx.

But when I try to save it as CSV through Excel (as file.csv), all the Korean and Japanese inputs turn into question marks (???????).

I tried importing into Google Spreadsheets and exporting out as csv from there (from reading some other solutions) but it still turns into question marks.

I tried building a CSV file from scratch and just copying/pasting values from the Excel file into the CSV, but after I save it as CSV, the characters always come out garbled.

Does anybody know how to work around this? Thank you.

Terry Bu asked Jun 06 '14 17:06


4 Answers

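I don't know that there IS an answer for this. A CSV file carries no encoding information of its own, and Excel's plain CSV save writes the text in a legacy (non-Unicode) encoding, so any characters that encoding cannot represent get lost when you save in that format.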
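
The question marks can be reproduced outside Excel. The snippet below is only a sketch: it assumes the plain CSV save behaves like encoding the text to a legacy Windows code page with unrepresentable characters replaced. The choice of cp1252 is an assumption for illustration; the actual code page depends on the system locale.

```python
text = "Hello 한국어 日本語"

# Assumption: cp1252 stands in for whatever legacy code page a plain
# CSV save would use on a given machine. It has no Korean or Japanese
# characters, so each one is replaced with "?".
legacy = text.encode("cp1252", errors="replace")

# UTF-8 can represent every character, so the round trip is lossless.
utf8 = text.encode("utf-8")

print(legacy)                # b'Hello ??? ???'
print(utf8.decode("utf-8"))  # Hello 한국어 日本語
```

Once the characters have been replaced this way, the original text cannot be recovered from the CSV, which is why the fix has to happen at save time.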

I tried, as a test, saving Chinese characters as a Unicode Text file, and believe it or not, that worked. So you may be able to do that and simply change the file extension to .csv, assuming for some reason you NEED the filename to end in CSV.

EDIT: I just ran additional testing on this. I was able to reimport the TXT file with either a TXT or a CSV extension, and the characters stayed just fine. So I think Unicode Text is your answer.

durbnpoisn answered Oct 24 '22 03:10


To fully retain the characters when saving in CSV format, and still be able to import and reuse the data later, you can follow these steps.

  1. In Microsoft Excel, open the *.xlsx file.
  2. Select Menu | Save As.
  3. Enter any name for your file.
  4. Under "Save as type," select Unicode Text.
  5. Click Save.
  6. Open your saved file in Microsoft Notepad.
  7. Replace all tab characters with commas (",").
    • Select a tab character (select and copy the whitespace between two column headers).
    • Open the "Find and Replace" window (press Ctrl+H) and replace all tab characters with a comma.
  8. Click Save As.
  9. Name the file, and change the Encoding: to UTF-8.
  10. Change the file extension from .txt to .csv.
  11. Click Save.
  12. Open the .csv file in Excel to view your data.
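
Steps 6 to 11 can also be scripted. The following is a minimal sketch, not part of the original answer. It assumes the Unicode Text file written in step 5 is UTF-16 and tab-delimited, and `unicode_text_to_csv` is a hypothetical helper name.

```python
import csv

def unicode_text_to_csv(src_path, dst_path):
    """Convert a UTF-16, tab-delimited text file to a UTF-8 CSV file."""
    # newline="" lets the csv module handle line endings itself,
    # including newlines inside quoted cells.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        # "utf-8-sig" writes a byte-order mark, which helps Excel
        # recognize the file as UTF-8 when it is opened directly.
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow(row)
```

For example, `unicode_text_to_csv("file.txt", "file.csv")`, where both filenames are placeholders. Unlike a blanket find-and-replace, the `csv` module quotes any cell that itself contains a comma, so such cells are not split into extra columns.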

Kent Aguilar answered Oct 24 '22 01:10


Simply opening a CSV file in Excel only works when Excel's default assumptions about the file, such as its encoding, hold. You may be writing the CSV correctly but not validating it properly.

It is more reliable to open a blank worksheet and then import the file through the Data tab. The encoding of the CSV file is one of the parameters you can specify during the import.
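
One way to validate the file itself, independent of how Excel displays it, is to decode it with an explicit encoding. This is a minimal sketch that assumes the file is meant to be UTF-8. `read_utf8_csv` is a hypothetical helper name, not something Excel or this answer provides.

```python
import csv

def read_utf8_csv(path):
    """Return all rows of a CSV file, decoding it strictly as UTF-8.

    "utf-8-sig" accepts files with or without a byte-order mark.
    A file saved in a different encoding will typically raise
    UnicodeDecodeError instead of silently producing wrong text.
    """
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return list(csv.reader(f))
```

If the rows come back with the Korean and Japanese text intact, the CSV was written correctly and any remaining problem is in how it is being opened.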

Pete Forman answered Oct 24 '22 01:10


I had the same issue. The article below shows the workaround in detail: https://help.salesforce.com/articleView?id=000003837&type=1

However, I decided to go with LibreOffice Calc, as it requires fewer steps to achieve the desired outcome. While exporting, you get to select the character set, field delimiter and text delimiter.

For all other tasks, I prefer Excel.

user7997733 answered Oct 24 '22 02:10