Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Importing CSV with line breaks in Excel 2007

Excel (at least in Office 2007 on XP) can behave differently depending on whether a CSV file is imported by opening it from the File->Open menu or by double-clicking on the file in Explorer.

I have a CSV file that is in UTF-8 encoding and contains newlines in some cells. If I open this file from Excel's File->Open menu, the "import CSV" wizard pops up and the file cannot be correctly imported: the newlines start a new row even when quoted. If I open this file by double-clicking on it in an Explorer window, then it opens correctly without the intervention of the wizard.


None of the suggested solutions worked for me.

What actually works (with any encoding):

Copy/paste the data from the csv-file (open in a text editor), then perform "text to columns" --> data gets transformed incorrectly.

The next stap is to go to the nearest empty column or empty worksheet and copy/paste again (same thing what you already have in your clipboard) --> automagically works now.


If you are doing this manually, download LibreOffice and use LibreOffice Calc to import your CSV. It does a much better job of stuff like this than any version of Excel I've tried, and it can save to XLS or XLSX as required if you need to transfer to Excel afterwards.

But if you're stuck with Excel and need a better fix, there seems to be a way. It seems to be locale dependent (which seems idiotic, in my humble opinion). I don't have Excel 2007, but I have Excel 2010, and the example given:

ID,Name,Description
"12345","Smith, Joe","Hey.
My name is Joe."

doesn't work. I wrote it in Notepad and chose Save as..., and next to the Save button you can choose the encoding. I chose UTF-8 as suggested, but with no luck. Changing the commas to semicolons worked for me, though. I didn't change anything else, and it just worked. So I changed the example to look like this, and chose the UTF-8 encoding when saving in Notepad:

ID;Name;Description
"12345";"Smith, Joe";"Hey.
My name is Joe."

But there's a catch! The only way it works is if you double-click the CSV file to open it in Excel. If I try to import data from text and chose this CSV, then it still fails on quoted newlines.

But there's another catch! The working field separator (comma in the original example, semicolon in my case) seems to depend on the system's Regional Settings (set under Control Panel -> Region and Language). In Norway, comma is the decimal separator. Excel seems to avoid this character and prefer a semicolon instead. I have access to another computer set to UK English locale, and on that computer, the first example with a comma separator works fine (only on doubleclick), and the one with semicolon actually fails! So much for interoperability. If you want to publish this CSV online and users may have Excel, I guess you have to publish both versions and suggest that people check which file gives the correct number of rows.

So all the details that I've been able to gather to get this to work are:

  1. The file must be saved as UTF-8 with a BOM, which is what Notepad does when you chose UTF-8. I tried UTF-8 without BOM (can be switched easily in Notepad++), but then double-clicking the document fails.
  2. You must use a comma or a semicolon separator, but not the one that is the decimal separator in your Regional Settings. Perhaps other characters work, but I don't know which.
  3. You must quote fields that contain a newline with the " character.
  4. I've used Windows line-endings (\r\n) both in the text field and as a record separator, that works.
  5. You must double-click the file to open it, importing data from text doesn't work.

Hope this helps someone.


I have finally found the problem!

It turns out that we were writing the file using Unicode encoding, rather than ASCII or UTF-8. Changing the encoding on the FileStream seems to solve the problem.

Thanks everyone for all your suggestions!


Use Google Sheets and import the CSV file.

Then you can export that to use in Excel


Short Answer

Remove the newline/linefeed characters (\n with Notepad++). Excel will still recognise the carriage return character (\r) to separate records.

Long Answer

As mentioned newline characters are supported inside CSV fields but Excel doesn't always handle them gracefully. I faced a similar issue with a third party CSV that possibly had encoding issues but didn't improve with encoding changes.

What worked for me was removing all newline characters (\n). This has the effect of collapsing fields to a single record assuming that your records are separated by the combination of a carriage return and a newline (CR/LF). Excel will then properly import the file and recognise new records by the carriage return.

Obviously a cleaner solution is to first replace the real newlines (\r\n) with a temporary character combination, replacing the newlines (\n) with your seperating character of choice (e.g. comma in a semicolon file) and then replacing the temporary characters with proper newlines again.


If the field contains a leading space, Excel ignores the double quote as a text qualifier. The solution is to eliminate leading spaces between the comma (field separator) and double-quote. For example:

Broken:
Name,Title,Description
"John", "Mr.", "My detailed description"

Working:
Name,Title,Description
"John","Mr.","My detailed description"