
Text file with 0D 0D 0A line breaks

A customer is sending me a .csv file where the line breaks are made up of the sequence 0xD 0xD 0xA. As far as I know, line breaks are either 0xA (Mac or Unix) or 0xD 0xA (Windows).

Is 0xD 0xD 0xA a known line-ending convention? Is there any known sequence of saves that corrupts a file's line endings in this way (I think the customer uses a Mac)?

The file doesn't start with any encoding markers, it starts with the text contents directly. The text is displayed correctly if opened with code page 1252.

Anders Abel asked Aug 09 '11 15:08


People also ask

What is 0d 0a?

The byte 0x0D is the carriage return (CR) and 0x0A is the line feed (LF). Most Windows programs expect the pair 0D 0A, in that order, in text files. The pair marks the end of one line and the beginning of the next.

How do I find line breaks in a text file?

Open the text file in Notepad++ and click the pilcrow (¶) button. Notepad++ then shows all characters, including the line-ending characters, which are displayed as CR and LF. In a file with Windows line endings, each line ends with CR LF (\r\n). In a file with Unix or modern macOS line endings, each line ends with LF only (\n). Classic Mac OS files end each line with CR only (\r).

What are CR and LF characters?

CR = Carriage Return ( \r , 0x0D in hexadecimal, 13 in decimal) — moves the cursor to the beginning of the line without advancing to the next line. LF = Line Feed ( \n , 0x0A in hexadecimal, 10 in decimal) — moves the cursor down to the next line without returning to the beginning of the line.
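To check which line endings a file actually contains without relying on an editor, the raw bytes can be scanned directly. The following is a minimal Python sketch, not a definitive tool; the function name `count_line_endings` and the labels it returns are made up for illustration.

```python
import re
from collections import Counter

# Longest alternatives come first so that CR CR LF is counted as one
# sequence instead of being split into a lone CR followed by CR LF.
_EOL = re.compile(rb"\r\r\n|\r\n|\r|\n")

# Hypothetical labels for each byte sequence.
_NAMES = {b"\r\r\n": "CRCRLF", b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def count_line_endings(data: bytes) -> Counter:
    """Count each kind of line-ending byte sequence found in data."""
    return Counter(_NAMES[m.group()] for m in _EOL.finditer(data))
```

Reading the file in binary mode (`open(path, "rb").read()`) and passing the bytes to this function gives a quick summary; a file like the one in the question would report only `CRCRLF` entries.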


1 Answer

The CR CR LF sequence is a known result of a word-wrap bug in Windows XP Notepad.

For future reference, here's the relevant extract from the linked blog:

When you press the Enter key on Windows computers, two characters are actually stored: a carriage return (CR) and a line feed (LF). The operating system always interprets the character sequence CR LF the same way as the Enter key: it moves to the next line. However when there are extra CR or LF characters on their own, this can sometimes cause problems.

There is a bug in the Windows XP version of Notepad that can cause extra CR characters to be stored in the display window. The bug happens in the following situation:

If you have the word wrap option turned on and the display window contains long lines that wrap around, then saving the file causes Notepad to insert the characters CR CR LF at each wrap point in the display window, but not in the saved file.

The CR CR LF characters can cause oddities if you copy and paste them into other programs. They also prevent Notepad from properly re-wrapping the lines if you resize the Notepad window.

You can remove the CR CR LF characters by turning off the word wrap feature, then turning it back on if desired. However, the cursor is repositioned at the beginning of the display window when you do this.
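If the sequence has already ended up in a file on disk, as in the question, it can also be cleaned up programmatically before parsing. Here is a minimal Python sketch, assuming the file is read as raw bytes and that every CR CR LF stands for exactly one line break. The function names are hypothetical, and `cp1252` is taken from the question.

```python
def normalize_crcrlf(data: bytes, newline: bytes = b"\r\n") -> bytes:
    """Replace every CR CR LF sequence with a single line ending.

    Ordinary CR LF and lone LF sequences are left untouched.
    """
    return data.replace(b"\r\r\n", newline)


def decode_customer_csv(raw: bytes, encoding: str = "cp1252") -> str:
    """Normalize CR CR LF to LF, then decode the bytes to text.

    cp1252 is a single-byte encoding, so replacing the bytes 0D 0D 0A
    before decoding cannot split a multi-byte character.
    """
    return normalize_crcrlf(raw, b"\n").decode(encoding)
```

The raw bytes would come from `open(path, "rb").read()`. Working on bytes rather than text avoids any newline translation by the runtime while the file is being read.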

BalusC answered Sep 22 '22 11:09