I want to display whitespace characters while debugging or editing text by replacing them with sensible Unicode code points and colouring them grey instead of black.
For example, I would like to replace SPACE (U+0020) with MIDDLE DOT · (U+00B7) and NO-BREAK SPACE (U+00A0) with MEDIUM SMALL WHITE CIRCLE ⚬ (U+26AC), and use RIGHTWARDS ARROW → (U+2192) for TAB (U+0009).
I'm looking for sensible glyphs for:
- CARRIAGE RETURN (U+000D)
- newline/LINE FEED (U+000A)

I don't want to use the PILCROW SIGN ¶ (U+00B6), as it doesn't intuitively correspond to either but rather to the concept of a new paragraph. There is also DOWNWARDS ARROW WITH CORNER LEFTWARDS ↵ (U+21B5), but again, it seems more like a combination symbol than one representing either character individually.
When I have mixed line endings I want to be able to see which character is being used (or both). I am displaying the output in HTML in a browser.
Currently I can't think of any better symbols than:
- LEFTWARDS ARROW ← (U+2190) for carriage return
- DOWNWARDS ARROW ↓ (U+2193) for newline

I am aware of SYMBOL FOR CARRIAGE RETURN ␍ (U+240D), SYMBOL FOR LINE FEED ␊ (U+240A) and SYMBOL FOR NEWLINE ␤ (U+2424), but the detail is hard to see on them.
I also don't want to use \r and \n, for two reasons: r and n look a little similar (not much, but a little), and it takes two characters to display them instead of one. However, if I don't get any better suggestions I might alternatively use DOWNWARDS ARROW WITH CORNER LEFTWARDS ↵ (U+21B5) for carriage return and RIGHTWARDS ARROW WITH CORNER DOWNWARDS ↴ (U+21B4) for newline.
As you've said, U+21B5 (↵) is a good choice for carriage return. Note that it is the symbol on your Enter key, and it has been used this way since the days of electric typewriters. This is also where the name comes from: the key would literally return the carriage, which holds the paper and moves it past the ink ribbon, to the start of the line. As such, I think it has become ingrained enough in keyboard users to be intuitively recognizable.
Since you've noted concerns regarding visibility, however, consider U+23CE (⏎). This symbol is part of the Unicode standard for the express purpose of representing a return, but it might be interpreted as meaning a new line in general, which is often a combination of a carriage return and a line feed (depending on the system).
U+21B5 (↵) is part of the Unicode Arrows block, while U+23CE (⏎) is part of the Miscellaneous Technical block. The latter seems closer to what is useful for a technical purpose like yours than a regular arrow.
That leaves us with the line feed. When you start to think about what it actually is, even the choice of the return arrow becomes questionable. A line feed is basically an instruction to move down one line. A carriage return simply moves the caret ("carriage") back to the start of the line. A line feed doesn't have to be combined with a carriage return, nor does a carriage return have to be combined with a line feed (although it rarely makes sense not to combine them). On typewriters this starts making sense: after typing a line you would swing the carriage back to the start, then scroll the paper upwards, which is basically a carriage return plus a line feed. Now you see why "new line" might make sense as a combination of these two for historical reasons, and why they can be used in either order. Technically you can do a line feed without a carriage return and continue typing in the column where you left off on the previous line. The reason this brings our ↵/⏎ into question is that the symbol seems to imply a carriage return AND a line feed. Indeed, on electric typewriters and word processors it normally results in a full new line.
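To make the distinction concrete, here is a minimal sketch of a print head that treats the two characters separately. The function name `teletype` is made up for illustration, and overprinting is simplified to overwriting (a real typewriter would strike over the existing character).

```python
def teletype(text: str) -> list[str]:
    """Simulate a print head: CR returns to column 0, LF moves down one row."""
    rows = [[]]              # each row is a list of characters
    row, col = 0, 0
    for ch in text:
        if ch == "\r":
            col = 0          # carriage return: back to the start, same line
        elif ch == "\n":
            row += 1         # line feed: down one line, same column
            if row == len(rows):
                rows.append([])
        else:
            line = rows[row]
            # pad with spaces if the head sits beyond the end of the row
            line.extend(" " * (col - len(line)))
            if col < len(line):
                line[col] = ch   # simplified overprint: overwrite
            else:
                line.append(ch)
            col += 1
    return ["".join(r) for r in rows]


print(teletype("ab\ncd"))    # LF only: typing continues in the same column
print(teletype("ab\r\ncd"))  # CR then LF: a full new line
print(teletype("ab\n\rcd"))  # LF then CR: same result, other order
```

With a bare line feed the second row starts under the column where the first row ended, which is exactly why a single "return" glyph reads as both operations at once.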
So, how to represent line feed? An arrow pointing down seems like the intuitive choice, but then we might need to rethink our carriage return as well. U+21E9 (DOWNWARDS WHITE ARROW, ⇩) is probably the most visually congruent with ⏎, although glyphs vary between fonts. But if we go with that, you might as well use U+21E6 (LEFTWARDS WHITE ARROW, ⇦) for your carriage return.
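Whichever glyphs you settle on, the substitution itself is simple. The following is a rough sketch of how it could be done when generating the HTML, using the space, no-break space and tab glyphs from the question and the white arrows for CR and LF. The function name `show_whitespace` and the CSS class `ws` are assumptions for illustration; you would pair the class with a rule such as `.ws { color: grey; }` and put the output inside a `<pre>` element.

```python
import html

# Illustrative glyph choices; swap in whichever code points you prefer.
GLYPHS = {
    " ": "\u00b7",       # MIDDLE DOT
    "\u00a0": "\u26ac",  # MEDIUM SMALL WHITE CIRCLE
    "\t": "\u2192",      # RIGHTWARDS ARROW
    "\r": "\u21e6",      # LEFTWARDS WHITE ARROW
    "\n": "\u21e9",      # DOWNWARDS WHITE ARROW
}


def show_whitespace(text: str) -> str:
    """Return HTML in which each whitespace character is a grey-able glyph."""
    parts = []
    for ch in text:
        glyph = GLYPHS.get(ch)
        if glyph is None:
            parts.append(html.escape(ch))   # ordinary text, escaped for HTML
        else:
            parts.append(f'<span class="ws">{glyph}</span>')
            if ch == "\n":
                parts.append("\n")          # keep the real break for <pre>
    return "".join(parts)


print(show_whitespace("a b\r\nc\td\n"))
```

Because CR and LF are replaced independently, mixed line endings show up as ⇦⇩, a lone ⇩, or a lone ⇦. Note that this sketch replaces the tab outright, so the original tab width is lost; appending the real tab after the glyph is one way around that.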
What to choose with so many options? Personally, I think the technically superior choice is the characters from the Unicode Control Pictures block: U+240A (␊) for line feed and U+240D (␍) for carriage return. They also appeal to the programmer in me, because the low byte of each code point matches the ASCII code of the character it depicts (0x0A and 0x0D). I understand that they can be hard to make out on screen and that usability may be more important, but many text editors use some variation of this when asked to show all symbols.
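That correspondence means no lookup table is needed for the control pictures: for the ASCII control characters 0x00 to 0x1F, the picture sits at U+2400 plus the character's code. A small sketch (the function name `control_picture` is made up for illustration):

```python
def control_picture(ch: str) -> str:
    """Map an ASCII control character (0x00-0x1F) to its control picture."""
    code = ord(ch)
    if code <= 0x1F:
        return chr(0x2400 + code)   # e.g. 0x0A -> U+240A, 0x0D -> U+240D
    return ch                       # everything else is left unchanged


print("".join(control_picture(c) for c in "a\tb\r\n"))
```

This only covers the C0 controls; space and no-break space would still need their own glyphs.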
So I'd say the options are...
Also make sure you pick something that is likely to render properly in the majority of browsers, given the varying default fonts across browsers and systems. I noticed that some of the code points in supplemental blocks didn't show up when I went through the UTF-8 table.
Finally, one remark: is it necessary to use Unicode symbols? Notepad++, my favourite text editor, uses big "CR" and "LF" symbols on a gray background when all symbols are visualized. Perhaps you can simply use images (preferably scaled according to the font size in your CSS)?