
How does "cut and paste" affect character encoding and what can go wrong?

I have a document A in encoding A displayed in tool A and a document B in encoding B displayed in tool B. If I cut and paste (part of) B into A, what might the resultant character encoding be? I realise this depends on tool A and tool B, on the information held in the paste buffer (which presumably can contain an encoding?), and on the operating system.

What should high-quality tools do? And in practice, how many of the common tools (e.g. Word, TextPad, various IDEs) do a good job?

peter.murray.rust asked Dec 18 '09 18:12



1 Answer

First of all, a text editor's internal representation of text has no bearing on how the text is encoded (serialized) when you save the file. So an open document is not "in" an encoding; it's a sequence of abstract characters. Only when the document is saved to a file (or transmitted over the network) does it get encoded.
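To make the distinction between characters and bytes concrete, here is a minimal Python sketch. The sample string and the three encodings are arbitrary choices for illustration, not anything the tools in the question necessarily use.

```python
# The in-memory document: a sequence of abstract characters (code points).
document = "café €5"

# Encoding happens only when the text is serialized, e.g. on save.
as_utf8 = document.encode("utf-8")
as_utf16 = document.encode("utf-16-le")
as_cp1252 = document.encode("cp1252")

# The same characters produce three different byte sequences.
for name, data in [("utf-8", as_utf8), ("utf-16-le", as_utf16), ("cp1252", as_cp1252)]:
    print(name, data.hex(" "))

# Each byte sequence decodes back to the same characters
# when it is read with the encoding it was written in.
round_trips = [
    as_utf8.decode("utf-8"),
    as_utf16.decode("utf-16-le"),
    as_cp1252.decode("cp1252"),
]

# Reading bytes with the wrong encoding is where things go wrong:
# UTF-8 bytes interpreted as cp1252 give different characters.
mojibake = as_utf8.decode("cp1252")
print(repr(mojibake))
```

The point of the sketch is that "encoding A" and "encoding B" are properties of the files on disk, not of the text held in memory while it is being edited.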

It's up to each application to decide what it puts on the clipboard. Typically, a Windows app that knows what it's doing will put a number of different representations on the clipboard. When you paste in the other app, that app will look for the representation that best suits its needs.
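The negotiation described above can be simulated roughly as follows. This is a toy model in plain Python, not the Win32 clipboard API: `copy_to_clipboard` and `pick_format` are hypothetical helpers, the clipboard is just a dictionary, and cp1252 stands in for whatever ANSI code page the system actually uses. `CF_UNICODETEXT` and `CF_TEXT` are the names of standard Windows clipboard formats.

```python
from typing import Dict, List, Optional


def copy_to_clipboard(text: str) -> Dict[str, bytes]:
    """Simulate a source app offering one selection in several formats."""
    return {
        # Unicode text, carried as UTF-16 on Windows.
        "CF_UNICODETEXT": text.encode("utf-16-le"),
        # Legacy "ANSI" text; cp1252 is an assumption for this sketch.
        # Characters the code page cannot represent become "?".
        "CF_TEXT": text.encode("cp1252", errors="replace"),
    }


def pick_format(clipboard: Dict[str, bytes], preferred: List[str]) -> Optional[str]:
    """Return the first format the receiving app understands, if any."""
    for name in preferred:
        if name in clipboard:
            return name
    return None


clipboard = copy_to_clipboard("naïve Ω")

# A Unicode-aware receiver asks for Unicode text first.
modern_choice = pick_format(clipboard, ["CF_UNICODETEXT", "CF_TEXT"])

# A legacy receiver only knows the ANSI format and loses the omega.
legacy_choice = pick_format(clipboard, ["CF_TEXT"])
legacy_text = clipboard[legacy_choice].decode("cp1252")
print(modern_choice, legacy_choice, legacy_text)
```

In this model the loss happens when an app on either side falls back to a narrower representation, not because the clipboard itself converts anything.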

In your case, a text editor (that knows what it's doing) will put a Unicode representation of the selected string onto the clipboard (on Windows, Unicode text is typically moved around as UTF-16, but that's not important). When you paste in the other app, it will insert that sequence of Unicode characters into the document at the selection point.
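A minimal sketch of that paste path, assuming the payload arrives as UTF-16 (little-endian) bytes and the receiving document is held as a Python string. The `paste` function and the sample text are hypothetical; the sketch also shows one thing that can still go wrong afterwards, namely saving in an encoding that cannot represent a pasted character.

```python
def paste(document: str, position: int, payload: bytes) -> str:
    """Decode a UTF-16 clipboard payload and insert it at the selection point."""
    pasted = payload.decode("utf-16-le")
    return document[:position] + pasted + document[position:]


# Document A as held in memory by the receiving editor.
doc_a = "Price: "

# What the source editor placed on the clipboard (assumed UTF-16 here).
payload = "€10 → ok".encode("utf-16-le")

doc_a = paste(doc_a, len(doc_a), payload)

# Saving as UTF-8 works: every character is representable.
saved = doc_a.encode("utf-8")

# Saving as Latin-1 fails: it has no euro sign and no arrow.
try:
    doc_a.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    print("cannot save as latin-1:", exc.reason)
```

So the pasted text has no encoding of its own once it is in the document; it simply takes on whatever encoding document A is saved in, provided that encoding can represent it.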

There's an app floating around called "ClipSpy" that will help you see what I'm talking about, interactively.

Jonathan Feinberg answered Nov 03 '22 05:11