Why are text editors slow when editing very long lines?

Most text editors are slow when lines are very long. The suggested data structure for storing a text editor's buffer seems to be the rope, which should be immune to modifications of long lines. Yet editors are slow even when simply navigating within long lines.

Example: a single character such as 0 repeated 100,000 times on one line in PSPad, or 1,000,000 times in Vim, slows down cursor movement when you are at the end of the line. If the file contains the same number of bytes spread over multiple lines, the cursor is not slowed down at all, so I suppose it's not a memory issue.
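To reproduce the comparison, the two test files can be generated with a small script. This is a minimal sketch; the file names `long_line.txt` and `wrapped.txt` and the function `write_test_files` are hypothetical, and the wrapped file is slightly larger because of its extra newlines.

```python
from pathlib import Path


def write_test_files(directory: Path, n: int = 100_000, width: int = 80) -> tuple[Path, Path]:
    """Write two files containing the same number of '0' characters:
    one as a single long line, one wrapped into lines of `width` chars."""
    directory.mkdir(parents=True, exist_ok=True)
    long_line = directory / "long_line.txt"
    wrapped = directory / "wrapped.txt"

    # All n characters on one line.
    long_line.write_text("0" * n + "\n", encoding="ascii")

    # The same n characters, split into chunks of at most `width`.
    lines = ["0" * min(width, n - i) for i in range(0, n, width)]
    wrapped.write_text("\n".join(lines) + "\n", encoding="ascii")
    return long_line, wrapped
```

Opening `long_line.txt` and `wrapped.txt` in the same editor and pressing End / arrow keys should make the difference visible.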

What is the origin of this issue, and why is it so common?

I'm mostly using Windows, so maybe this is related to Windows font handling?

asked Sep 12 '11 at 13:09 by Emmanuel Caradec




2 Answers

You're probably using a variable-length encoding like UTF-8. The editor wants to keep track of which column you're in on every cursor movement, and with a variable-length encoding there is no shortcut around scanning every byte to count how many characters there are; with a long line that's a lot of scanning.
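A minimal sketch of that scan, assuming the editor stores the line as UTF-8 bytes and that "column" means code points (not grapheme clusters or screen cells). The function name `utf8_column` is hypothetical. Every byte before the cursor has to be inspected, so the cost grows with the cursor's position in the line.

```python
def utf8_column(line: bytes, byte_offset: int) -> int:
    """Return the 0-based code point column of `byte_offset` in a UTF-8 line.

    Assumes `byte_offset` falls on a code point boundary. There is no
    shortcut: the loop is O(byte_offset)."""
    column = 0
    for b in line[:byte_offset]:
        # Continuation bytes have the bit pattern 0b10xxxxxx; every other
        # byte starts a new code point.
        if (b & 0xC0) != 0x80:
            column += 1
    return column
```

For example, in `"héllo".encode("utf-8")` the byte offset 3 (just after the two-byte `é`) maps to column 2.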

I suspect that you will not see such a slowdown with long lines using a single-byte encoding like iso8859-1 (latin1). If you use a single-byte encoding then character length = byte length and the column can be calculated quickly with simple pointer arithmetic. A fixed-length multibyte encoding like ucs-2 should be able to use the same shortcut (just dividing by the constant character size) but the editors might not be smart enough to take advantage of that.
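For comparison, with a fixed-width encoding the column is plain arithmetic on the byte offset and does not depend on line length. This is a sketch assuming 1 byte per character for ISO-8859-1 and 2 bytes per character for UCS-2 (which only covers the Basic Multilingual Plane); `fixed_width_column` is a hypothetical name.

```python
def fixed_width_column(byte_offset: int, char_size: int) -> int:
    """Column for a fixed-width encoding: O(1) regardless of line length.

    char_size is 1 for latin1, 2 for UCS-2."""
    if byte_offset % char_size != 0:
        raise ValueError("offset is not on a character boundary")
    return byte_offset // char_size
```

The trade-off is that a single-byte encoding cannot represent most of Unicode, which is why editors often default to UTF-8 and pay the scanning cost instead.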

answered Oct 16 '22 at 13:10 by evil otto


As evil otto suggested, the line's encoding can force the line to be re-parsed, and for long lines this causes all sorts of performance issues.

But encoding is not the only thing that causes the line to be re-parsed.

Tab characters also require a full line scan, since you need to parse the whole line in order to calculate the true cursor location.
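A sketch of why tabs force a scan, assuming tab stops every `tab_width` columns and one screen cell per non-tab character (wide CJK characters and combining marks are ignored). The width of each tab depends on the column it starts at, so everything before the cursor must be walked. `visual_column` is a hypothetical name; it matches Python's built-in `str.expandtabs` for lines without newlines.

```python
def visual_column(line: str, index: int, tab_width: int = 8) -> int:
    """Screen column of the character at `index`, expanding tabs to the
    next multiple of `tab_width`. Cost is O(index)."""
    col = 0
    for ch in line[:index]:
        if ch == "\t":
            # Jump to the next tab stop; the jump size depends on `col`.
            col += tab_width - (col % tab_width)
        else:
            col += 1
    return col
```

For instance, with `tab_width=4`, the `d` in `"abc\td"` sits at screen column 4, while in `"\t\td"` it sits at column 8.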

Certain syntax highlighting definitions (e.g. block comments, quoted strings, etc.) also require a full line parse.
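A minimal sketch of why this is so, for a hypothetical C-like syntax with `/* */` block comments and `"..."` strings with backslash escapes. Whether a position is inside a comment or a string can only be decided by scanning from the start of the line, carrying over whether the previous line ended inside a comment. The function `state_at` and its parameters are illustrative assumptions, not any editor's actual highlighter.

```python
def state_at(line: str, index: int, in_comment: bool = False) -> str:
    """Return 'code', 'comment' or 'string' for position `index`
    (0 <= index <= len(line)). `in_comment` is the state carried over
    from the previous line. Cost is O(index)."""
    state = "comment" if in_comment else "code"
    i = 0
    while i < index:
        ch, pair = line[i], line[i:i + 2]
        step = 1
        if state == "code":
            if pair == "/*":
                state, step = "comment", 2
            elif ch == '"':
                state = "string"
        elif state == "comment":
            if pair == "*/":
                state, step = "code", 2
        else:  # inside a string literal
            if ch == "\\":
                step = 2  # skip the escaped character
            elif ch == '"':
                state = "code"
        i += step
    return state
```

Here `/*` inside a string does not open a comment, which is exactly the kind of context that cannot be known without scanning everything before the position.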

answered Oct 16 '22 at 14:10 by jussij