Text editor theory [closed]

Tags:

As I'm always dissatisfied with existing editors, a project I always wanted to start is my own text editor. However doing text editing is serious business.

Besides analyzing the source code of existing text editors, is there any book or other resource (like academic work) about this topic? I'm interested especially in something that teaches how to handle memory and how to manage text insertion (if you have a 100 MB file and want to add a char at x position, you can't just memmove the huge text block...).

443

asked Jul 02 '10 22:07

lornova

11 Answers

Take a look at Rob Pike's description of his Sam text editor. Be sure to browse past the high-level overview and command language. He describes parsing, memory management, and data structures later in the paper.

In addition, take a look at Russ Cox's simple regular expression implementation. It's easy to follow and may open some doors outside existing regular expression libraries.

134

answered Sep 29 '22 11:09

Corbin March

Over the years I've written quite a number of different text editors. Certainly the simplest way is to manage a long sequence of characters, where you copy everything around to insert any character. Other techniques that I've used include:

Represent the text file as a doubly linked list of text lines.
Construct a tree-like data structure (sometimes called a "rope") which starts off as a solid string of characters, but has the ability to split, insert, and delete blocks of text without having to move all the rest of the text around.

Many of the old Borland example books used a text editor as a tutorial example. You can occasionally still find copies of these at used bookstores for nearly free.

answered Sep 26 '22 11:09

Greg Hewgill

There's an excellent tutorial available here that covers a lot of relevant topics in a more modern context:

Design and Implementation of a Win32 Text Editor

The other answers to this question cover gap buffer.

Another modern coverage is the descriptions of AvalonEdit

http://www.codeproject.com/KB/edit/AvalonEdit.aspx

and extra detail from:

http://wiki.sharpdevelop.net/AvalonEdit.ashx
http://danielgrunwald.de/coding/AvalonEdit/document.php
http://danielgrunwald.de/coding/AvalonEdit/rendering.php

and there's a huge amount of detail/content (about SharpDevelop) in the book:

http://www.icsharpcode.net/opensource/sd/insidesharpdevelop.aspx

answered Sep 30 '22 11:09

StephenD

Promoted to answer by request:

The antique "Software Tools in Pascal" by Kernighan and Plaugher implements the ed editor in a language with neither real strings nor pointers. It contains a great overview of the design considerations that underlie any text editor.

answered Sep 26 '22 11:09

msw

One old method that still works is called a gap buffer. The basic idea is that you put the text into a buffer, but instead of putting it all in one block, you create a "gap" at the cursor, putting all the text before the cursor at the beginning of the buffer, and all the text after the cursor at the end of the buffer. Most insertions take place at the cursor, which you can do without moving anything (until or unless you overflow the buffer). When the user moves the cursor, you move the appropriate text from one side of the split to the other.

Given typical controls (cursor left, right, up, down, page up, page down), the largest move you typically deal with is a page at a time, which is typically easy to handle quite a bit faster than a keyboard repeats. It can, of course, slow down a bit if you have a truly huge file and a "goto line" command, or something on that order. If you're going to do a lot of that, there are undoubtedly better structures to use...

answered Sep 28 '22 11:09

Jerry Coffin

The Craft of Text Editing by Craig Finseth, based on his master's thesis, covers these topics. It's free on the web. OTOH it's pretty old and doesn't mention some ideas like ropes that were less practical on the tiny computers of yesteryear.

answered Sep 26 '22 11:09

Darius Bacon

This paper compares many data structures that can be used for text editors, including some already mentioned here (e.g., gap buffers) as well as several others (e.g., piece tables). The article is old, but I believe it still covers all the major choices, and it nicely compares them in terms of performance, complexity, and space overhead.

I know Stack Overflow answers aren't just supposed to be links, but this is still the best source of information I've ever found for the information requested, and it's far too long to summarize in an answer here. In case the link goes stale, search for "Data Structures for Text Sequences" by Charles Crowley of the University of New Mexico.

answered Sep 30 '22 11:09

Adrian McCarthy

The Scintilla component uses a split buffer, along the theory explained in a text linked in their Scintilla and SciTE Related Sites page.
The linked page is Data Structures in a Bit-Mapped Text Editor.
The split buffer proved it works well even with megabyte files. Using secondary structures (eg. a list of line starts) can help too.

answered Sep 27 '22 11:09

PhiLho

Here's how "the pros" at Microsoft do it:

MS Word uses the piece table data structure. An ascending array of character positions is maintained in parallel with an array of larger structures that contain pointers into either the original file (text loaded on file load) or into an append-only buffer of newly added characters.

The EDIT control used everywhere in Windows uses no data structure at all. Notepad simply uses a multi-line EDIT control. Check it out with a heap viewer. The EDIT control maintains only a buffer of line-breaks and tab-stops.

If you're going to create a simple non-formatted text editor in Windows, you could easily subclass the EDIT control to add features.

answered Sep 29 '22 11:09

James

how to manage text insertion (if you have a 100 MB file and want to add a char at x position, you can't just memove the huge text block...).

Make the text file a linked list, where every line is an entry.

answered Sep 27 '22 11:09

Bart van Heukelom

Well if you know that in general people will have relatively few insert points, you could hold an array of pointers into your original text buffer and when the user tries to insert inside it, you "split" the buffer by spawning another pointer to the rest of the buffer, making the first pointer's length so it stops at the insertion point and adding a 3rd pointer for the text to be inserted inbetween, kinda like:

Click to copy

long original text la la la
^                *^
|                 2nd part
1st part

And the star points into a new text buffer where you start adding the text to be inserted.

When you render (or in general parse) your text file, you loop over the array of pointers and then do your work on each buffer. Of course if the buffer is small enough, you skip the adding a new buffer part, but that's just heuristics, try each and get a feel for which works best.

You could also consider splitting the text file on load into multiple buffers say every 1MB or so, because if you load the file in a single buffer you WILL need to make a new buffer for inserted text due to the size. Again, this is a heuristic optimization.

answered Sep 27 '22 11:09

Blindy

Related questions
                            
                                AVL tree vs. B-tree
                            
                                What is an "internal node" in a binary search tree?
                            
                                Red Black Tree versus B Tree
                            
                                Implementing -hash / -isEqual: / -isEqualTo...: for Objective-C collections
                            
                                find the max difference between j and i indices such that j > i and a[j] > a[i] in O(n)
                            
                                Which data structures and algorithms book should I buy? [closed]
                            
                                What exactly is augmenting path?
                            
                                Strategy to find your best route via Public Transportation only?
                            
                                Finding anagrams for a given word
                            
                                Suffix Array Algorithm
                            
                                What additional rotation is required for deletion from a Top-Down 2-3-4 Left-leaning Red Black tree?
                            
                                C struct inheritance pointer alignment
                            
                                Representing A Tree in Clojure
                            
                                Good STL-like library for C [closed]
                            
                                Queue vs Dequeue in java [closed]
                            
                                Circular buffer in JavaScript
                            
                                Is there pointer in C# like C++? Is it safe?
                            
                                How to initialize a vector with fixed length in R
                            
                                Hash table vs Balanced binary tree [closed]
                            
                                Why hashmap lookup is O(1) i.e. constant time?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Text editor theory [closed]

Tags:

data-structures

editor

text-editor

theory

lornova

People also ask

11 Answers

Corbin March

Greg Hewgill

StephenD

msw

Jerry Coffin

Darius Bacon

Adrian McCarthy

PhiLho

James

Bart van Heukelom

Blindy

Recent Activity

Donate For Us