Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Text editor theory [closed]

As I'm always dissatisfied with existing editors, a project I always wanted to start is my own text editor. However doing text editing is serious business.

Besides analyzing the source code of existing text editors, is there any book or other resource (like academic work) about this topic? I'm interested especially in something that teaches how to handle memory and how to manage text insertion (if you have a 100 MB file and want to add a char at x position, you can't just memmove the huge text block...).

like image 443
lornova Avatar asked Jul 02 '10 22:07

lornova


People also ask

What is text editor give example?

A text editor is a type of computer program that edits plain text. In Windows, such programs are sometimes known as "notepad" software, following the naming of Microsoft Notepad.

What are the two types of text editors?

Notepad, Wordpad are some of the common editors used on Windows OS and vi, emacs, Jed, pico are the editors on UNIX OS. Features normally associated with text editors are — moving the cursor, deleting, replacing, pasting, finding, finding and replacing, saving etc.

Is text editor a utility software?

Typical utilities include such programs as shells, text editors, compilers, and (sometimes) the file system.


11 Answers

Take a look at Rob Pike's description of his Sam text editor. Be sure to browse past the high-level overview and command language. He describes parsing, memory management, and data structures later in the paper.

In addition, take a look at Russ Cox's simple regular expression implementation. It's easy to follow and may open some doors outside existing regular expression libraries.

like image 134
Corbin March Avatar answered Sep 29 '22 11:09

Corbin March


Over the years I've written quite a number of different text editors. Certainly the simplest way is to manage a long sequence of characters, where you copy everything around to insert any character. Other techniques that I've used include:

  • Represent the text file as a doubly linked list of text lines.
  • Construct a tree-like data structure (sometimes called a "rope") which starts off as a solid string of characters, but has the ability to split, insert, and delete blocks of text without having to move all the rest of the text around.

Many of the old Borland example books used a text editor as a tutorial example. You can occasionally still find copies of these at used bookstores for nearly free.

like image 34
Greg Hewgill Avatar answered Sep 26 '22 11:09

Greg Hewgill


There's an excellent tutorial available here that covers a lot of relevant topics in a more modern context:

  • Design and Implementation of a Win32 Text Editor

The other answers to this question cover gap buffer.

Another modern coverage is the descriptions of AvalonEdit

  • http://www.codeproject.com/KB/edit/AvalonEdit.aspx

and extra detail from:

  • http://wiki.sharpdevelop.net/AvalonEdit.ashx
  • http://danielgrunwald.de/coding/AvalonEdit/document.php
  • http://danielgrunwald.de/coding/AvalonEdit/rendering.php

and there's a huge amount of detail/content (about SharpDevelop) in the book:

  • http://www.icsharpcode.net/opensource/sd/insidesharpdevelop.aspx
like image 24
StephenD Avatar answered Sep 30 '22 11:09

StephenD


Promoted to answer by request:

The antique "Software Tools in Pascal" by Kernighan and Plaugher implements the ed editor in a language with neither real strings nor pointers. It contains a great overview of the design considerations that underlie any text editor.

like image 39
msw Avatar answered Sep 26 '22 11:09

msw


One old method that still works is called a gap buffer. The basic idea is that you put the text into a buffer, but instead of putting it all in one block, you create a "gap" at the cursor, putting all the text before the cursor at the beginning of the buffer, and all the text after the cursor at the end of the buffer. Most insertions take place at the cursor, which you can do without moving anything (until or unless you overflow the buffer). When the user moves the cursor, you move the appropriate text from one side of the split to the other.

Given typical controls (cursor left, right, up, down, page up, page down), the largest move you typically deal with is a page at a time, which is typically easy to handle quite a bit faster than a keyboard repeats. It can, of course, slow down a bit if you have a truly huge file and a "goto line" command, or something on that order. If you're going to do a lot of that, there are undoubtedly better structures to use...

like image 30
Jerry Coffin Avatar answered Sep 28 '22 11:09

Jerry Coffin


The Craft of Text Editing by Craig Finseth, based on his master's thesis, covers these topics. It's free on the web. OTOH it's pretty old and doesn't mention some ideas like ropes that were less practical on the tiny computers of yesteryear.

like image 27
Darius Bacon Avatar answered Sep 26 '22 11:09

Darius Bacon


This paper compares many data structures that can be used for text editors, including some already mentioned here (e.g., gap buffers) as well as several others (e.g., piece tables). The article is old, but I believe it still covers all the major choices, and it nicely compares them in terms of performance, complexity, and space overhead.

I know Stack Overflow answers aren't just supposed to be links, but this is still the best source of information I've ever found for the information requested, and it's far too long to summarize in an answer here. In case the link goes stale, search for "Data Structures for Text Sequences" by Charles Crowley of the University of New Mexico.

like image 40
Adrian McCarthy Avatar answered Sep 30 '22 11:09

Adrian McCarthy


The Scintilla component uses a split buffer, along the theory explained in a text linked in their Scintilla and SciTE Related Sites page.
The linked page is Data Structures in a Bit-Mapped Text Editor.
The split buffer proved it works well even with megabyte files. Using secondary structures (eg. a list of line starts) can help too.

like image 43
PhiLho Avatar answered Sep 27 '22 11:09

PhiLho


Here's how "the pros" at Microsoft do it:

MS Word uses the piece table data structure. An ascending array of character positions is maintained in parallel with an array of larger structures that contain pointers into either the original file (text loaded on file load) or into an append-only buffer of newly added characters.

The EDIT control used everywhere in Windows uses no data structure at all. Notepad simply uses a multi-line EDIT control. Check it out with a heap viewer. The EDIT control maintains only a buffer of line-breaks and tab-stops.

If you're going to create a simple non-formatted text editor in Windows, you could easily subclass the EDIT control to add features.

like image 32
James Avatar answered Sep 29 '22 11:09

James


how to manage text insertion (if you have a 100 MB file and want to add a char at x position, you can't just memove the huge text block...).

Make the text file a linked list, where every line is an entry.

like image 25
Bart van Heukelom Avatar answered Sep 27 '22 11:09

Bart van Heukelom


Well if you know that in general people will have relatively few insert points, you could hold an array of pointers into your original text buffer and when the user tries to insert inside it, you "split" the buffer by spawning another pointer to the rest of the buffer, making the first pointer's length so it stops at the insertion point and adding a 3rd pointer for the text to be inserted inbetween, kinda like:

long original text la la la
^                *^
|                 2nd part
1st part

And the star points into a new text buffer where you start adding the text to be inserted.

When you render (or in general parse) your text file, you loop over the array of pointers and then do your work on each buffer. Of course if the buffer is small enough, you skip the adding a new buffer part, but that's just heuristics, try each and get a feel for which works best.

You could also consider splitting the text file on load into multiple buffers say every 1MB or so, because if you load the file in a single buffer you WILL need to make a new buffer for inserted text due to the size. Again, this is a heuristic optimization.

like image 25
Blindy Avatar answered Sep 27 '22 11:09

Blindy