I am planning to do a text editor in c. So just wanted to know what data structure is good to save the text. I read using linked list was one way to do it, but not efficient. Please point me to some references where I can get a good idea of what need to use. I am planning to use ncurses library for getting the user input and capturing keys and UI.
Using the source code of existing editors is kind of too complex, all the text editors are huge, even console only editors. Any simple console editor source code for reference?
You will benefit by reading about Emacs buffers. Also see this blog, especially the last comment, quoted here for easy reference:
Many versions of Emacs, including GNU, use a single contiguous character array virtually split in two sections separated by a gap. To insert the gap is first moved to the insertion point. Inserted characters fill into the gap, reducing its size. If there’s insufficient space to hold the characters the entire buffer is reallocated to a new larger size and the gaps coalesced at the previous insertion point.
The naive look at this and say the performance must be poor because of all the copying involved. Wrong. The copy operation is incredibly quick and can be optimized in a variety of ways. Gap buffers also take advantage of usage patterns. You might jump all over the window before focusing and inserting text. The gap doesn’t move for display – only for insert (or delete).
On the other hand, inserting a character block at the head of a 500MB file then inserting another at the end is the worst case for the gap approach, especially if the gap’s size is exceeded. How often does that happen?
Contiguous memory blocks are prized in virtual memory environments because less paging is involved. Moreover, reads and writes are simplfied because the the file doesn’t have to be parsed and broken up into some other data structure. Rather, the file’s internal representation in the gap buffer is identical to disk and can be read into and written out optimally. Writes themselves can be done with a single system call (on *nix).
The gap buffer is the best algorithm for editing text in a general way. It uses the least memory and has the highest aggregate performance over a variety of use cases. Translating the gap buffer to a visual window is a bit trickier as line context must be constantly maintained.
If you want it to scale, you should use a form of balanced binary tree. It's possible to make it so basically all operations - insert, delete, seek to character, seek to line, etc. - are O(log n)
. If you only care about "sane" file sizes for text (a few megs maximum) it doesn't really matter what structures you use.
You should "save" the data as plain text. If you mean how to store the data in memory, I recommend a simple linked list.
If it's just a text editor (not a word processor), the approach I took was to store each line in it's own link node.
This is a nice simple approach that makes it easy to insert and delete lines. And inserting or deleting text is efficient because only the data within the current node needs to be shifted around when inserting or deleting text.
You said you don't want to look at source code but, nonetheless, you can download the version I wrote many, many years ago at http://www.softcircuits.com/sw_dos.aspx by downloading pictor.zip to see a simple text editor.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With