
Syntax Highlighting: How Does Eclipse Do It So Fast?

I've developed a syntax highlighter in Java for Android and it's working well, but the problem is it can be slow with big files.

So I'm wondering how source code editors like Eclipse and Gedit (Ubuntu) highlight what you've just written so quickly. For example, if you enter the closing greater-than symbol when writing an HTML tag, it highlights the tag instantly.

How is it so quick, even with big files? Is there a specific way they go about doing it or do they just perform the syntax highlighting for the line you're on?

Thanks, Alex

AlexPriceAP asked Aug 30 '11 12:08




1 Answer

I cannot speak for Gedit, but in Eclipse, we cheat :-)

If you look very carefully, you can actually see that syntax coloring for structured languages like Java is a two-phase process.

First, a presentation reconciler is run to do very basic syntax coloring. This is triggered immediately by changes to the editor's document and is expected to be extremely fast. It is really not syntax-based coloring, but lexically-based coloring. So the focus is on tokens like strings, keywords, words, numbers, comments, etc - all tokens that can be recognized easily based on simple character tables or similar. Thus there is no difference between a class name, a variable name or a static method name at this stage, even though they may be colored differently in the end. For many languages, this is the only coloring done.
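To make the "lexical, not syntactic" point concrete, here is a minimal sketch of such a scanner in Java. It is not Eclipse's code: the class name `LexicalColorer`, the `Span` type and the tiny keyword table are all made up for illustration, and it only handles line comments, double-quoted strings, numbers and words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical lexical colorer: classifies tokens by their characters only, with no parsing. */
final class LexicalColorer {
    enum Kind { KEYWORD, WORD, NUMBER, STRING, COMMENT }

    /** A colored span: token kind plus [start, end) offsets in the scanned text. */
    static final class Span {
        final Kind kind;
        final int start;
        final int end;
        Span(Kind kind, int start, int end) { this.kind = kind; this.start = start; this.end = end; }
    }

    // A tiny illustrative keyword table, not the full Java keyword list.
    private static final Set<String> KEYWORDS = Set.of("class", "static", "int", "return", "if", "for");

    static List<Span> scan(String text) {
        List<Span> spans = new ArrayList<>();
        int i = 0, n = text.length();
        while (i < n) {
            char c = text.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;
            } else if (text.startsWith("//", i)) {
                while (i < n && text.charAt(i) != '\n') i++;          // comment runs to end of line
                spans.add(new Span(Kind.COMMENT, start, i));
            } else if (c == '"') {
                i++;
                while (i < n && text.charAt(i) != '"' && text.charAt(i) != '\n') {
                    if (text.charAt(i) == '\\' && i + 1 < n) i++;     // skip the escaped character
                    i++;
                }
                if (i < n && text.charAt(i) == '"') i++;              // consume the closing quote
                spans.add(new Span(Kind.STRING, start, i));
            } else if (Character.isDigit(c)) {
                while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
                spans.add(new Span(Kind.NUMBER, start, i));
            } else if (Character.isJavaIdentifierStart(c)) {
                while (i < n && Character.isJavaIdentifierPart(text.charAt(i))) i++;
                String word = text.substring(start, i);
                // A class name, variable name and method name all end up as WORD here.
                spans.add(new Span(KEYWORDS.contains(word) ? Kind.KEYWORD : Kind.WORD, start, i));
            } else {
                i++;                                                  // punctuation is left uncolored
            }
        }
        return spans;
    }
}
```

Because each character is looked at roughly once and nothing is parsed, a pass like this over a line or two is cheap enough to run on every keystroke.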

Next, a syntax reconciler is run to build an abstract syntax tree (AST) for the document - or as near as you can get in the face of syntax or semantic errors. This is triggered by a timer, and for some languages an attempt is made to do just a partial update of the AST (not easy). The completed AST is then used to update the outline view and to do additional syntax coloring based on the extra information - e.g. static method names. (The AST is often used for many other things too, like hover information, folding, hyperlinking, etc.)
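The "triggered by a timer" part could look roughly like the sketch below. This is an assumption about the general shape, not Eclipse's actual reconciler: the class `DelayedReconciler` and its methods are hypothetical, and the clock is passed in explicitly so the example stays deterministic (a real editor would drive `tick` from a background job or timer thread).

```java
/** Hypothetical delayed reconciler: reparses only after the user has stopped typing for a while. */
final class DelayedReconciler {
    private final long quietMillis;  // required pause after the last edit
    private final Runnable reparse;  // the expensive step, e.g. rebuilding the AST
    private long lastEditMillis;
    private boolean dirty;

    DelayedReconciler(long quietMillis, Runnable reparse) {
        this.quietMillis = quietMillis;
        this.reparse = reparse;
    }

    /** Called on every keystroke: cheap, it only restarts the quiet period. */
    void documentChanged(long nowMillis) {
        dirty = true;
        lastEditMillis = nowMillis;
    }

    /** Called periodically by a timer; returns true if a reparse actually ran. */
    boolean tick(long nowMillis) {
        if (!dirty || nowMillis - lastEditMillis < quietMillis) {
            return false;
        }
        dirty = false;
        reparse.run();
        return true;
    }
}
```

With a 500 ms quiet period, three keystrokes at 0, 100 and 200 ms lead to a single reparse once a tick arrives at 700 ms or later, so the expensive AST work never competes with typing.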

For both the initial presentation reconciler and the later syntax-based reconciler, some rather elaborate logic determines just how big a region of the document must be parsed. For the presentation reconciler the decision can be based on any existing coloring, whereas for the syntax-based coloring a separate damage/repair phase is run to determine the size of the region.
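As a rough illustration of basing the decision on existing coloring, the sketch below starts from the lines touched by an edit and widens the region to any already-colored span it overlaps (a multi-line comment, say). Everything here is hypothetical: `DamageRegion` is an invented name, and the sketch assumes the existing spans are non-overlapping `{start, end}` pairs already expressed in offsets of the current text.

```java
import java.util.List;

/** Hypothetical damage estimate: touched lines, widened by the existing colored spans they hit. */
final class DamageRegion {
    /**
     * Returns {start, end} (end exclusive) of the region to re-scan after an edit that
     * covers [offset, offset + length) in the current text.
     */
    static int[] compute(String text, int offset, int length, List<int[]> coloredSpans) {
        // Start with the whole line(s) containing the edit.
        int start = text.lastIndexOf('\n', offset - 1) + 1;
        int newline = text.indexOf('\n', offset + length);
        int end = (newline < 0) ? text.length() : newline;

        // Widen to cover any existing span the region overlaps, e.g. a multi-line comment.
        for (int[] span : coloredSpans) {
            boolean overlaps = span[0] < end && span[1] > start;
            if (overlaps) {
                start = Math.min(start, span[0]);
                end = Math.max(end, span[1]);
            }
        }
        return new int[] { start, end };
    }
}
```

For an edit inside an ordinary line this yields just that line; for an edit inside a multi-line comment it yields the whole comment, which is still far smaller than the whole file.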

Some extreme examples that always complicate matters are when block comments are added or removed:

a = b /* c + 1 /* remember the offset! */;

If the first slash is removed or added, the presentation reconciler must process a larger area than one would naively expect...
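The effect can be checked with a small experiment. The sketch below is illustrative only and deliberately naive: the hypothetical `CommentDamage` class re-scans the whole text before and after the edit (exactly what a real editor tries to avoid) and reports which characters changed between "in a block comment" and "not in a block comment". It ignores strings and line comments.

```java
import java.util.Arrays;

/** Illustrative only: measures how far a small edit changes block-comment coloring. */
final class CommentDamage {
    /** Marks every character that lies inside a (non-nesting) block comment. */
    static boolean[] commentMask(String text) {
        boolean[] mask = new boolean[text.length()];
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("/*", i)) {
                int close = text.indexOf("*/", i + 2);
                int end = (close < 0) ? text.length() : close + 2;  // unterminated: runs to the end
                Arrays.fill(mask, i, end, true);
                i = end;
            } else {
                i++;
            }
        }
        return mask;
    }

    /**
     * Replaces `removed` characters at `offset` with `inserted` and returns {start, end}
     * (end exclusive, in offsets of the edited text) of the region whose comment state changed.
     */
    static int[] damage(String before, int offset, int removed, String inserted) {
        String after = before.substring(0, offset) + inserted + before.substring(offset + removed);
        boolean[] oldMask = commentMask(before);
        boolean[] newMask = commentMask(after);
        int shift = removed - inserted.length();
        int start = offset;
        int end = offset + inserted.length();   // the edit itself always counts as damaged
        for (int i = 0; i < after.length(); i++) {
            if (i >= offset && i < offset + inserted.length()) continue;  // new text has no old state
            int old = (i < offset) ? i : i + shift;                       // same character, old offset
            if (oldMask[old] != newMask[i]) {
                start = Math.min(start, i);
                end = Math.max(end, i + 1);
            }
        }
        return new int[] { start, end };
    }
}
```

Calling `CommentDamage.damage("a = b /* c + 1 /* remember the offset! */;", 6, 1, "")` deletes only the first slash, yet the returned region is `{6, 14}`: the eight characters `* c + 1 ` all flip from comment to code. In a real file the two comment openers can be hundreds of lines apart.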

Tonny Madsen answered Oct 27 '22 03:10