
emacs syntax highlight numbers not part of words (with regex?)

I moved to Emacs recently, and I am used to (and like) having numbers highlighted. A quick hack I took from here puts the following in my .emacs:

;; Highlight every run of digits, in every major mode.
(add-hook 'after-change-major-mode-hook
          (lambda ()
            (font-lock-add-keywords
             nil
             '(("\\([0-9]+\\)"
                1 font-lock-warning-face prepend)))))

This gives a good start: any run of digits is highlighted. However, I am a complete beginner with regex and would ideally like the following behaviour:

  • Also highlight the decimal point if it's part of a float, e.g. 12.34
  • Do not highlight any part of the number if it is next to or part of a word, e.g. in foo11 ba11r 11spam none of the '1's should be highlighted
  • Allow an 'e' between two integers, for scientific notation (not required, bonus credit)

Unfortunately this looks very much like a 'do this for me' question, which I am loath to post, but I have failed thus far to make any decent progress myself.

About as far as I have got is discovering [^a-zA-Z][0-9]+[^a-zA-Z] to match anything but a letter either side (e.g. an equals sign), but all this does is include the adjacent symbol in the highlighting. I am not sure how to tell it 'only highlight the numbers if there isn't a letter on either side'.

Of course, I can't imagine regex is the way to go for complicated syntax highlighting, so any other good ideas for highlighting numbers in Emacs are also welcome.

Any help very much appreciated. (In case it makes any difference, this is for use when Python coding.)

Jdog asked Dec 20 '22 09:12


1 Answer

Start by going to your scratch buffer and typing in some test text. Put some numbers in there, some identifiers that contain numbers, some numbers with missing parts (like .e12), etc. These will be our test cases and will let us experiment rapidly. Now run M-x re-builder to enter the regex builder mode, which lets you try out any regex against the text of the current buffer and see what it matches. This is a very handy mode; you'll find yourself using it all the time. Just note that because Emacs Lisp requires you to put regexes into strings, you must double up all of your backslashes. You're already doing that correctly, but I'm not going to double them up here.
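As a small illustration of the backslash doubling, here is the same regex in both notations. The setq line is an optional convenience, not something the steps here depend on: re-builder's reb-re-syntax option defaults to Lisp read syntax, and setting it to 'string should let you type regexes with single backslashes, as they are written in this answer.

```elisp
;; Optional: let re-builder accept regexes with single backslashes,
;; the way they are written in prose, instead of Lisp read syntax.
(setq reb-re-syntax 'string)

;; The same regex in the two notations:
;;   as written in prose:         \b[0-9]+\b
;;   inside a Lisp string:      "\\b[0-9]+\\b"
```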

So, limiting the match to numbers that are not part of identifiers is pretty easy. \b matches at word boundaries, so putting one at either end of your regex, as in \b[0-9]+\b, makes it match only when the digits form a whole word.
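A quick way to check the word-boundary behaviour outside re-builder is to evaluate a few string-match calls (C-x C-e after the closing parenthesis). This is only a sketch: it pins the standard syntax table so the result does not depend on the current buffer, and the last two lines show a caveat that is my own addition. An underscore is a symbol character rather than a word character, so \b still sees a boundary inside foo_11, whereas the symbol-boundary constructs \_< and \_> do not.

```elisp
(with-syntax-table (standard-syntax-table)
  (list
   (string-match "\\b[0-9]+\\b" "x = 11")              ; standalone number
   (string-match "\\b[0-9]+\\b" "foo11 ba11r 11spam")  ; digits glued to letters
   (string-match "\\b[0-9]+\\b" "foo_11")              ; `_' is not a word char
   (string-match "\\_<[0-9]+\\_>" "foo_11")))          ; symbol boundaries instead
;; => (4 nil 4 nil)
```

Whether the underscore case matters depends on the major mode's syntax table, so it is worth trying on your own test text.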

You can match floats just by adding a period to the character class you started with, so that it becomes [0-9.]. Unfortunately, that can match a period all on its own; what we really want is [0-9]*\.?[0-9]+, which will match zero or more digits followed by an optional period followed by one or more digits.
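To see the difference between the two float patterns, the following sketch runs them over a handful of strings. The variable name float-re is just a local name for this illustration.

```elisp
(let ((float-re "[0-9]*\\.?[0-9]+"))   ; hypothetical local name
  (list
   ;; The naive character class happily matches a lone period.
   (string-match "[0-9.]+" ".")
   ;; The refined pattern needs at least one digit after the optional period.
   (mapcar (lambda (s) (and (string-match float-re s) (match-string 0 s)))
           '("12.34" ".5" "7" "12." "."))))
;; => (0 ("12.34" ".5" "7" "12" nil))
```

Note that for "12." only the 12 is matched; the trailing period is left out because no digit follows it.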

A leading sign can be matched with [-+]?, so that gets us negative numbers.

To match exponents we need an optional group: \(...\)?. Since we are only using this for highlighting and don't need to extract the content of the group, we can make it a non-capturing ("shy") group, \(?:...\)?, which saves the regex matcher a little time. Inside the group we need to match an "e" ([eE]), an optional sign ([-+]?), and one or more digits ([0-9]+).
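The difference between the capturing and the shy group can be seen by asking for match group 1 after a search. This is a minimal sketch on a made-up test string:

```elisp
(let ((s "x = 1e-5"))
  (list
   ;; Capturing group: the exponent is recorded as group 1.
   (progn (string-match "[0-9]+\\([eE][-+]?[0-9]+\\)?" s)
          (list (match-string 0 s) (match-string 1 s)))
   ;; Shy group: same overall match, but nothing is recorded.
   (progn (string-match "[0-9]+\\(?:[eE][-+]?[0-9]+\\)?" s)
          (list (match-string 0 s) (match-string 1 s)))))
;; => (("1e-5" "e-5") ("1e-5" nil))
```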

Putting it all together: [-+]?\b[0-9]*\.?[0-9]+\(?:[eE][-+]?[0-9]+\)?\b. Note that I've put the optional sign before the first word boundary: "+" and "-" are not word characters, so the boundary falls between the sign and the first digit.
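Dropped into the hook from the question, the combined regex might look like the sketch below. The names my/number-regexp and my/highlight-numbers are hypothetical, chosen only for this example. Because the regex no longer has a capturing group, the highlighter refers to subexpression 0 (the whole match) rather than 1, and the face is quoted so that it does not rely on a variable of the same name being defined.

```elisp
;; Hypothetical names; the regex is the one assembled above, with every
;; backslash doubled because it lives inside a Lisp string.
(defconst my/number-regexp
  (concat "[-+]?"                         ; optional sign, before the boundary
          "\\b[0-9]*\\.?[0-9]+"           ; integer or float
          "\\(?:[eE][-+]?[0-9]+\\)?\\b")  ; optional exponent
  "Regexp matching integer and float literals that stand on their own.")

(defun my/highlight-numbers ()
  "Add number highlighting to the current buffer's font-lock keywords."
  (font-lock-add-keywords
   nil
   `((,my/number-regexp 0 'font-lock-warning-face prepend))))

(add-hook 'after-change-major-mode-hook #'my/highlight-numbers)
```

One thing to check against your own test text: in something like x = .5 there is no word boundary between the space and the period, so only the 5 would be highlighted.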

db48x answered Mar 23 '23 00:03