
Simple CSV lexer

I want to color CSV files with Pygments by column, like this:

[Screenshot: a CSV file in which each column has its own color]

Note that every cell in the same column gets the same color.

Pygments currently doesn't include a CSV lexer, because CSV is said to be an obscure format, so I tried to write a minimal one myself. Here's what I tried:

tokens = {
    'root': [
        (r'^[^,\n]+', Name.Function), # first column
        (',', Comment),               # separator
        (r'[^,\n]+', Name.Decorator), # second column
        (',', Comment),               # separator
        (r'[^,\n]+', Name.Constant),  # third column
        (',', Comment),               # separator
    ],
}

But it fails to color any column except the first:

[Screenshot: only the first column is colored]

As far as I know, Pygments works by trying the regexps one by one at the current position: if the current regexp doesn't match, it moves on to the next one, and once one matches, it consumes the matched text and starts over from the top of the list. If nothing matches, it emits an error token and advances one character (which is drawn in a red box). For advanced cases like nested comments there are states, but I thought a single state might be sufficient for CSV.
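To see what that matching loop does with the first attempt, here is a simplified model of a single-state regex lexer in plain Python. It is not Pygments' actual code; the rule list mirrors the first attempt, with token types written as strings.

```python
import re

# Simplified model of a single-state regex lexer (not Pygments' real code):
# at each position the rules are tried in order, the first match wins, and
# scanning restarts from the top of the list after the matched text.
RULES = [
    (re.compile(r'^[^,\n]+', re.M), 'Name.Function'),  # first column
    (re.compile(r','), 'Comment'),                     # separator
    (re.compile(r'[^,\n]+'), 'Name.Decorator'),        # second column
    (re.compile(r','), 'Comment'),                     # shadowed by rule 2
    (re.compile(r'[^,\n]+'), 'Name.Constant'),         # shadowed by rule 3
]


def tokenize(text, rules):
    pos, tokens = 0, []
    while pos < len(text):
        for regex, token_type in rules:
            m = regex.match(text, pos)
            if m and m.end() > pos:  # ignore empty matches
                tokens.append((token_type, m.group()))
                pos = m.end()
                break
        else:
            # Nothing matched: emit a single error character and move on.
            tokens.append(('Error', text[pos]))
            pos += 1
    return tokens


for token in tokenize("A001,A002,Spanish", RULES):
    print(token)
```

In this model the third field, `Spanish`, comes out as `Name.Decorator` just like the second one: once the first column is consumed, `^` no longer matches, and the earlier, identical `[^,\n]+` rule always wins, so the `Name.Constant` rule is never reached. A single state has no memory of which column it is in.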

Then I tried:

tokens = {
    'root': [
        (',', Comment),                           # separator
        (r'^[^,\n]+', Name.Function),             # first column
        (r'(?:^[^,\n]+)[^,\n]+', Name.Decorator), # second column
    ],
}

But it colors every column like the second one:

[Screenshot: all columns colored like the second one]

Here's some sample data:

account_id,parent_account_id,name,status
,A001,English,active
A001,,Humanities,active
A003,A001,,active
A004,A002,Spanish,

In Emacs I managed to get what I wanted with:

(add-hook 'csv-mode-hook
             (lambda ()
               "colors first 8 csv columns differently"
               (font-lock-add-keywords nil '(("^\\([^,\n]*\\),"
                                              1 'font-lock-function-name-face)))
               (font-lock-add-keywords nil '(("^\\([^,\n]*\\),\\([^,\n]*\\)"
                                              2 'font-lock-variable-name-face)))
               (font-lock-add-keywords nil '(("^\\([^,\n]*\\),\\([^,\n]*\\),\\([^,\n]*\\)"
                                              3 'font-lock-keyword-face)))
               (font-lock-add-keywords nil '(("^\\([^,\n]*\\),\\([^,\n]*\\),\\([^,\n]*\\),\\([^,\n]*\\)"
                                              4 'font-lock-type-face)))
))

(I actually added more than 4 columns, but that isn't important.)

Which gives:

[Screenshot: the Emacs result, with each column in its own face]
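The Emacs rules work because each pattern is anchored at the start of the line and has one group per column, so group n can only land on column n, whatever the earlier fields contain. A rough Python equivalent of that trick, using only the standard `re` module (the helper names `nth_column_regex` and `column_spans` are made up for this sketch):

```python
import re


def nth_column_regex(n):
    # Anchor at the start of the line and match n comma-separated fields,
    # one capture group each, like the Emacs font-lock patterns.
    return re.compile(r'^' + ','.join([r'([^,\n]*)'] * n))


def column_spans(line, max_columns):
    # Return the (start, end) offsets of up to max_columns fields.
    spans = []
    for n in range(1, max_columns + 1):
        m = nth_column_regex(n).match(line)
        if m is None:
            break  # the line has fewer than n fields
        spans.append(m.span(n))
    return spans


print(column_spans("A003,A001,,active", 5))
# → [(0, 4), (5, 9), (10, 10), (11, 17)]
```

Each pattern rescans the fields before it, which is cheap for a handful of columns. A Pygments `RegexLexer`, by contrast, consumes text as it goes, so it has to remember the column some other way.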

asked Aug 26 '14 at 09:08 by Adobe


1 Answer

Oh, I solved it using states:

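from pygments.lexer import RegexLexer, bygroups
from pygments.token import Comment, Keyword, Name


class CsvLexer(RegexLexer):  # class name is illustrative
    # One state per column: each state colors the next field and moves on.
    # No state matches '\n', so RegexLexer falls back to 'root' at line end.
    tokens = {
        'root': [
            (r'^[^,\n]*', Name.Function, 'second'),
        ],
        'second': [
            (r'(,)([^,\n]*)', bygroups(Comment, Name.Decorator), 'third'),
        ],
        'third': [
            (r'(,)([^,\n]*)', bygroups(Comment, Name.Constant), 'fourth'),
        ],
        'fourth': [
            (r'(,)([^,\n]*)', bygroups(Comment, Name.Variable), 'fifth'),
        ],
        'fifth': [
            (r'(,)([^,\n]*)', bygroups(Comment, Keyword.Type), 'unsupported'),
        ],
        'unsupported': [
            (r'.+', Comment),
        ],
    }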

It colors the first 5 CSV columns differently, and all the others as Comment tokens:

[Screenshot: the first five columns in distinct colors, the remaining columns as comments]
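The state table above follows a regular pattern (state n colors field n and moves on to state n+1), so it can be generated for any number of columns. The sketch below does not use Pygments itself: `make_column_states` is a hypothetical helper that builds a table of the same shape, with token types as strings instead of Pygments token objects, and `tokenize` is a simplified model of the `RegexLexer` loop. It also shows why the approach works across rows: no state matches a newline, and an unmatched newline sends the lexer back to `'root'`.

```python
import re


def make_column_states(column_types, overflow_type='Comment'):
    # Hypothetical helper: one state per column, shaped like the answer's
    # table. Each rule is (regex, token type per group, next state).
    states = {'root': [(r'^([^,\n]*)', [column_types[0]], 'col1')]}
    for i, token_type in enumerate(column_types[1:], start=1):
        states[f'col{i}'] = [
            (r'(,)([^,\n]*)', ['Comment', token_type], f'col{i + 1}')]
    # After the last colored column, the rest of the line is one token.
    states[f'col{len(column_types)}'] = [(r'(.+)', [overflow_type], None)]
    return states


def tokenize(text, states):
    # Simplified model of the RegexLexer loop: it tracks only the current
    # state (the answer's table never pops, so no stack is needed here).
    compiled = {name: [(re.compile(rx, re.M), types, nxt)
                       for rx, types, nxt in rules]
                for name, rules in states.items()}
    state, pos, tokens = 'root', 0, []
    while pos < len(text):
        for regex, types, next_state in compiled[state]:
            m = regex.match(text, pos)
            if m:
                for token_type, value in zip(types, m.groups()):
                    if value:  # skip empty groups, e.g. an empty field
                        tokens.append((token_type, value))
                pos = m.end()
                state = next_state or state
                break
        else:
            if text[pos] == '\n':
                # An unmatched newline resets to 'root', which starts the
                # next CSV row at column one.
                state = 'root'
                tokens.append(('Text', '\n'))
            else:
                tokens.append(('Error', text[pos]))
            pos += 1
    return tokens


states = make_column_states(['Name.Function', 'Name.Decorator', 'Name.Constant'])
for token in tokenize("A001,,Humanities,active\n", states):
    print(token)
```

For the row `A001,,Humanities,active`, the empty second field contributes only its comma, `Humanities` comes out as `Name.Constant`, and `,active` falls through to `Comment`, mirroring the `'unsupported'` state. Turning this into a real lexer would mean swapping the strings for Pygments tokens and the per-group lists for `bygroups(...)`.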

answered Nov 05 '22 at 03:11 by Adobe