
Vim - use regex to lexicographically compare strings (to find earlier/later dates)

Tags:

regex

vim

I want to write a simple regex, in vim, that will find all strings lexicographically smaller than another string.

Specifically, I want to use this to compare dates formatted as 2014-02-17. These dates are lexicographically sortable, which is why I use them.

My specific use case: I'm trying to run through a script and find all the dates that are earlier than today's date.

I'm also OK with comparing these as numbers, or any other solution.

Edan Maor asked Feb 17 '14 19:02


2 Answers

I don't think there is any way to do this easily in a regex. To match any date earlier than a given date, you can use the functions below (some of the code is borrowed from benjifisher's answer).

function! Convert_to_char_class(cur) 
    if a:cur =~ '[2-9]'
        return '[0-' . (a:cur-1) . ']'
    endif
    return '0'
endfunction

function! Match_number_before(num)
    let branches = []
    let init = ''
    for i in range(len(a:num))
        if a:num[i] =~ '[1-9]'
            call add(branches, init . Convert_to_char_class(a:num[i]) . repeat('\d', len(a:num) - i - 1))
        endif 
        let init .= a:num[i]
    endfor
    return '\%(' . join(branches, '\|') .'\)'
endfunction

function! Match_date_before(date)
    " Require the whole argument to be in YYYY-MM-DD form
    if a:date !~ '\v^\d{4}-\d{2}-\d{2}$'
        echo "invalid date"
        return ''
    endif

    let branches =[]

    let parts = split(a:date, '-')
    call add(branches, Match_number_before(parts[0]) . '-\d\{2}-\d\{2}')
    call add(branches, parts[0] . '-' . Match_number_before(parts[1]) . '-\d\{2}')
    call add(branches, parts[0] . '-' . parts[1] . '-' .Match_number_before(parts[2]))

    return '\%(' . join(branches, '\|') .'\)'
endfunction

Use the following to search for all dates before 2014-02-24 (press Enter once to insert the generated pattern and again to run the search).

/<C-r>=Match_date_before('2014-02-24')

You might be able to wrap it in a function to set the search register if you wanted to.
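
One possible way to do that wrapping, as a minimal sketch. It assumes Match_date_before() from above has already been sourced; the names Search_dates_before and SearchDatesBefore are made up for illustration.

```vim
" Hypothetical wrapper: put the generated pattern in the search register
" so that n and N jump between dates earlier than a:date.
function! Search_dates_before(date)
    let pat = Match_date_before(a:date)
    " Match_date_before() returns no usable pattern for malformed input
    if type(pat) != type('') || pat ==# ''
        return
    endif
    let @/ = pat
    " Also record it in the search history, as a typed search would be
    call histadd('search', pat)
endfunction

" Hypothetical command name; usage: :SearchDatesBefore 2014-02-24
command! -nargs=1 SearchDatesBefore call Search_dates_before(<q-args>)
```

After running the command, n should jump to the next earlier date. The command variant for later dates would look the same with Match_date_after().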

The generated regex for dates before 2014-02-24 is the following.

\%(\%([0-1]\d\d\d\|200\d\|201[0-3]\)-\d\{2}-\d\{2}\|2014-\%(0[0-1]\)-\d\{2}\|2014-02-\%([0-1]\d\|2[0-3]\)\)

It does not validate dates. It assumes that anything in that format is a date, so a string such as 2013-99-99 also matches.
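
A quick way to sanity-check the generated regex is to try it against a few strings with =~. This is only a sketch; the variable name pat is arbitrary and the pattern is copied from the output shown above.

```vim
" The pattern generated for dates before 2014-02-24, copied verbatim
let pat = '\%(\%([0-1]\d\d\d\|200\d\|201[0-3]\)-\d\{2}-\d\{2}\|2014-\%(0[0-1]\)-\d\{2}\|2014-02-\%([0-1]\d\|2[0-3]\)\)'

" One day earlier: prints 1
echo '2014-02-23' =~ pat
" The cutoff itself is not earlier: prints 0
echo '2014-02-24' =~ pat
" A later date: prints 0
echo '2014-03-01' =~ pat
" Not a real date, but it has the right shape and an earlier year: prints 1
echo '2013-99-99' =~ pat
```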


The equivalent set of functions for matching dates after the given date:

function! Convert_to_char_class_after(cur) 
    if a:cur =~ '[0-7]'
        return '[' . (a:cur+1) . '-9]'
    endif
    return '9'
endfunction

function! Match_number_after(num)
    let branches = []
    let init = ''
    for i in range(len(a:num))
        if a:num[i] =~ '[0-8]'
            call add(branches, init . Convert_to_char_class_after(a:num[i]) . repeat('\d', len(a:num) - i - 1))
        endif 
        let init .= a:num[i]
    endfor
    return '\%(' . join(branches, '\|') .'\)'
endfunction

function! Match_date_after(date)
    " Require the whole argument to be in YYYY-MM-DD form
    if a:date !~ '\v^\d{4}-\d{2}-\d{2}$'
        echo "invalid date"
        return ''
    endif

    let branches =[]

    let parts = split(a:date, '-')
    call add(branches, Match_number_after(parts[0]) . '-\d\{2}-\d\{2}')
    call add(branches, parts[0] . '-' . Match_number_after(parts[1]) . '-\d\{2}')
    call add(branches, parts[0] . '-' . parts[1] . '-' .Match_number_after(parts[2]))

    return '\%(' . join(branches, '\|') .'\)'
endfunction

The regex generated for dates after 2014-02-24 is the following.

\%(\%([3-9]\d\d\d\|2[1-9]\d\d\|20[2-9]\d\|201[5-9]\)-\d\{2}-\d\{2}\|2014-\%([1-9]\d\|0[3-9]\)-\d\{2}\|2014-02-\%([3-9]\d\|2[5-9]\)\)

FDinoff answered Nov 15 '22 04:11


You do not say how you want to use this; are you sure that you really want a regular expression? Perhaps you could get away with

if DateCmp(date, '2014-02-24') < 0
  " ...
endif

In that case, try this function.

" Compare formatted date strings:
" @param String date1, date2
"   dates in YYYY-MM-DD format, e.g. '2014-02-24'
" @return Integer
"   negative, zero, or positive according to date1 < date2, date1 == date2, or
"   date1 > date2
function! DateCmp(date1, date2)
  let [year1, month1, day1] = split(a:date1, '-')
  let [year2, month2, day2] = split(a:date2, '-')
  " str2nr() parses base 10, so zero-padded fields are never read as octal
  if year1 != year2
    return str2nr(year1) - str2nr(year2)
  elseif month1 != month2
    return str2nr(month1) - str2nr(month2)
  else
    return str2nr(day1) - str2nr(day2)
  endif
endfun
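
Since zero-padded YYYY-MM-DD strings sort lexicographically, a plain string comparison should give the same ordering as DateCmp(). Here is a rough sketch of how that could be used to collect every earlier date from a list of lines; DatesBefore is a hypothetical helper, not part of the answer above.

```vim
" Hypothetical helper: return every YYYY-MM-DD string found in a:lines
" that is earlier than a:cutoff (also a YYYY-MM-DD string).
function! DatesBefore(lines, cutoff)
  let datepat = '\d\{4}-\d\{2}-\d\{2}'
  let found = []
  for line in a:lines
    let start = 0
    while 1
      let m = matchstr(line, datepat, start)
      if m ==# ''
        break
      endif
      " <# is a case-sensitive string comparison; for zero-padded dates
      " string order and chronological order agree
      if m <# a:cutoff
        call add(found, m)
      endif
      " Continue scanning after the end of this match
      let start = matchend(line, datepat, start)
    endwhile
  endfor
  return found
endfunction
```

For the "earlier than today" case, something like :echo DatesBefore(getline(1, '$'), strftime('%Y-%m-%d')) should list the dates in the current buffer, assuming your Vim has strftime().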

If you really want a regular expression, then try this:

" Construct a pattern that matches a formatted date string if and only if the
" date is less than the input date.  Usage:
" :echo '2014-02-24' =~ DateLessRE('2014-03-12')
function! DateLessRE(date)
  let init = ''
  let branches = []
  for c in split(a:date, '\zs')
    if c =~ '[1-9]'
      call add(branches, init . '[0-' . (c-1) . ']')
    endif
    let init .= c
  endfor
  return '\d\d\d\d-\d\d-\d\d\&\%(' . join(branches, '\|') . '\)'
endfun

Does that count as a "simple" regex? One way to use it would be to type :g/ and then CTRL-R and = and then DateLessRE('2014-02-24') and Enter, followed by the rest of your command. In other words,

:g/<C-R>=DateLessRE('2014-02-24')<CR>/s/foo/bar

EDIT: I added a concat (:help /\&) that matches a complete "formatted date string". Now, there is no need to anchor the pattern.
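
To see what the concat buys you, here is a small check. The two halves of the pattern are written out by hand as DateLessRE('2014-02-24') would build them; the variable names full and less are just for illustration.

```vim
" The two halves of the pattern, written out by hand
let full = '\d\d\d\d-\d\d-\d\d'
let less = '\%([0-1]\|20[0-0]\|201[0-3]\|2014-0[0-1]\|2014-02-[0-1]\|2014-02-2[0-3]\)'

" Without the concat, the stray 0 in the cutoff date satisfies [0-1]: prints 1
echo '2014-02-24' =~ less
" With the concat, a full date must start where the branch matches: prints 0
echo '2014-02-24' =~ (full . '\&' . less)
" An earlier date still matches: prints 1
echo '2014-02-23' =~ (full . '\&' . less)
```

According to :help /\&, the text that is actually matched is that of the last concat, so the match may cover only a prefix of the date. That is fine for :g, which only cares whether the line matches.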

benjifisher answered Nov 15 '22 05:11