
RegEx help for chess moves (SAN)

Tags: java, regex

I'm writing a program that should be able to read and parse chess moves (SAN).

Here's an example of possible accepted moves:

e4
Nf3
Nbd2
Nb1c3
R1a3
d8=Q
exd5
Nbxd2
...

I first wrote the NFA, then converted it to a grammar, and then converted that to a regular expression.

With my conventions, this is how it looks:

pln + plxln + plnxln + plnln + plln + pxln + lxln=(B+R+Q+N) + lxln + lnxln=(B+R+Q+N) + lnxln + lnln=(B+R+Q+N) + lnln + ln=(B+R+Q+N) + ln + pnxln + pnln

where:

p is a character from the set {B,R,Q,N,K} (or think of it as (B+R+Q+N+K) = [BRQNK])

l is a character in the interval [a-h] (case sensitive)

n is a digit in the interval [1-8]

+ represents the union operation... if I got it right, (B+R+Q+N) is written [BRQN] in the regex syntax of programming languages.

= is just a normal character... in chess moves it's used for promotion (e.g. e8=Q)

x is a normal character too... it's used when the move captures an opponent's piece on the destination square.

( ): grouping, like in math

I tried the first part, pln, as [BRQN][a-h][1-8] in an online Java regex tester, and it worked for a move like Nf3. I couldn't work out how to do the union for composite expressions (like pln+plxln). Also, how can I label parts of the regex so that, when a move is matched, I get all the information out of it? I tried to read the docs about it but couldn't figure it out.

Any advice?

BlackBox asked Oct 12 '16 20:10


1 Answer

The + in your notation is | in regexes. So you could use the regex

[BRQNK][a-h][1-8]|[BRQNK][a-h]x[a-h][1-8]|[BRQNK][a-h][1-8]x[a-h][1-8]|[BRQNK][a-h][1-8][a-h][1-8]|[BRQNK][a-h][a-h][1-8]|[BRQNK]x[a-h][1-8]|[a-h]x[a-h][1-8]=(B+R+Q+N)|[a-h]x[a-h][1-8]|[a-h][1-8]x[a-h][1-8]=(B+R+Q+N)|[a-h][1-8]x[a-h][1-8]|[a-h][1-8][a-h][1-8]=(B+R+Q+N)|[a-h][1-8][a-h][1-8]|[a-h][1-8]=(B+R+Q+N)|[a-h][1-8]|[BRQNK][1-8]x[a-h][1-8]|[BRQNK][1-8][a-h][1-8]
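
As a rough illustration of the + to | translation in Java, here is a minimal sketch that uses only five of the sixteen terms. The class name SanAlternation and the method isMove are made up for this example. Note that the promotion set is written as the character class [BRQN]; a literal (B+R+Q+N) would not work, because + is a quantifier in regex syntax.

```java
import java.util.regex.Pattern;

final class SanAlternation {
    // Each alternative is one term of the question's expression, with + replaced by |.
    // Only a handful of the sixteen terms are listed to keep the sketch short.
    private static final Pattern MOVE = Pattern.compile(
            "[BRQNK][a-h][1-8]"           // pln,          e.g. Nf3
          + "|[BRQNK][a-h]x[a-h][1-8]"    // plxln,        e.g. Nbxd2
          + "|[a-h]x[a-h][1-8]"           // lxln,         e.g. exd5
          + "|[a-h][1-8]=[BRQN]"          // ln=(B+R+Q+N), e.g. d8=Q
          + "|[a-h][1-8]");               // ln,           e.g. e4

    // Hypothetical helper: true if the whole string is one of the listed move forms.
    static boolean isMove(String s) {
        // matches() requires the entire input to match, so the order of the
        // alternatives does not change the result.
        return MOVE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"e4", "Nf3", "exd5", "d8=Q", "Nbxd2", "Zz9"}) {
            System.out.println(s + " -> " + isMove(s));
        }
    }
}
```

Using matches() rather than find() matters here: with find(), a shorter alternative such as [a-h][1-8] could match a fragment of a longer, invalid string.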

This is, clearly, a bit ugly. I can think of 2 possible ways to make it nicer:

  • With the COMMENTS flag (Pattern.COMMENTS in Java), you can add whitespace and # comments to the pattern.
  • Combine the possibilities together in a nicer way. For example, [BRQNK][a-h]x[a-h][1-8]|[BRQNK][a-h][1-8]x[a-h][1-8] can be rewritten as [BRQNK][a-h][1-8]?x[a-h][1-8].
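
Both ideas can be combined. The sketch below is one possible way to do it, not a definitive SAN parser: it folds the sixteen terms of the question into two alternatives using optional parts, lays them out with Pattern.COMMENTS, and uses Java's named groups, (?<name>...), to address the question about labelling parts of a match. The class name SanCompact, the method describe, and all group names are assumptions chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SanCompact {
    // With Pattern.COMMENTS, whitespace in the pattern is ignored and
    // # starts a comment that runs to the end of the line.
    private static final Pattern MOVE = Pattern.compile(
            "(?<piece>[BRQNK])                         # piece move: piece letter\n"
          + "(?<fromFile>[a-h])? (?<fromRank>[1-8])?   # optional origin file and rank\n"
          + "x?                                        # optional capture marker\n"
          + "(?<to>[a-h][1-8])                         # destination square\n"
          + "|\n"
          + "(?: (?<pawnFile>[a-h])                    # pawn move: optional origin file,\n"
          + "    (?: (?<pawnRank>[1-8]) x? | x )       # followed by a rank, an x, or both\n"
          + ")?\n"
          + "(?<pawnTo>[a-h][1-8])                     # destination square\n"
          + "(?: = (?<promotion>[BRQN]) )?             # optional promotion\n",
            Pattern.COMMENTS);

    /** Returns a short description of the move, or null if the input is not accepted. */
    static String describe(String san) {
        Matcher m = MOVE.matcher(san);
        if (!m.matches()) {
            return null;
        }
        // A named group that did not take part in the match yields null.
        boolean pawn = m.group("piece") == null;
        String piece = pawn ? "P" : m.group("piece");
        String to = pawn ? m.group("pawnTo") : m.group("to");
        String promotion = m.group("promotion");
        return piece + " to " + to
                + (san.indexOf('x') >= 0 ? " capturing" : "")
                + (promotion != null ? " promoting to " + promotion : "");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"e4", "Nb1c3", "R1a3", "d8=Q", "Nbxd2", "ede5"}) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

Java does not allow the same group name twice in one pattern, which is why the pawn alternative uses its own names (pawnFile, pawnTo, and so on) instead of reusing fromFile and to.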

I also know of one other improvement, which isn't available in Java (and perhaps not in many languages, but you can do it in Perl). The subexpression (?1) (likewise (?2), etc.) is a bit like \1, except that instead of matching the exact string that the first capture group matched, it matches any string that the group could have matched. In other words, it's equivalent to writing the capture group's pattern out again. So you could (in Perl) replace the first [BRQNK] with ([BRQNK]), then replace all subsequent occurrences with (?1).
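
In Java, a rough substitute for (?1) is to keep the repeated fragments in string constants and concatenate them when building the pattern. This is plain string assembly, not the same regex feature. The sketch below assumes the class name SanFragments and the constant names P, L and N, which mirror the question's notation and are otherwise made up.

```java
import java.util.regex.Pattern;

final class SanFragments {
    // Named fragments stand in for the repeated character classes.
    private static final String P = "[BRQNK]";  // piece letter
    private static final String L = "[a-h]";    // file
    private static final String N = "[1-8]";    // rank

    // plxln + plnxln merged into one term: [BRQNK][a-h][1-8]?x[a-h][1-8]
    private static final Pattern PIECE_CAPTURE =
            Pattern.compile(P + L + N + "?x" + L + N);

    // Hypothetical helper: true for piece captures that name an origin file.
    static boolean isPieceCapture(String s) {
        return PIECE_CAPTURE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Nbxd2", "Nb1xd2", "Nxd2", "exd5"}) {
            System.out.println(s + " -> " + isPieceCapture(s));
        }
    }
}
```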

David Knipe answered Sep 29 '22 01:09