I'm writing a program that should be able to read and parse chess moves (SAN).
Here's an example of possible accepted moves:
e4
Nf3
Nbd2
Nb1c3
R1a3
d8=Q
exd5
Nbxd2
...
I first wrote the NFA, then converted it to a grammar, and then converted that to a regular expression.
With my conventions, this is how it looks:
pln + plxln + plnxln + plnln + plln + pxln + lxln=(B+R+Q+N) + lxln + lnxln=(B+R+Q+N) + lnxln + lnln=(B+R+Q+N) + lnln + ln=(B+R+Q+N) + ln + pnxln + pnln
where:
p
is a character from the set {B,R,Q,N,K}
(or think of it as (B+R+Q+N+K)
= [BRQNK])
l
is a character in the range [a-h]
(case sensitive)
n
is a digit in the range [1-8]
+
represents the union operation; if I got it right, (B+R+Q+N)
is [BRQN]
in the regex syntax used by programming languages.
=
is just a normal character; in chess moves it's used for promotion (e.g. e8=Q)
x
is a normal character too; it's used when moving your piece to that square captures one of the opponent's pieces.
( and )
: grouping, like in math
I tried the first part, pln, as [BRQN][a-h][1-8] in an online Java regex tester, and it worked for a move like Nf3. I don't understand how to write the union for composite expressions (like pln+plxln). Also, how can I label parts of the regex so that, when a move is matched, I get all of its information? I tried reading the docs about it but couldn't figure it out.
Any advice?
The + in your notation is | in regexes. So you could use the regex
[BRQNK][a-h][1-8]|[BRQNK][a-h]x[a-h][1-8]|[BRQNK][a-h][1-8]x[a-h][1-8]|[BRQNK][a-h][1-8][a-h][1-8]|[BRQNK][a-h][a-h][1-8]|[BRQNK]x[a-h][1-8]|[a-h]x[a-h][1-8]=(B+R+Q+N)|[a-h]x[a-h][1-8]|[a-h][1-8]x[a-h][1-8]=(B+R+Q+N)|[a-h][1-8]x[a-h][1-8]|[a-h][1-8][a-h][1-8]=(B+R+Q+N)|[a-h][1-8][a-h][1-8]|[a-h][1-8]=(B+R+Q+N)|[a-h][1-8]|[BRQNK][1-8]x[a-h][1-8]|[BRQNK][1-8][a-h][1-8]
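As a rough illustration of how the union maps onto `|`, here is a minimal Java sketch that rebuilds the same alternation from the question's own symbols. The class name `SanUnion`, the constants `P`, `L`, `N`, `PROMO` and the helper `isMove` are made up for this example, and the promotion choice is written as the character class `[BRQN]`.

```java
import java.util.regex.Pattern;

final class SanUnion {
    // Building blocks named after the question's notation (hypothetical names).
    static final String P = "[BRQNK]";     // p: piece letter
    static final String L = "[a-h]";       // l: file
    static final String N = "[1-8]";       // n: rank
    static final String PROMO = "=[BRQN]"; // =(B+R+Q+N): promotion suffix

    // Each argument is one term of the "+" union; String.join puts "|" between them.
    static final Pattern SAN = Pattern.compile(String.join("|",
        P + L + N,                       // pln
        P + L + "x" + L + N,             // plxln
        P + L + N + "x" + L + N,         // plnxln
        P + L + N + L + N,               // plnln
        P + L + L + N,                   // plln
        P + "x" + L + N,                 // pxln
        L + "x" + L + N + PROMO,         // lxln=(B+R+Q+N)
        L + "x" + L + N,                 // lxln
        L + N + "x" + L + N + PROMO,     // lnxln=(B+R+Q+N)
        L + N + "x" + L + N,             // lnxln
        L + N + L + N + PROMO,           // lnln=(B+R+Q+N)
        L + N + L + N,                   // lnln
        L + N + PROMO,                   // ln=(B+R+Q+N)
        L + N,                           // ln
        P + N + "x" + L + N,             // pnxln
        P + N + L + N));                 // pnln

    // matches() requires the whole string to match one of the alternatives.
    static boolean isMove(String s) {
        return SAN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] moves = {"e4", "Nf3", "Nbd2", "Nb1c3", "R1a3", "d8=Q", "exd5", "Nbxd2", "e9"};
        for (String move : moves) {
            System.out.println(move + " -> " + isMove(move));
        }
    }
}
```

Using `matches()` rather than `find()` matters here: with `find()`, a short alternative such as `[a-h][1-8]` could match a substring of an invalid input.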
This is, clearly, a bit ugly. I can think of 2 possible ways to make it nicer:
With the COMMENTS flag, you can add whitespace.
Alternatives that differ only by an optional part can be combined. For example, [BRQNK][a-h]x[a-h][1-8]|[BRQNK][a-h][1-8]x[a-h][1-8] can be rewritten as [BRQNK][a-h][1-8]?x[a-h][1-8].
I also know of one other improvement which isn't available in Java. (And maybe not in many languages, but you can do it in Perl.) The subexpression (?1) (likewise (?2), etc.) is a bit like \1, except that instead of matching the exact string that matched the first capture group, it matches any string that could have matched that capture group. In other words, it's equivalent to writing the capture group out again. So you could (in Perl) replace the first [BRQNK] with ([BRQNK]), then replace all subsequent occurrences with (?1).
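For Java, a minimal sketch along these lines might look as follows. It covers only the piece moves (the alternatives starting with `[BRQNK]`), uses `Pattern.COMMENTS` so the pattern can carry whitespace and `#` comments, and merges the optional parts with `?`. Two things here are my own substitutions rather than anything stated above: string constants stand in for Perl's `(?1)` reuse, and named groups (`(?<name>...)`, Java 7+) are one way to label the parts of a match, as the question asks. The class name `SanNamedGroups`, the group names and the `describe` helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SanNamedGroups {
    // Reusable fragments; a plain-Java stand-in for Perl's (?1)-style reuse.
    static final String PIECE = "[BRQNK]";
    static final String FILE = "[a-h]";
    static final String RANK = "[1-8]";

    // With Pattern.COMMENTS, whitespace and #-to-end-of-line comments in the pattern are ignored.
    static final Pattern PIECE_MOVE = Pattern.compile(
          "(?<piece>" + PIECE + ")       # moving piece\n"
        + "(?<fromFile>" + FILE + ")?    # optional disambiguating file\n"
        + "(?<fromRank>" + RANK + ")?    # optional disambiguating rank\n"
        + "(?<capture>x)?                # optional capture marker\n"
        + "(?<to>" + FILE + RANK + ")    # destination square\n",
        Pattern.COMMENTS);

    // Returns a readable summary of the labelled parts, or "no match".
    static String describe(String move) {
        Matcher m = PIECE_MOVE.matcher(move);
        if (!m.matches()) {
            return "no match";
        }
        // Optional groups that did not take part in the match are returned as null.
        return "piece=" + m.group("piece")
            + " fromFile=" + m.group("fromFile")
            + " fromRank=" + m.group("fromRank")
            + " capture=" + (m.group("capture") != null)
            + " to=" + m.group("to");
    }

    public static void main(String[] args) {
        System.out.println(describe("Nbxd2")); // piece=N fromFile=b fromRank=null capture=true to=d2
        System.out.println(describe("R1a3"));  // piece=R fromFile=null fromRank=1 capture=false to=a3
        System.out.println(describe("e4"));    // no match (pawn moves are not covered by this sketch)
    }
}
```

The merged pattern accepts exactly the eight piece-move shapes from the long alternation (pln, plln, pnln, plnln and their capturing variants); pawn moves and promotions would need a second pattern built the same way.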