 

How to write parser for unified diff syntax

Should I use RegexParsers or StandardTokenParsers, or are these even suitable for parsing this kind of syntax? An example of the syntax can be found here.

JtR asked Dec 13 '22 20:12



2 Answers

I'd use regexes. They simplify a few things and make the rest standard.

def process(src: scala.io.Source): Unit = {
  // Header layout (path, then a ''-quoted timestamp) follows the linked example;
  // GNU diff output separates the two with a tab instead, so adjust as needed.
  val FilePattern = """(.*) ''(.*)''"""
  val OriginalFile = ("--- " + FilePattern).r
  // '+' is a regex metacharacter, so literal plus signs must be escaped.
  val NewFile = ("""\+\+\+ """ + FilePattern).r
  val Chunk = """@@ -(\d+),(\d+) \+(\d+),(\d+) @@""".r
  val AddedLine = """\+(.*)""".r
  val RemovedLine = """-(.*)""".r
  val UnchangedLine = """ (.*)""".r

  // Case order matters: the header patterns must be tried before the +/- line patterns.
  src.getLines() foreach {
    case OriginalFile(path, timestamp) => println("Original file: " + path)
    case NewFile(path, timestamp) => println("New file: " + path)
    // The captured groups are Strings, so interpolate them rather than formatting with %d.
    case Chunk(l1, s1, l2, s2) => println(s"Modifying $s1 lines at line $l1, to $s2 lines at $l2")
    case AddedLine(line) => println("Adding line " + line)
    case RemovedLine(line) => println("Removing line " + line)
    case UnchangedLine(line) => println("Keeping line " + line)
    // Anything else (e.g. an empty line) would otherwise throw a MatchError.
    case other => println("Ignoring line " + other)
  }
}
Daniel C. Sobral answered Jan 09 '23 03:01



This format was designed to be easy to parse: you can do it without any regular expressions and without tokenizing your input. Just go line by line and look at the first couple of characters. The file headers and chunk headers will require a little more attention, but it's nothing you can't do with split.
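For illustration, here is a minimal sketch of that prefix-based approach in Scala. All names in it (DiffLine and its cases, UnifiedDiff, classify, parseRange, headerPath) are made up for this example, and it assumes well-formed input in which the header path is followed by an optional tab-separated timestamp, as GNU diff emits. It classifies each line on its own, so a removed line whose content happens to start with "-- " would be mistaken for a file header; a stricter parser would count lines against the chunk sizes.

```scala
// Hypothetical names throughout; a sketch, not a complete unified diff parser.
sealed trait DiffLine
final case class OldFile(path: String) extends DiffLine
final case class NewFile(path: String) extends DiffLine
final case class Hunk(oldStart: Int, oldSize: Int, newStart: Int, newSize: Int) extends DiffLine
final case class Added(text: String) extends DiffLine
final case class Removed(text: String) extends DiffLine
final case class Context(text: String) extends DiffLine
final case class Other(text: String) extends DiffLine

object UnifiedDiff {
  // "-1,3" or "+1,4" -> (start, size); the size defaults to 1 when it is omitted.
  // Assumes a well-formed range; malformed numbers will throw.
  private def parseRange(field: String): (Int, Int) =
    field.drop(1).split(",") match {
      case Array(start, size) => (start.toInt, size.toInt)
      case Array(start)       => (start.toInt, 1)
    }

  // Drop the 4-character "--- " / "+++ " prefix and keep the text before any tab.
  private def headerPath(line: String): String =
    line.drop(4).split("\t")(0)

  // Decide what a line is by looking only at its first few characters.
  def classify(line: String): DiffLine =
    if (line.startsWith("--- ")) OldFile(headerPath(line))
    else if (line.startsWith("+++ ")) NewFile(headerPath(line))
    else if (line.startsWith("@@ ")) {
      // "@@ -1,3 +1,4 @@ optional text": field 1 is the old range, field 2 the new one.
      val fields = line.split(" ")
      val (oldStart, oldSize) = parseRange(fields(1))
      val (newStart, newSize) = parseRange(fields(2))
      Hunk(oldStart, oldSize, newStart, newSize)
    } else if (line.startsWith("+")) Added(line.drop(1))
    else if (line.startsWith("-")) Removed(line.drop(1))
    else if (line.startsWith(" ")) Context(line.drop(1))
    else Other(line)
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Made-up sample input, not taken from the question.
    val sample = List(
      "--- a/hello.txt\t2009-10-11 15:12:20",
      "+++ b/hello.txt\t2009-10-11 15:12:30",
      "@@ -1,3 +1,4 @@",
      " unchanged",
      "-old line",
      "+new line"
    )
    sample.map(UnifiedDiff.classify).foreach(println)
  }
}
```

The header checks come before the single-character checks on purpose, since "--- " and "+++ " also start with "-" and "+".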

Of course, if you want to learn how to use some parsing libraries, then go for it.

Radomir Dopieralski answered Jan 09 '23 01:01
