Should I keep source files in memory while parsing?

Question

I'm writing the front-end part of an interpreter, and I initially disliked the idea of just dumping all the source files into memory and then referencing that text directly. So the tokenizer reads from a char buffer and builds the token stream.

However, I have reached the parsing side of things, and it hit me that I want to output nice errors and warnings that show the malformed line of source code. I guess I could put column numbers in the tokens, but then my error messages would be like getting directions by telephone: "It's in file X, on line Y, column Z, right next to the curly brace, you know the one. If you hit the semicolon, you've gone too far."

I seem to have put myself into a situation where I want to have my cake and eat it too. I want nice messages, but I don't want to hog memory.

Is there something I'm missing? Or is loading the source in memory the way to go?

Ira Baxter · Accepted Answer

When there's an error to report to the user, it hardly matters how long in milliseconds it takes to report it.

I'd keep your tokenized stream in memory to keep your interpreter fast. (Actually, you should switch to a threaded interpreter, or even a bad one-pass compiler, to improve the execution rate.)

When you encounter an error, go to the disk, fetch the line(s) of interest, and show them to the user. If he doesn't make any errors, this costs you zero. If he makes a small number of errors, that may be a tiny bit inefficient, but the user won't know. If he makes a large number of errors, the contents of the files containing errors are going to be read by the OS into its file cache, which is likely bigger than your programs anyway, so access will be more efficient than if the source had to come from the disk every time.
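As a rough illustration of fetching the line only when an error is reported, here is a minimal sketch. It assumes each token already records its file path, line, and column; the names `fetch_source_line` and `report_error` are hypothetical, and the fixed 512-byte line buffer is an arbitrary choice.

```c
#include <stdio.h>

/* Hypothetical helper: copy line `line_no` (1-based) of the file at `path`
 * into `buf`, without the trailing newline. Returns 0 on success, -1 if the
 * file cannot be opened or has no such line. Overlong lines are truncated. */
int fetch_source_line(const char *path, long line_no, char *buf, size_t cap)
{
    FILE *f;
    long cur = 1;
    size_t len = 0;
    int c, found = 0;

    if (line_no < 1 || cap == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    while ((c = fgetc(f)) != EOF) {
        if (cur == line_no) {
            found = 1;
            if (c == '\n')
                break;                  /* end of the wanted line */
            if (len + 1 < cap)
                buf[len++] = (char)c;   /* keep room for the terminator */
        } else if (c == '\n') {
            cur++;
        }
    }
    fclose(f);
    buf[len] = '\0';
    return found ? 0 : -1;
}

/* Hypothetical reporter: print the message, then the offending line with a
 * caret under column `col` (1-based). Tabs in the source will misalign the
 * caret; a real implementation would have to account for them. */
void report_error(FILE *out, const char *path, long line, long col,
                  const char *msg)
{
    char text[512];

    fprintf(out, "%s:%ld:%ld: error: %s\n", path, line, col, msg);
    if (col >= 1 && fetch_source_line(path, line, text, sizeof text) == 0) {
        fprintf(out, "    %s\n", text);
        fprintf(out, "    %*s^\n", (int)(col - 1), "");
    }
}
```

The file is opened and scanned only inside `report_error`, so a program with no errors never pays for it. If the source file changed on disk since it was tokenized, the line shown may no longer match, which is a trade-off of this approach.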

o11c · Answer

Better idea: mmap your sources in the first place, if you can. Fall back to slurping the whole file if you're reading from a pipe or something.
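A minimal sketch of that idea on a POSIX system might look like the following. The `SourceBuf` type and the `source_*` function names are made up for illustration; the only real APIs used are `open`, `fstat`, `mmap`, `read`, `munmap`, and the usual allocation functions.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for one source file. */
typedef struct {
    char  *data;      /* file contents, NOT NUL-terminated */
    size_t len;
    int    is_mapped; /* 1: release with munmap, 0: release with free */
} SourceBuf;

/* Map the file if it is a non-empty regular file; otherwise (pipe, tty,
 * empty file, or mmap failure) fall back to reading it all into the heap.
 * Returns 0 on success, -1 on error. */
int source_from_fd(int fd, SourceBuf *sb)
{
    struct stat st;
    size_t cap = 4096, len = 0;
    char *buf;

    sb->data = NULL;
    sb->len = 0;
    sb->is_mapped = 0;

    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            sb->data = (char *)p;
            sb->len = (size_t)st.st_size;
            sb->is_mapped = 1;
            return 0;
        }
    }

    /* Fallback: slurp until end of input, doubling the buffer as needed. */
    buf = malloc(cap);
    if (buf == NULL)
        return -1;
    for (;;) {
        ssize_t n;
        if (len == cap) {
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); return -1; }
            buf = bigger;
            cap *= 2;
        }
        n = read(fd, buf + len, cap - len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            free(buf);
            return -1;
        }
        if (n == 0)
            break;                      /* end of input */
        len += (size_t)n;
    }
    sb->data = buf;
    sb->len = len;
    return 0;
}

int source_open(const char *path, SourceBuf *sb)
{
    int rc, fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    rc = source_from_fd(fd, sb);
    close(fd);                          /* the mapping survives the close */
    return rc;
}

void source_close(SourceBuf *sb)
{
    if (sb->is_mapped)
        munmap(sb->data, sb->len);
    else
        free(sb->data);
    sb->data = NULL;
    sb->len = 0;
    sb->is_mapped = 0;
}
```

Either way the tokenizer sees a pointer and a length, so it does not need to know which path was taken. Note that the buffer is not NUL-terminated in either case, so the tokenizer has to bound its scans by `len`.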

After parsing, you may want to call madvise with MADV_DONTNEED (but only if the source was originally mmapped) to advise the kernel to drop it from the cache while still keeping it available for errors. This is probably not necessary, though, and may not even be a good idea, depending on your compiler design (e.g. are identifiers still pointing into the source text, or are they interned into a single, separate allocation?).

Tags:

c

parsing

that_individual
