NOTE: my target concern is C# targeting the CLR with regular MSIL in case there's something that works for that but not in the more general case(s).
There was recently a release of the Sourcepack project which allows a user to rewrite the source paths in a pdb file to point at different locations. This is very useful when you have the source for the assembly, but don't want to try and get it into the exact same filesystem location(s) as when it was built.
http://lowleveldesign.wordpress.com/2011/08/26/sourcepack-released/
For open-source projects, using http://www.symbolsource.org/ as a way of making it simple for users of your project to get symbols and source is an excellent idea.
However, very often there are projects where either for legal or convenience reasons, using such an approach isn't very feasible. Also, the set of people that might be debugging the project may be relatively small or contained.
By default, the pdb's for a project include pointers to the files on disk (IIRC) and then source indexing can add the ability to embed pointers to the source locations (for instance, in a version control system), with a source server then using the pointers to actually fetch the source.
It seems like things could be simpler (for certain builds, like debug and/or internal-only) to just put the actual source into the pdb (effectively just dereferencing the pointer currently written in the PDB). It seems like then you can skip the entire source server part (at least in theory) and eliminate a few dependencies on the debug-time story. Whether to store the source as compressed or not is largely orthogonal, but a first pass would probably not do so in an effort to make it simpler to implement for existing debuggers.
Since the PDB-matching-binary story is already very good, putting the source into the PDB would be even better than a source server pointer, since the pointer can break over time (source control system moves, or changes to a different system, or whatever), but the actual source sitting in the PDB is good 'forever'.
(this was added via edit after Tigran's comment asking what the benefits would be)
The 'baseline' scenario that this should be compared against is that of a 'normal' debugging experience using a 'normal' source server instance today. In that scenario, (AFAIK) the debugging engine gets a pointer from the PDB (via an alternate stream) then uses the registered source server(s) to attempt to get the source via that pointer. Since a given assembly is typically going to include multiple source files, there's either a single pointer that includes a base location or there are multiple pointers in the PDB (or something else), but that should be orthogonal to this discussion.
For a project where keeping the source hidden/inaccessible is desirable (most Microsoft products, for instance, including Windows, Office, Visual Studio, etc.), then having the PDB contain pointers is FAR superior to including actual source (even if it were encrypted). Such pointers are meaningless without the necessary network access and permissions, so such an approach means you can ship the PDB to anyone on the planet without worrying about them being able to access your source (worst-case, they get a glimpse into how your source tree is arranged, I would think).
However, there are 2 large sets of projects (and specifically, builds) where this 'hide the source' benefit doesn't exist.
The first are builds that are only used by people that have access to the source anyway. Builds done on your own machine that won't ever leave that machine are a great example, as an attacker would need to read files from your filesystem anyway to get the source, so reading from one file (.cs) vs. another (.pdb) is a relatively small difference in terms of attack difficulty/vector. Similarly, builds that are done and pushed to a test/staging environment where the people that access the pdb on machine are equal to or a subset of the people that can access the source 'normally'.
The second are (somewhat obviously) open-source projects, where the source for the project is already open for everyone anyway, so there's no benefit to hiding the source from anyone.
Note that this could be relatively easily extended to include the source in an encrypted form instead (since we're already talking about having to store format/encoding data as well), but the added complexity of that would make such a scenario likely less useful than just using a 'normal' source server.
With the above descriptions out of the way, the list of potential benefits to allowing this include (but are not limited to :) these that pop into my head at the moment:
NOTE: another approach would be including the source in the actual assembly (for instance, as a resource), but the pdb is a better choice (easy to ship a build without pdb's, no normal runtime perf hit if the source is in the pdb since the assembly is the same code and same size, etc)
On the surface of it, this kind of support doesn't seem like it would be too difficult to add, but I get the feeling this is because I don't really know enough about the mechanics involved instead of it actually being a simple thing to implement. :)
My guess would be something along the lines of:
With the babble above out of the way, the questions are really:
The easiest way to use the PDB file is to let Visual Studio do the heavy lifting - either launch your program with Visual Studio's "Debug" command (F5 by default), or run the program and use the "Attach to Process" item in Visual Studio's Debug menu.
In Visual Studio, open Tools > Options > Debugging > Symbols (or Debug > Options > Symbols).
pdb files contain the following information: The names of all local variables. The names of all source code files and the mapping from IL instructions onto lines within those files.
I've read over this and wanted to summarize my understanding for clarity
Today the debugger uses the PDB to gain the disk path to a file and checksum which was compiled to create a given section of an executable. The debugger then attempts to load the file using both the local disk and available symbol server. Under this proposal we would skip the middle man by just embedding the file itself into the PDB. Eureka, no more searching for source!
As someone who's done their fair share of digging for source code in this manner I like the idea of having one package for all your debugging needs. There are a couple of facets to consider about this proposal though.
The first is the actual embedding of the source code into the PDB. This is very doable. The PDB is essentially a light weight file database. There is structure to what it encodes but AFAIK you can put whatever you want into certain slots (local variable values / types for example). There may be size limitations for certain slots but I'm sure you could invent an encoding scheme to break large files up into chunks.
The second facet is having the debugger actually load the file from the PDB vs. searching for it on disk. I'm not as familiar with that part of the debugger but from what I understand it only uses 2 pieces of information to locate the file
I'm fairly certain this is the only information it passes onto a symbol server. This makes it unfeasible to implement a symbol server because it won't have access to the PDB (assuming of course I'm right).
I dug around hoping there was a VS COM component you could override which would allow you to intercept the loading of the file for a given path but I couldn't find one.
One approach I think would be feasible though would be
This wouldn't be quite what you want though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With