
Non-alphanumeric characters in COM/.NET interface names

I'm thinking of using the characters #@! in some COM interfaces our system generates. The COM type library is also exported to .NET. Will those characters cause me trouble later on?

I've tested it out for most of the day today, and it all seems fine. Our system continues to work just like it always did.

The reason I'm cautious is that those characters are illegal in MIDL, which uses C syntax for type names. But we don't use MIDL - we build our type libraries with ICreateTypeInfo and ICreateTypeLib. Looks like that's just a MIDL restriction, and COM and .NET are happy with the non-alphanumeric characters. But maybe there's something I don't know...

Ciaran Keating asked Nov 12 '10 05:11


2 Answers

This is what I've found.

I think there's no question that the names are legal at the binary level in COM, since a COM interface’s name is its IID and the text name is just documentation.

On the .NET side, the relevant specification is the Common Language Infrastructure specification (ECMA-335, http://www.ecma-international.org/publications/standards/Ecma-335.htm). I wonder whether .NET or Mono add their own restrictions on top; to do so would reduce interoperability, but this is the real world.

Section 8.5.1 covers valid type names in the Common Type System, and simply says that names are compared using code points. Odd that it says nothing about the composition of a name, only how names are compared. This section is paraphrased by MSDN at http://msdn.microsoft.com/en-us/library/exy17tbw%28v=VS.85%29.aspx, which says that the only two restrictions are (1) type names are "encoded as strings of Unicode (16-bit) characters", and (2) they can't contain an embedded 0x0000.

I've quoted the bit about 16-bit Unicode, rather than paraphrase it, because it uses imprecise language. Presumably the author of that page meant UTF-16. In any case, ECMA-335 specifies byte-by-byte comparison, and makes no mention of Unicode (regarding type names), and neither does it prohibit embedded zeros. Perhaps .NET has deviated from the CTS here, although I doubt it. More likely, the author of this MSDN page was thinking about programming languages when he wrote it.
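To make the comparison rule concrete, here is a minimal Python sketch of a byte-by-byte name comparison. The helper name `names_equal` is hypothetical, and the choice of UTF-8 as the byte encoding is an assumption made purely for illustration; the point is only that such a comparison has no case folding and no special treatment of punctuation.

```python
def names_equal(a: str, b: str) -> bool:
    """Compare two type names byte by byte.

    Assumption: names are encoded as UTF-8 before comparison. There is
    no case folding, no Unicode normalization, and no locale handling.
    """
    return a.encode("utf-8") == b.encode("utf-8")


if __name__ == "__main__":
    print(names_equal("IOrder#1", "IOrder#1"))  # identical bytes
    print(names_equal("IOrder", "iorder"))      # differs only in case
    print(names_equal("\u00e9", "e\u0301"))     # same glyph, different code points
```

Under this kind of comparison, '#', '@' and '!' are just more bytes; nothing in the comparison itself singles them out.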

The Common Language Specification (also defined in ECMA-335) defines the rules for identifiers in source code. Identifiers aren't directly relevant to my question because my internal type names never appear in source code, but I looked into it anyway. The CLS is a subset of the CTS, and as such its restrictions aren’t necessarily part of the broader CTS. CLS Rule 4 says that identifiers must follow the rules of Annex 7 of Technical Report 15 of the Unicode Standard 3.0 - see http://www.unicode.org/reports/tr15/tr15-18.html. That document too is a little vague, in that it refers to "other letter" and "connector punctuations" but doesn't define them. This helped: http://notes.jschutz.net/topics/unicode/.
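As a rough way to test a candidate name against that identifier rule, here is a Python sketch based on Unicode general categories. The category sets are my reading of the annex (a letter or letter-number to start, then letters, combining marks, decimal digits, connector punctuation and formatting codes), so treat them as an assumption rather than the normative rule. The function name is hypothetical, and Python's `unicodedata` uses a newer Unicode version than 3.0.

```python
import unicodedata

# Assumed category sets, taken from my reading of the annex.
START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}      # letters and letter-numbers
EXTEND = START | {"Mn", "Mc", "Nd", "Pc", "Cf"}   # marks, digits, connectors, format


def looks_like_cls_identifier(name: str) -> bool:
    """Approximate check of a name against the CLS Rule 4 identifier rule."""
    if not name:
        return False
    if unicodedata.category(name[0]) not in START:
        return False
    return all(unicodedata.category(ch) in EXTEND for ch in name[1:])


if __name__ == "__main__":
    for candidate in ["IOrder", "IOrder_v2", "IOrder#v2", "I@Order", "IOrder!"]:
        print(candidate, looks_like_cls_identifier(candidate))
```

'#', '@' and '!' all fall in the "other punctuation" category (Po), which is in neither set, so any name containing them fails this check.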

Section 8.5.1 of the ECMA spec includes a non-normative note that a CLS consumer (such as C# or the Visual Studio type browser, I suppose) “need not consume types that violate CLS Rule 4.” My proposed interface names do violate this Rule 4. This note seems to imply that a valid type may have a name that violates rule 4, and that a CLS consumer should either accept the rogue name or safely ignore it. (The Visual Studio type browser displays it without complaint.)

So my proposed type names are generally illegal in source code. But note that section 10.1 (about identifiers in the CLS) says “Since its rules apply only to items exported to other languages, private members or types that aren’t exported from an assembly can use any names they choose.”

I conclude that it's safe to use the characters #@! in my type names as long as they remain in the binary domain and never need to appear in source code or outside the assembly. And in fact they're never used outside the COM server.

A word about future-proofing... The CTS pretty much has nothing to say about the composition of type names, despite having a section called “Valid names” (section 8.5.1). They might change that in the future, but this broad and liberal specification has invited us all to do what we like. If the CTS designers had wanted to leave room for change then surely they would have built in some provision for that, or at least been less generous.

Ciaran Keating answered Sep 29 '22 01:09


It's interesting that you seem to have found a loophole in COM type naming. Microsoft restricts the use of the characters '#@!' in MIDL identifiers, but they don't duplicate that restriction in the ICreateTypeInfo and ICreateTypeLib interfaces.

Using these characters works today, so what's the risk?

  1. Well, Microsoft could see this as a bug and 'fix' ICreateTypeInfo, ICreateTypeLib, .Net COM Interop, and/or .Net type naming restrictions in the next release.

  2. You're creating and using an interface that doesn't have any valid MIDL definition.

  3. You're using names that will probably have to change if (when) you transition from COM to .Net. Even if you just want to create an adapter type in .Net you will not be able to reuse any of the "invalid" names.

  4. Is this compatible with Mono and other non-Microsoft .Net compatible technologies?

  5. There are plenty of known valid names that could be used (use something like '_at_' instead of '@', etc.) to avoid any possible future issue.
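The substitution idea in point 5 could be sketched like this in Python. Only '_at_' comes from the suggestion above; the spellings '_hash_' and '_bang_', the table name and the function name are all made up for illustration.

```python
# Hypothetical replacement table: '_at_' is from the suggestion above,
# '_hash_' and '_bang_' are invented spellings for this sketch.
REPLACEMENTS = {"@": "_at_", "#": "_hash_", "!": "_bang_"}


def to_plain_name(name: str) -> str:
    """Replace each of '#', '@' and '!' with an alphanumeric spelling."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in name)


if __name__ == "__main__":
    print(to_plain_name("IOrder@v2!"))
```

Note that this mapping is not reversible if an original name already contains one of the replacement spellings (for example a name that literally includes '_at_'), so a real scheme would need to reserve or escape those sequences.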

If none of this matters to you, then you'll probably be fine. But I suspect by the very fact that you asked this question, at some level it doesn't 'feel' right to you.

Good luck.

jimhark answered Sep 28 '22 23:09